Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

A new golf challenge - Natural Sorting

by theroninwins (Friar)
on Mar 17, 2006 at 12:53 UTC ( [id://537451]=perlquestion: print w/replies, xml ) Need Help??

theroninwins has asked for the wisdom of the Perl Monks concerning the following question:

Ok since I was away for quite some time I thought I throw in a new golf challenge... (It is an official one that is over but I still thing it is a good one):

"Human Sort"

Sort a set of lines, but with a twist.

When sorting a set of lines, humans like to see the numeric parts of a line sorted numerically and the alphabetical parts case insensitive. For the numeric parts you may ignore the possibility of negative numbers, floating point numbers and leading zeros, the numbers however are almost unconstrained in length. In other words, a sequence of one or more digits plays the same sort of role as some kind of single character, and such a sequence sorts before all normal letters, but among themselves they are ordered by numerical value. Example:

1 < A < amstelveen < Amsterdam < Amsterdam5 < Amsterdam40 < Amsterdamned

You are free in how you order e.g. fOo versus FoO or a digit versus space.

Notes:

* watch out, the original version of this question did not request case insensitivity.
* watch out, the original version of this question allowed leading zeros in the number parts, but that made programs substantially more complicated and less fun. While this change won't invalidate any existing solution, maybe you can do better now....
* The strings can also be things like: 57amstel25venn865314457646576532325988watte
* Remember, if your program is going to generate (temporary) strings of size > 2**32, it is no good. So avoid things like
"a865314457646576532325988watte" =~ /(\d+)/; 0 x $1;

OK happy golfing. I hope you like this one and YES it is something beginners to perl can try as well.

Replies are listed 'Best First'.
Re: A new golf challenge - Natural Sorting
by jdporter (Paladin) on Mar 17, 2006 at 14:30 UTC

    Not truly golfed; more of a canonical solution:

    sub _nsc { my( $x, @a ) = @{$_[0]}; my( $y, @b ) = @{$_[1]}; @a && @b ? ( $x =~ /\d/ ? ( $y =~ /\d/ ? $x <=> $y || _nsc(\@a,\@b) : -1 ) : ( $y =~ /\d/ ? 1 : lc($x) cmp lc($y) || _nsc(\@a,\@b) )) : ( @a ? 1 : @b ? -1 : 0 ) } sub natsort { map join('',@$_), sort {_nsc($a,$b)} map [/(\d+|\D+)/g], @_ }
    We're building the house of the future together.

      Wups - there's a bug in that version of _nsc. Here's a better one:

      sub _nsc { my @a = @{$_[0]}; my @b = @{$_[1]}; @a && @b or return( @a ? 1 : @b ? -1 : 0 ); my $x = shift @a; my $y = shift @b; $x =~ /\d/ ? ( $y =~ /\d/ ? length($x)<=>length($y) || $x <=> $y || _nsc(\@a,\@b) : -1 ) : ( $y =~ /\d/ ? 1 : lc($x) cmp lc($y) || _nsc(\@a,\@b) ) }

      (This version also optimizes to only compare numeric strings as numbers iff they have the same length.)

      We're building the house of the future together.
      Forget golf, it doesn't even work. <=> will fail for 865314457646576532325988.
        Which means that a solution more robust than <=> will have to compare numbers using arbitrary precision (and that just pushes the threshold a few orders of magnitude farther out). Which means that numbers will have to be recognized as such, so that -123.4e-45.6E789 would not be taken as a single number, but -123.4e-456 will be. And then we get into whether whitespace, underscores, and commas/decimals (depending on locale) are allowed, and what they mean.

        It's a long tee shot, onto a glacier, and only transparent golf balls are available.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

        Ouch. ;-)
Re: A new golf challenge - Natural Sorting
by thundergnat (Deacon) on Mar 17, 2006 at 17:41 UTC

    Not so much golfed as untidy... :-(

    Update: chopped out a few more strokes. later: and a few more...

    sub n{for(\@a,\@b){@$_=split/(?<=\d)(?=\D)|(?<=\D)(?=\d)/,lc shift@_};for(0..$#a){$j=$a[$_];$k=$b[$_];return$j=~/\d/&$k=~/\d/?($j== +$k?next:$j<=>$k):($j=~/\D/&$k=~/\D/?($j eq$k?next:$j cmp$k):(defined$k?$j cmp$k:1))}} @testarray = (qw/86 101 10 20 1 A amstelveen a10 A1 Amsterdam Amsterdam5 Amsterdam40 Amsterdamned 112kimn678mn09 112kimm678mn19 112kimm678mn09 112kimm678mn9 865314457646576532325988/); print join "\n", sort { n($a,$b)} @testarray;
Re: A new golf challenge - Natural Sorting (OneTruePerlSort)
by tye (Sage) on Mar 17, 2006 at 20:42 UTC

    It is small (97 strokes, including the meaningful newline), but it is also very fast. There are certainly places where a few characters can be shaved but several of those break subtle features such as one demonstrated in the example code: sorting the sample data correctly even when each record ends with a newline.

    use strict; use warnings; sub naturalSort { my$i;s/(\d+)/pack"aNa*",0,length$1,$1/ge,$_.=$".$i++for # 345678 1 2345678 2 2345678 3 2345678 4 2345678 5 2345 my@x=map{lc}@_;@_[map{(split)[-1]}sort@x] #8 6 2345678 7 2345678 8 2345678 9 234567 } my @data= <DATA>; chomp @data if $\; print for naturalSort(@data); __END__ amsTerdam40 amstelveen 1 AmstErdam5 Amsterdam amsterDamed A

    Run it normally or with "perl -l ..." to see that it sorts the same with or without the newlines in the strings being sorted.

    - tye        

Re: A new golf challenge - Natural Sorting
by smokemachine (Hermit) on Mar 17, 2006 at 16:14 UTC
    perl -e '$\=$/;$,=" ";@d=@a=sort{lc$a cmp lc$b}@ARGV;for(0..$#a){if($a +[$_]=~/(\d+)/){$b{$1}=$_;push@c,$_}} for(sort{$a<=>$b}keys%b){$d[shif +t@c]=$a[$b{$_}]}print@d' 1 A amstelveen Amsterdam Amsterdam5 Amsterda +m40 Amsterdamned
      $ perl -e 'sub k{for(@p=$_[0]=~/\d+|[a-z]+/gi){if(/^\d/){s/^0+//;$l=le +ngth;$_="9"x($l/9).$l%9 .$_}}lc join"\0",@p}@s=sort{k($a)cmp k$b}@ARG +V;print"@s\n"' 1 A amstelveen Amsterdam Amsterdam5 Amsterdam40 Amster +damned 1 A amstelveen Amsterdam Amsterdam5 Amsterdam40 Amsterdamned
      and numbers are not limited to any range.
Re: A new golf challenge - Natural Sorting
by bart (Canon) on Apr 03, 2006 at 10:15 UTC
    I'm not really interested in Perl Golf, hence this late reply, but I'd like to contribute by stating my favourite approach for natural sorting, that apparently nobody has touched in this thread. The basis is this:
    split /(\d+)/
    This will split the example strings like this:
    1
    "", 1
    A
    "A"
    amstelveen
    "amstelveen"
    Amsterdam
    "Amsterdam"
    Amsterdam5
    "Amsterdam", 5
    Amsterdam40
    "Amsterdam", 40
    Amsterdam40b
    "Amsterdam", 40, "b"
    Amsterdamned
    "Amsterdamned"
    and now for some more complex examples:
    Chapter 1 Section 3
    "Chapter ", 1, "Section ", 3
    Chapter 1 Section 10
    "Chapter ", 1, "Section ", 10

    Now the basic trick to do natural sorting, is partwise comparison of the items, comparing the even numbered items (starting at 0) with alphabetical comparison, case insensitive even if you like, and the odd numbered items with numerical comparison.

    This is code that does basically that:

    sub natcomp { my @a = split /(\d+)/, $a; my @b = split /(\d+)/, $b; my $last = min(scalar @a, scalar @b)-1; my $cmp; for my $i (0 .. $last) { unless($i & 1) { # even $cmp = lc $a[$i] cmp lc $b[$i] || $a[$i] cmp $b[$i] and re +turn $cmp; }else { # odd $cmp = $a[$i] <=> $b[$i] and return $cmp; } } return scalar @a <=> scalar @b; # shortest array comes first }

    Let's try this with:

    chomp(my @array = <DATA>); $\ = "\n"; print for sort natcomp @array; __DATA__ 1 A amstelveen Amsterdam Amsterdam40 Amsterdam40b Amsterdam5 Amsterdamned Chapter 1 Section 10 Chapter 1 Section 3 Chapter 10 Section 2 Chapter 2 Section 1
    (Note that the DATA section is sorted alphabetically, and that it contains an empty string)

    The result is:

    
    1
    A
    amstelveen
    Amsterdam
    Amsterdam5
    Amsterdam40
    Amsterdam40b
    Amsterdamned
    Chapter 1 Section 3
    Chapter 1 Section 10
    Chapter 2 Section 1
    Chapter 10 Section 2
    

    I hope that order is to your liking.

Re: A new golf challenge - Natural Sorting
by GrandFather (Saint) on Mar 17, 2006 at 18:50 UTC

    Perhaps OP could provide a test data set as a __DATA__ block and provide sample output illustrating the expected sort order for the given data? It could well be that test strings will be added as new edge cases are described. Perhaps the first line should be: 0 Data set version 1 so that it is obvious in sample output which "rules" any particular solution has been tested against.


    DWIM is Perl's answer to Gödel

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://537451]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (4)
As of 2024-03-28 17:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found