http://qs321.pair.com?node_id=521135

BUU has asked for the wisdom of the Perl Monks concerning the following question:

I have a series of strings that look much like:
0/1/2/3 0 0/4/5/6 0/4 0/1 0/1/2 0/4/5 0/10/111/145 0/10/111 0/10
Now I'm trying to sort them. The correct order is:
0 0/1 0/1/2 0/1/2/3 0/4 0/4/5 0/4/5/6 0/10 0/10/111 0/10/111/145
Perl's default sort using cmp comes very close, but it places 0/10 before 0/4, and it should not. I've found a really ugly solution involving a couple of for loops and such things, but I was really hoping for anything faster and cleaner. Any ideas?

Replies are listed 'Best First'.
Re: Sorting path like strings
by BrowserUk (Patriarch) on Jan 05, 2006 at 08:46 UTC

    If your numeric values are all less than 255, then this works. You could swap the ST to a GRT or OM if performance is a concern. If your numbers get larger than 255 you could probably use unicode pack 'U*', but I haven't tested that.

    @data = qw[ 0/1/2/3 0 0/4/5/6 0/4 0/1 0/1/2 0/4/5 0/10/111/145 0/10/111 0/10 ];; print for map{ $_->[0] } sort{ $a->[1] cmp $b->[1] } map{ [ $_, pack 'C*', split '/', ] } @data;; 0 0/1 0/1/2 0/1/2/3 0/4 0/4/5 0/4/5/6 0/10 0/10/111 0/10/111/145

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      How do you print newlines ("\n") here?

        My perl shell has the -l switch on the shebang line which means amongst other things) that print statements have a newline appended to them automatically. See perlrun for the details.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

      Are you sure that non-C locales won’t cause this to sort in orders other than ASCIIbetical (and thus wrongly), particularly if you add Unicode to the mix?

      Makeshifts last the longest.

        Are you sure that non-C locales won’t cause this to sort in orders other than ASCIIbetical

        Um, no. Did I imply that I was?

        I assume that anyone using non-C locales will know they are and know that they will have to take special steps to get sorts to operate correctly. This assumption is based upon the fact that many uses of GRT (including yours elsewhere in the thread), and as far as I can tell from looking, many of the various sorting modules on CPAN will be similarly affected.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Sorting path like strings
by borisz (Canon) on Jan 05, 2006 at 08:45 UTC
    sub do_the_sort { my @aa = split /\//, $a; my @bb = split /\//, $b; for ( 0 .. $#aa ) { return 1 unless defined $bb[$_]; my $zz = $aa[$_] <=> $bb[$_]; return $zz if $zz; } -1; } my @y = sort do_the_sort @x;
    Boris
      borisz,
      I think the following recursive implementation does basically the same thing - don't ask me how I came up with it.
      my @data = map {chomp; [ split '/' ] } <DATA>; @data = map { join '/', @$_ } sort my_sort @data; sub my_sort { my $i = shift || 0; if ( defined $a->[$i] && defined $b->[$i] ) { return $a->[$i] <=> $b->[$i] || my_sort(++$i); } return defined $a->[$i] ? 1 : -1; }

      Cheers - L~R

Re: Sorting path like strings
by athomason (Curate) on Jan 05, 2006 at 08:52 UTC
    Your sort operator needs to impose a lexicographic ordering on the lists defined by your "paths", as in this example:
    #!/usr/bin/perl use strict; use warnings; # parse the paths into lists my @data; while ( <DATA> ) { chomp; push @data, [ split m:/:, $_ ]; } sub compare_lists_lexicographically { # make copies of the input lists my @left = @$a; my @right = @$b; # chew threw the lists until # a) we discover they're different lengths, or # b) the elements are different, or # c) neither while ( 1 ) { if ( !@left && !@right ) { # ran out at the same time: they're equal return 0; } elsif ( !@left ) { # left is shorter than right: left is first return -1; } elsif ( !@right ) { # right is shorter than left: right is first return 1; } else { # check the next element from each list my ( $l, $r ) = ( shift @left, shift @right ); # NB: use < if fields are numeric, lt otherwise if ( $l < $r ) { # next element of left comes first return -1; } elsif ( $r < $l ) { # next element of right comes first return 1; } # else equal, keep checking } } } # display the output printf "%s\n", join '/', @$_ for sort compare_lists_lexicographically +@data; __DATA__ 0/1/2/3 0 0/4/5/6 0/4 0/1 0/1/2 0/4/5 0/10/111/145 0/10/111 0/10
    HTH,
    --athomason
Re: Sorting path like strings
by educated_foo (Vicar) on Jan 05, 2006 at 08:53 UTC
    Assuming you don't have too many subdirs at any level, does this

    perl -le 'print for map { join"/",map ord,split"",$_} sort map {chomp;join"",map chr,@x=split"/",$_}<>'

    get the right order?

Re: Sorting path like strings
by Aristotle (Chancellor) on Jan 05, 2006 at 14:42 UTC

    I’d use a GRT variant.

    use List::Util qw( max ); my @padded = do { my @copy = @path; my @segment_length; while( 1 ) { my $len = max map { s!([^/]*/|[^/]+)!! ? length( $1 ) : -1 } @ +copy; last if $len == -1; push @segment_length, $len; } my $format = join '', map "%${_}s", @segment_length; map {; no warnings 'uninitialized'; sprintf $format, split m!/!, $_, -1; } @path; }; @path = @path[ sort { $padded[ $a ] cmp $padded[ $b ] } 0 .. $#padded +];

    Makeshifts last the longest.

Re: Sorting path like strings
by NiJo (Friar) on Jan 05, 2006 at 21:37 UTC
    The standard solution to the lexical vs. numerical sorting problem is using leading 0 s. Comparing '04' and '10' works in both cases. It is also reversible:
    foreach (@in) {s|(\d+)|sprintf '%10.10d', $1|eg}; @in = sort @in; foreach (@in) {s|(\d+)|sprintf '%d', $1|eg};
    This assumes that your numbers have up to 10 digits.