PerlMonks  

sorting numbered words

by Anonymous Monk
on Dec 18, 2006 at 13:43 UTC ( [id://590431]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a 'good' way to sort a list that looks like this:

text1
text2
text3
text10
text11


so that it ends up in that order?

Using the normal sort it puts the 10 and 11 before the 2 (as expected).

The only thing I've been able to think of is to use a regex to break up anything that matches that pattern, drop the pieces in a hash, then use || to sort on the two values. This seems like a bit of a hack to me though.
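For reference, the approach described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the sub name by_prefix_then_number is made up, and it assumes every item has the shape "non-digits followed by digits". No hash is needed, since the pieces can be extracted inside the comparison itself.

```perl
use strict;
use warnings;

# Hypothetical comparison sub: split each item into a non-digit prefix and
# a trailing number, compare the prefixes as strings, and fall back to a
# numeric comparison when the prefixes are equal.
# Assumes every item looks like <non-digits><digits>.
sub by_prefix_then_number {
    my ($pa, $na) = $a =~ /^(\D*)(\d+)$/;
    my ($pb, $nb) = $b =~ /^(\D*)(\d+)$/;
    return $pa cmp $pb || $na <=> $nb;
}

my @unsorted = qw(text10 text2 text11 text1 text3);
my @sorted   = sort by_prefix_then_number @unsorted;

print "$_\n" for @sorted;
```

The regexes run on every comparison, so for long lists a Schwartzian Transform that extracts the pieces once per item is likely to be cheaper.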

Replies are listed 'Best First'.
Re: sorting numbered words
by McDarren (Abbot) on Dec 18, 2006 at 14:56 UTC
    Sort::Naturally will do this:

    (Updated to show that it will also work fine with something like "text69.something-14-14")

    (Update 2: Sort::Naturally also handles the data used by BrowserUK below "correctly") :)

    #!/usr/bin/perl -w
    use strict;
    use Sort::Naturally;

    print for nsort(<DATA>);

    __DATA__
    text1
    text69.something-14-14
    text2
    text37
    text5
    text3
    text10
    text11

    Gives:

    text1
    text2
    text3
    text5
    text10
    text11
    text37
    text69.something-14-14

    Cheers,
    Darren :)

Re: sorting numbered words
by shmem (Chancellor) on Dec 18, 2006 at 14:06 UTC
    You could use a Schwartzian Transform:
    @sorted =
        map  { $_->[2] }
        sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] }
        map  { /(\D+)(\d+)/; [$1, $2, $_] }
        @unsorted;

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      This fails on files named like "test69.something-14-14".

        If my example doesn't work for you, modify it accordingly.

        That said, I don't see how it fails (as I don't know what you expect):

        #!/usr/bin/perl -w
        use strict;

        my @unsorted = <DATA>;
        my @sorted =
            map  { $_->[2] }
            sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] }
            map  { /(\D+)(\d+)/; [$1, $2, $_] }
            @unsorted;
        print for @sorted;

        __DATA__
        test69.something-14-14
        test28.something-14-14
        foo52.something-14-14
        test13.something-14-14
        test4.something-14-14
        foo58.something-14-14
        test31.something-14-14
        test15.something-14-14
        test59.something-14-14
        foo5.something-14-14
        test41.something-14-14
        test38.something-14-14
        foo11.something-14-14
        test10.something-14-14
        test8.something-14-14
        test49.something-14-14
        foo24.something-14-14
        foo7.something-14-14
        bar27.something-14-14
        bar0.something-14-14
        test3.something-14-14

        __END__
        bar0.something-14-14
        bar27.something-14-14
        foo5.something-14-14
        foo7.something-14-14
        foo11.something-14-14
        foo24.something-14-14
        foo52.something-14-14
        foo58.something-14-14
        test3.something-14-14
        test4.something-14-14
        test8.something-14-14
        test10.something-14-14
        test13.something-14-14
        test15.something-14-14
        test28.something-14-14
        test31.something-14-14
        test38.something-14-14
        test41.something-14-14
        test49.something-14-14
        test59.something-14-14
        test69.something-14-14

        The OP didn't

        1. contain filenames like "test69.something-14-14"
        2. specify that filenames composed of any number of string and digit fields were to be sorted string- and number-wise
        3. specify how non-letter and non-digits should be treated.
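If composite names like these were in scope, one possible reading of "string- and number-wise" is to compare the names field by field. The sketch below is an illustration under that assumption only: the helper names chunks and natural_cmp are made up, it is not the algorithm used by Sort::Naturally, and it treats punctuation as ordinary string data.

```perl
use strict;
use warnings;

# Break a name into alternating runs of digits and non-digits, e.g.
# "test69.something-14-1" -> ("test", "69", ".something-", "14", "-", "1").
sub chunks { return $_[0] =~ /(\d+|\D+)/g }

# Compare two names chunk by chunk: numerically when both chunks are digit
# runs, as plain strings otherwise. If one name runs out of chunks first,
# it sorts earlier.
sub natural_cmp {
    my @x = chunks($_[0]);
    my @y = chunks($_[1]);
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $r = ($p =~ /^\d/ && $q =~ /^\d/) ? $p <=> $q : $p cmp $q;
        return $r if $r;
    }
    return @x <=> @y;
}

my @names = qw(
    test69.something-14-14 test60.something-1-14
    test69.something-14-1  text10 text2
);
my @sorted = sort { natural_cmp($a, $b) } @names;

print "$_\n" for @sorted;
```

Each comparison re-splits both names, so this favours clarity over speed; precomputing the chunks once per name would avoid the repeated work.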

        Bug reports without logs are useless.

        Further recommended reading: I know what I mean. Why don't you?

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: sorting numbered words
by shonorio (Hermit) on Dec 18, 2006 at 13:50 UTC
Re: sorting numbered words
by jettero (Monsignor) on Dec 18, 2006 at 13:48 UTC

    Seems to me you want to compare just the numbers, numerically ...

    @a = sort {
        my ($c, $d);
        $c = $1 if $a =~ m/(\d+)/;
        $d = $1 if $b =~ m/(\d+)/;
        $c <=> $d
    } @a;

    Or something like that anyway. Hopefully someone perl-golfs me. I'd like to see the most concise way to say the above. Could be shmem came up with something better below. It's not shorter, but probably better.

    -Paul

Re: sorting numbered words
by BrowserUk (Patriarch) on Dec 18, 2006 at 15:09 UTC

    Update: A GT form which might be quicker, but makes an assumption about the maximum length of the inputs:

    print for
        map  { unpack 'x255A*', $_ }
        sort
        map  {
            ( my $x = $_ ) =~ s[(\d+)]{ pack 'N', $1 }ge;
            pack 'a255a*', $x, $_;
        } <DATA>;

    Not the fastest implementation, but I think it will handle most contingencies, provided no individual numeric field exceeds 2**32:

    #! perl -sw
    use strict;

    print for
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  {
            [ $_, do { ( my $x = $_ ) =~ s[(\d+)]{ pack 'N', $1 }ge; $x } ]
        } <DATA>;

    __DATA__
    text3
    text10
    text2
    text11
    text1
    test69.something-14-14
    test60.something-14-14
    test69.something-1-14
    test60.something-1-14
    test69.something-14-1
    test60.something-14-1

    Gives:

    c:\test>junk
    test60.something-1-14
    test60.something-14-1
    test60.something-14-14
    test69.something-1-14
    test69.something-14-1
    test69.something-14-14
    text1
    text2
    text3
    text10
    text11
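    The reason a plain string sort works on these keys may not be obvious, so here is a small sketch of the idea. pack 'N' produces a 4-byte big-endian unsigned integer, which means comparing the packed bytes with cmp orders them the same way as comparing the numbers. The sub name sort_key is made up for this illustration.

```perl
use strict;
use warnings;

# pack 'N' gives a fixed-width, big-endian byte string, so bytewise string
# comparison agrees with numeric comparison for values that fit in 32 bits.
my $two = pack 'N', 2;
my $ten = pack 'N', 10;

print "as strings: ", ("2" cmp "10"), "\n";   # "2" sorts after "10"
print "packed:     ", ($two cmp $ten), "\n";  # 2 sorts before 10

# Hypothetical helper: build a sort key by replacing every run of digits
# with its packed form, as the code above does inline.
sub sort_key {
    (my $key = shift) =~ s/(\d+)/pack 'N', $1/ge;
    return $key;
}

print "keys:       ", (sort_key("text2") cmp sort_key("text10")), "\n";
```

    Because every number occupies exactly four bytes in the key, the text after a number always lines up at the same offset, which is what lets multi-field names compare correctly.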

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: sorting numbered words
by throop (Chaplain) on Dec 18, 2006 at 15:38 UTC
    Others have given able comments on how to do this sort. May I take a different tack? I suspect you are forming the text1, text2... with code that looks something like this:
    my $counter = 1;
    foreach (lameExcuse()) {
        $token = 'text' . $counter++;
        doSomethingWith($token);
    }
    And later you're having to sort these tokens. If you instead write
    my $token = 'text00001';
    foreach (lameExcuse()) {
        doSomethingWith($token++);
    }
    you'll generate tokens text00001 text00002 text00003... which sort quite nicely with a plain string sort, up to 100K tokens.

    throop

    Actually, it works OK beyond 100K; you just roll over to 'texu00000'
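    A quick way to see the string increment and the rollover in action is sketched below. The variable names are only for illustration. The behaviour relies on Perl's "magic" string increment, which applies when the value matches /^[a-zA-Z]*[0-9]*$/ and has only been used as a string.

```perl
use strict;
use warnings;

# Post-increment returns the old value, then advances the string.
my $token  = 'text00001';
my @tokens = map { $token++ } 1 .. 3;
print "@tokens\n";

# When the digits overflow, the carry moves into the letters.
my $last = 'text99999';
$last++;
print "$last\n";
```

    Tokens generated after the rollover still sort correctly against earlier ones with a plain string sort, since 'texu00000' compares greater than any 'text' token.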
Re: sorting numbered words
by gam3 (Curate) on Dec 19, 2006 at 15:00 UTC
    You might want to take a look at 442237. It discusses sorting words that contain decimal and version numbers as well as integers.
    -- gam3
    A picture is worth a thousand words, but takes 200K.

Node Type: perlquestion [id://590431]
Approved by JediWizard