PerlMonks  

problems with advanced array sort

by ultibuzz (Monk)
on May 19, 2006 at 12:53 UTC ( [id://550479]=perlquestion )

ultibuzz has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I have the following list of filenames:

{ID#0000D128}-20060519_090519_00000ZURACF2954.zip
{ID#0000D129}-20060519_091438_00000ZURACF2955.zip
{ID#0000D12C}-20060519_092458_00000ZURACF2957.zip
{ID#0000D12D}-20060519_092911_00000ZURACF2956.zip

My problem is that I need to sort this list so that the numbers at the end are in the right order. As you can see, the 56 comes after the 57; it should be the other way around.

I've read in a book that there are more complex sort functions, but the book stops at that point and continues with $a cmp $b and the numeric way, $a <=> $b.

I thought about splitting the filename into parts, sorting on one part and then reattaching the parts, but when I do something like this I lose the connection between the parts.

Does anyone have a clue about this? As usual, links, tips and any help are kindly welcome ;)

Replies are listed 'Best First'.
Re: problems with advanced array sort
by blazar (Canon) on May 19, 2006 at 13:02 UTC

    As you may easily imagine, this kind of question gets asked so often that it should be easy to locate some info before asking. In particular I recommend that you look up the Schwartzian transform and the Guttman-Rosler transform. To stay within The Monastery, check the Tutorials section.

    Update: minimal example using Guttman-Rosler transform follows. Note that I used : as a separator, which seems appropriate for this example. You may want to choose something different if needed. I also took for granted that the filenames always end with a sequence of four digits, i.e. that the numbers are possibly padded with zeroes. If this is not the case, then just do it yourself with sprintf.

    #!/usr/bin/perl -l

    use strict;
    use warnings;

    chomp(my @file=<DATA>);

    @file=map +(split /:/)[1], sort
      map +(/(\d+)\.zip/)[0] . ":$_", @file;

    print for @file;

    __END__
    {ID#0000D128}-20060519_090519_00000ZURACF2954.zip
    {ID#0000D129}-20060519_091438_00000ZURACF2955.zip
    {ID#0000D12C}-20060519_092458_00000ZURACF2957.zip
    {ID#0000D12D}-20060519_092911_00000ZURACF2956.zip

    Update2: alternative version explicitly using sprintf as hinted above, and a simple substr instead of split on a separator, since we have fixed length "fields" anyway.

    # "%06d" yields a 6-character key, so strip exactly 6 characters afterwards
    @file=map { substr $_, 6 } sort
      map sprintf("%06d", /(\d+)\.zip/) . $_, @file;
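The effect of the padding is easiest to see when the trailing numbers do not all have the same width. The following is only a minimal sketch: the filenames and the `$width` of 6 are made up for illustration and are not part of the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample names: the trailing numbers have different widths.
my @file = (
    'report_CF10.zip',
    'report_CF9.zip',
    'report_CF2954.zip',
    'report_CF100.zip',
);

my $width = 6;    # assumed upper bound on the number of digits

# Guttman-Rosler transform: prepend a fixed-width key, sort as plain
# strings, then cut the key off again.
my @sorted =
    map  { substr $_, $width }
    sort
    map  { sprintf("%0${width}d", /(\d+)\.zip$/) . $_ } @file;

print "$_\n" for @sorted;
```

Without the zero padding, a plain string sort would place 10 and 100 before 9. With it, the default sort is enough and no comparison block is needed.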
Re: problems with advanced array sort
by davorg (Chancellor) on May 19, 2006 at 13:05 UTC

    You need to use a sorting subroutine which extracts the number that you want to sort by from the strings and sorts on that. It might look something like this:

    #!/usr/bin/perl

    use strict;
    use warnings;

    print sort my_sort <DATA>;

    sub my_sort {
      my ($a_num) = $a =~ /(\d+)\.zip/;
      my ($b_num) = $b =~ /(\d+)\.zip/;

      return $a_num <=> $b_num;
    }

    __DATA__
    {ID#0000D128}-20060519_090519_00000ZURACF2954.zip
    {ID#0000D129}-20060519_091438_00000ZURACF2955.zip
    {ID#0000D12C}-20060519_092458_00000ZURACF2957.zip
    {ID#0000D12D}-20060519_092911_00000ZURACF2956.zip

    Or there's a Schwartzian Transform version:

    print map  { $_->[0] }
          sort { $a->[1] <=> $b->[1] }
          map  { [ $_, /(\d+)\.zip/ ] } <DATA>;

    Or, if you don't care about maintainability :)

    print sort { ($a =~ /(\d+)\.zip/)[0] <=> ($b =~ /(\d+)\.zip/)[0] } <DATA>;
    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: problems with advanced array sort
by salva (Canon) on May 19, 2006 at 13:17 UTC
    something simple:
    my @sorted = sort { ($a =~ /(\d+)\D*$/)[0] <=> ($b =~ /(\d+)\D*$/)[0] } @filenames;
    and something simple and fast:
    use Sort::Key qw(isortkey);
    my @sorted = isortkey { /(\d+)\D*$/; $1 } @filenames;
Re: problems with advanced array sort
by Limbic~Region (Chancellor) on May 19, 2006 at 13:04 UTC
    ultibuzz,
    Here is a Schwartzian Transform solution:
    @file =
        # getting back the filename
        map { $_->[0] }
        # ascending numerical sort on desired digits
        sort {$a->[1] <=> $b->[1]}
        # anonymous array 0 => filename, 1 => desired digits
        map {[$_, /(\d+)\.zip/]} @file;
    Keep in mind this solution is specific to the file name sample you provided. You will likely have to adapt it or, better yet, use File::Basename. Update: I added comments to what the ST was doing in case you weren't familiar with how it worked.

    Cheers - L~R
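For readers who, like the original poster, worry about losing the connection between a filename and the part it is sorted on, the three stages of the Schwartzian Transform can be written out one at a time. This is just an illustrative sketch using the filenames from the question; the intermediate variable names (`@pairs`, `@sorted_pairs`) are made up here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @file = (
    '{ID#0000D128}-20060519_090519_00000ZURACF2954.zip',
    '{ID#0000D129}-20060519_091438_00000ZURACF2955.zip',
    '{ID#0000D12C}-20060519_092458_00000ZURACF2957.zip',
    '{ID#0000D12D}-20060519_092911_00000ZURACF2956.zip',
);

# Step 1: pair each filename with its sort key: [ filename, digits ].
my @pairs = map { [ $_, /(\d+)\.zip/ ] } @file;

# Step 2: sort the pairs numerically on the key in slot 1.
my @sorted_pairs = sort { $a->[1] <=> $b->[1] } @pairs;

# Step 3: throw the keys away and keep only the filenames.
my @sorted = map { $_->[0] } @sorted_pairs;

print "$_\n" for @sorted;
```

Because the key travels together with its filename inside the same anonymous array, nothing has to be reattached after sorting.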

Re: problems with advanced array sort
by ultibuzz (Monk) on Jun 21, 2006 at 08:29 UTC

    hi all,
    here is the final code

    use strict;

    open(SORT_FILE, '<', "sort_test.txt") or die("open failed: $!");
    my @to_sort = <SORT_FILE>;

    my @from_sort =
        map  { $_->[0] }
        sort { $a->[2] <=> $b->[2] }
        map  {[$_, (split /CF|\./) ]} @to_sort;

    foreach my $sorted (@from_sort) {
        print $sorted;
    }

    my $idx=1;
    $idx++ while ($idx < @from_sort)
        and substr($from_sort[$idx],41,4)-substr($from_sort[$idx-1],41,4) <=50;
    my @sort_splice = splice @from_sort, 0, $idx;
    my @sorted = (@from_sort, @sort_splice);

    thx to Corion for the while one liner :D
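The last three statements of that code handle the counter wrapping around (0400 followed by 0001). What they do may be easier to follow on bare numbers. The sketch below uses made-up sequence numbers and reuses the threshold of 50 from the code above. It assumes, as discussed elsewhere in this thread, that a batch never covers a full cycle, so there is at most one large gap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sequence numbers from a counter that wraps from 0400 to 0001.
# After the numeric sort the list is: 0001 0002 0399 0400
my @seq = sort { $a <=> $b } qw(0002 0399 0001 0400);

my $max_gap = 50;    # assumed threshold, as in the code above

# Walk forward while neighbours are close; stop at the first big jump.
my $idx = 1;
$idx++ while $idx < @seq and $seq[$idx] - $seq[$idx - 1] <= $max_gap;

# Everything before the jump belongs to the run after the wrap,
# so it is moved behind the run that came before the wrap.
my @head    = splice @seq, 0, $idx;
my @rotated = (@seq, @head);

print "@rotated\n";
```

If there is no gap larger than the threshold, `$idx` runs to the end of the list and the order stays as sorted.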

Re: problems with advanced array sort
by ultibuzz (Monk) on May 29, 2006 at 15:59 UTC

    hi all,

    Sorry for my late response, and thanks for the great help. I really like the short one at the end.

    I still have a problem with this sort, though.
    The sequence numbers go from 1-9999 or from 1-400.
    After 400 the next number is 1, and I don't see how I can manage this with any of the sort subs.

    So the filenames will be 0399, 0400, 0001, 0002,
    and this is the correct order.

    With the normal sort function, sort {$a cmp $b}, it works. How can I implement it in the more complex versions?


    kind regards ultibuzz

      What? This is either ambiguous or stupid. Your call.

      Your example 0399,0400,0001,0002 and the ranges 1-9999;1-400 you mentioned above imply that the full sequence goes 0001, 0002, ..., 0399, 0400, 0001, 0002, ..., 0399, 0400, 0401, ..., 9999. How am I supposed to know whether any number between 1 and 400 falls into the earlier or later range of the full sequence?

        You're right, this is a problem with the software delivered to us.
        Normally you don't get a full sequence each day, so you can be quite sure everything is OK; we can only see after a month whether everything really was OK or not. You only have a 1-400 sequence or a 1-9999 sequence, not both: they are two different systems.

Node Type: perlquestion [id://550479]
Approved by Corion