PerlMonks  

Sorting files by 3 numbers in the name

by crusty_collins (Friar)
on May 26, 2017 at 13:37 UTC

crusty_collins has asked for the wisdom of the Perl Monks concerning the following question:

I have a task that requires me to sort several files by 3 numbers in the file name. What I have so far is a sort by a single number (run).
Question: How can I sort by 3 numbers?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# sort by run then by dist then by copy then by total
#
#           run                                         district
#           |                                           |      copy
#           |                                           |      |  total
#           |                                           |      |  |
#ASR0004994_8958_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_02_04_Spr17_Initial_201705040951_41043.zip

my @files = qw(
    ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_01_02_Spr17_Initial_201705040952_41044.zip
    ASR0004520_8960_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_04_04_Spr17_Initial_201705040952_41045.zip
    ASR0004994_8958_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_02_04_Spr17_Initial_201705040951_41043.zip
    ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_02_02_Spr17_Initial_201705040952_41044.zip
    ASR0005154_8957_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_01_04_Spr17_Initial_201705040951_41042.zip
    ASR0005336_8959_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_03_04_Spr17_Initial_201705040952_41044.zip
    ASR0005336_8972_ETSTexas_EOC052017P_0517_Candidate_RRD_178902_01_01_Spr17_Initial_201705040952_41044.zip
);

# this sorts by the run number
my @returnfiles = sort {
    ( $a =~ /^[^\d]*\d+_(\d{4})/ )[0] <=> ( $b =~ /^[^\d]*\d+_(\d{4})/ )[0]
} @files;

print Dumper @returnfiles;
"We can't all be happy, we can't all be rich, we can't all be lucky – and it would be so much less fun if we were. There must be the dark background to show up the bright colours." Jean Rhys (1890-1979)

Replies are listed 'Best First'.
Re: Sorting files by 3 numbers in the name
by tobyink (Canon) on May 26, 2017 at 14:09 UTC
    # These constants make the code below more readable.
    #
    use constant {
        IX_FILENAME => 0,
        IX_RUN      => 1,
        IX_DISTRICT => 2,
        IX_COPY     => 3,
        IX_TOTAL    => 4,
    };

    # Read this bit from bottom to top:
    #
    my @sorted =

        # Now we've sorted our arrayrefs by the fields we're interested in
        # we loop through them again, pulling out just the filename and
        # discarding the other parts.
        map { $_->[IX_FILENAME] }

        # Sort by the fields we're interested in. Note that if the two
        # values for RUN are different, this will sort by them, and everything
        # following the first 'or' is ignored. If they're the same, that
        # comparison returns 0, so the stuff after 'or' isn't ignored,
        # and we compare by DISTRICT, then COPY, then TOTAL.
        sort {
            $a->[IX_RUN]      <=> $b->[IX_RUN]      or
            $a->[IX_DISTRICT] <=> $b->[IX_DISTRICT] or
            $a->[IX_COPY]     <=> $b->[IX_COPY]     or
            $a->[IX_TOTAL]    <=> $b->[IX_TOTAL]
        }

        # For each filename, split it into an arrayref, so that the first
        # element in the arrayref is the filename itself, and the rest are
        # the fields we're interested in.
        map { [ $_, m/\A[A-Z0-9]+_([0-9]+)_ETSTexas_.*_Candidate_RRD_([0-9]+)_([0-9]{2})_([0-9]{2})/i ] }

        # Take our list of filenames…
        @files;

    # Check it works. (It does.)
    #
    print Dumper(\@sorted);

        Yeah, but I think they were added in 5.10, and when possible I try to give examples using 5.8 features. Something like say is excusable, because it's so easy to write a shim for it:

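        # "@_ or $_" would put @_ in scalar context and print its element count,
        # so pick between the two lists with ?: instead.
        sub say { local $\ = "\n"; print( @_ ? @_ : $_ ) }
        sub IO::Handle::say { my $h = shift; local $\ = "\n"; $h->print( @_ ? @_ : $_ ) }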

        I also quite like this way:

        use constant {
            IX_FILENAME => 0,
            IX_RUN      => 2,
            IX_DISTRICT => 8,
            IX_COPY     => 9,
            IX_TOTAL    => 10,
        };

        print Dumper
            map { Dumper($_), $_->[IX_FILENAME] }
            sort {
                $a->[IX_RUN]      <=> $b->[IX_RUN]      or
                $a->[IX_DISTRICT] <=> $b->[IX_DISTRICT] or
                $a->[IX_COPY]     <=> $b->[IX_COPY]     or
                $a->[IX_TOTAL]    <=> $b->[IX_TOTAL]
            }
            map { [ $_, split /_/ ] }
            @files;
Re: Sorting files by 3 numbers in the name
by tybalt89 (Monsignor) on May 26, 2017 at 14:46 UTC

    Since perl's sort is now stable, I offer this in loving memory and tribute to IBM card sorters :)

    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1191282

    use strict;
    use warnings;
    use Data::Dumper;

    my @files = qw(
        ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_01_02_Spr17_Initial_201705040952_41044.zip
        ASR0004520_8960_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_04_04_Spr17_Initial_201705040952_41045.zip
        ASR0004994_8958_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_02_04_Spr17_Initial_201705040951_41043.zip
        ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_02_02_Spr17_Initial_201705040952_41044.zip
        ASR0005154_8957_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_01_04_Spr17_Initial_201705040951_41042.zip
        ASR0005336_8959_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_03_04_Spr17_Initial_201705040952_41044.zip
        ASR0005336_8972_ETSTexas_EOC052017P_0517_Candidate_RRD_178902_01_01_Spr17_Initial_201705040952_41044.zip
    );

    # sort by pseudo-column with stable sort -- IBM card sorters forever !!!

    my @returnfiles =
        sort { (split /_/, $a)[1] <=> (split /_/, $b)[1] }
        sort { (split /_/, $a)[7] <=> (split /_/, $b)[7] }
        sort { (split /_/, $a)[8] <=> (split /_/, $b)[8] }
        sort { (split /_/, $a)[9] <=> (split /_/, $b)[9] }
        @files;

    print Dumper \@returnfiles;
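As a rough check of the card-sorter idea, the sketch below compares one stable pass per key (least significant key first) against a single pass that chains the comparisons with "or". The shortened file names and the helper keys_of are made up for illustration, and the result relies on sort being stable, which the mergesort used since Perl 5.8 is in practice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened, made-up names: only fields 1, 7, 8 and 9 matter here.
my @files = qw(
    A_8950_x_x_x_x_x_178904_02_02_z
    A_8960_x_x_x_x_x_178901_04_04_z
    A_8950_x_x_x_x_x_178904_01_02_z
    A_8957_x_x_x_x_x_178901_01_04_z
);

# Hypothetical helper: pull out the four sort keys (run, district, copy, total).
sub keys_of { return ( split /_/, $_[0] )[ 1, 7, 8, 9 ] }

# One stable pass per key, least significant key first.
my @chained = @files;
for my $i ( 3, 2, 1, 0 ) {
    @chained = sort { ( keys_of($a) )[$i] <=> ( keys_of($b) )[$i] } @chained;
}

# A single pass with the comparisons chained by "or".
my @single = sort {
    my @ka = keys_of($a);
    my @kb = keys_of($b);
    $ka[0] <=> $kb[0] or $ka[1] <=> $kb[1]
        or $ka[2] <=> $kb[2] or $ka[3] <=> $kb[3]
} @files;

print "same order\n" if "@chained" eq "@single";
```

Both approaches should agree whenever the sort is stable; the single pass simply does the tie-breaking inside one comparator.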

      If you've got hundreds of filenames, I think you'll find my way runs significantly faster — it only needs to match each filename against the regexp once. Yours will do it dozens of times per filename.
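To make the cost argument concrete, here is a small sketch that counts how often the key is extracted with and without caching. The file names and the helper key_of are invented for the example; the exact count for the naive version depends on how many comparisons sort happens to make.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up two-field names; the number after the underscore is the key.
my @files = qw( F_8960 F_8950 F_8972 F_8957 F_8958 F_8959 F_8951 F_8955 );

my $extractions = 0;

# Hypothetical helper: extract the key and count how often it is called.
sub key_of {
    $extractions++;
    return ( $_[0] =~ /_(\d+)/ )[0];
}

# Naive: the key is re-extracted inside every comparison.
my @naive = sort { key_of($a) <=> key_of($b) } @files;
my $naive_count = $extractions;

# Cached (map / sort / map): each key is extracted exactly once.
$extractions = 0;
my @cached = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [ $_, key_of($_) ] } @files;
my $cached_count = $extractions;

print "naive:  $naive_count extractions\n";
print "cached: $cached_count extractions\n";
```

The cached version extracts once per file name, while the naive version extracts twice per comparison, so the gap widens as the list grows.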

        Hey, don't besmirch the memory of IBM card sorters.

        Without them, where would 1950's science fiction movies be?

Re: Sorting files by 3 numbers in the name
by BrowserUk (Patriarch) on May 26, 2017 at 13:58 UTC

    #! perl -slw
    use strict;

    my @files = qw(
        ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_01_02_Spr17_Initial_201705040952_41044.zip
        ASR0004520_8960_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_04_04_Spr17_Initial_201705040952_41045.zip
        ASR0004994_8958_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_02_04_Spr17_Initial_201705040951_41043.zip
        ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_02_02_Spr17_Initial_201705040952_41044.zip
        ASR0005154_8957_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_01_04_Spr17_Initial_201705040951_41042.zip
        ASR0005336_8959_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_03_04_Spr17_Initial_201705040952_41044.zip
        ASR0005336_8972_ETSTexas_EOC052017P_0517_Candidate_RRD_178902_01_01_Spr17_Initial_201705040952_41044.zip
    );

    print for map
        unpack( 'x16 a*', $_ ),
        sort map
            pack( 'NNNNa*', (m[_(\d+)]g)[0,2,3,4], $_ ),
            @files;

    __END__
    [14:56:37.78] C:\test>junk39
    ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_01_02_Spr17_Initial_201705040952_41044.zip
    ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_02_02_Spr17_Initial_201705040952_41044.zip
    ASR0005154_8957_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_01_04_Spr17_Initial_201705040951_41042.zip
    ASR0004994_8958_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_02_04_Spr17_Initial_201705040951_41043.zip
    ASR0005336_8959_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_03_04_Spr17_Initial_201705040952_41044.zip
    ASR0004520_8960_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_04_04_Spr17_Initial_201705040952_41045.zip
    ASR0005336_8972_ETSTexas_EOC052017P_0517_Candidate_RRD_178902_01_01_Spr17_Initial_201705040952_41044.zip

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
Re: Sorting files by 3 numbers in the name
by hippo (Bishop) on May 26, 2017 at 13:47 UTC

    Do it like the sort documentation suggests?

    # inefficiently sort by descending numeric compare using
    # the first integer after the first = sign, or the
    # whole record case-insensitively otherwise
    my @new = sort {
        ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
                            ||
                  fc($a) cmp fc($b)
    } @old;

    # same thing, but much more efficiently;
    # we'll build auxiliary indices instead
    # for speed
    my (@nums, @caps);
    for (@old) {
        push @nums, ( /=(\d+)/ ? $1 : undef );
        push @caps, fc($_);
    }

    my @new = @old[ sort {
                        $nums[$b] <=> $nums[$a]
                                  ||
                        $caps[$a] cmp $caps[$b]
                    } 0..$#old
              ];

    # same thing, but without any temps
    my @new = map  { $_->[0] }
              sort { $b->[1] <=> $a->[1]
                             ||
                     $a->[2] cmp $b->[2]
              } map { [$_, /=(\d+)/, fc($_)] } @old;
Re: Sorting files by 3 numbers in the name
by BillKSmith (Monsignor) on May 26, 2017 at 18:57 UTC
    I have no idea how fast this module is, but you cannot beat it for convenience.
    use strict;
    use warnings;
    use List::UtilsBy qw(sort_by);
    my $x;
    my @files = qw(
        ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_01_02_Spr17_Initial_201705040952_41044.zip
        ASR0004520_8960_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_04_04_Spr17_Initial_201705040952_41045.zip
        ASR0004994_8958_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_02_04_Spr17_Initial_201705040951_41043.zip
        ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_02_02_Spr17_Initial_201705040952_41044.zip
        ASR0005154_8957_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_01_04_Spr17_Initial_201705040951_41042.zip
        ASR0005336_8959_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_03_04_Spr17_Initial_201705040952_41044.zip
        ASR0005336_8972_ETSTexas_EOC052017P_0517_Candidate_RRD_178902_01_01_Spr17_Initial_201705040952_41044.zip
    );
    my @sorted_files =
        sort_by { join( '', (split /_/, $_)[1,7,8,9]) } @files;
    $, = "\n";
    print @sorted_files;
    Bill
      Nice, but you would do better to choose nsort_by().

      From the docs of List::UtilsBy

      > Similar to sort_by but compares its key values numerically.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

        We are comparing fixed length strings of digits. The result is the same whether we compare them lexically or numerically. I chose the lexical sort because the strings do not seem to have any numerical significance.

        Thanks for supplying the link to the module documentation.

        UPDATE: Oops! My comment about no numerical significance is wrong. My comment on the subject in level 6 below applies here as well (as long as all fields are of fixed length). I still prefer the lexical sort, but it is harder to justify.

        Bill
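The point about fixed-length digit strings can be checked with a short sketch. The sample keys below are invented; the first list has fixed width, the second does not, which is where string and numeric order part ways.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fixed-width keys: string order and numeric order agree.
my @fixed     = qw( 0902 0101 1004 0204 );
my @fixed_str = sort { $a cmp $b } @fixed;
my @fixed_num = sort { $a <=> $b } @fixed;
print "fixed width:    ",
    ( "@fixed_str" eq "@fixed_num" ? "same" : "different" ), "\n";

# Variable-width keys: "10" sorts before "9" when compared as strings.
my @ragged     = qw( 9 10 2 );
my @ragged_str = sort { $a cmp $b } @ragged;    # 10 2 9
my @ragged_num = sort { $a <=> $b } @ragged;    # 2 9 10
print "variable width: ",
    ( "@ragged_str" eq "@ragged_num" ? "same" : "different" ), "\n";
```

So the lexical sort is safe only as long as every field keeps its width and zero padding; if that cannot be guaranteed, the numeric comparison is the more defensive choice.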
Re: Sorting files by 3 numbers in the name
by thanos1983 (Parson) on May 26, 2017 at 14:07 UTC

    Hello crusty_collins,

    I would create my own sort mechanism and give it to sort:

    my @entries;
    ...
    # Loop over your files and extract data
    push @entries, {
        'array_position' => $array_position,
        'run'            => $run,
        'district'       => $district,
        'copy'           => $copy,
        'total'          => $total,
    };
    ...
    # Sort by run, then district, then copy, then total
    my @sorted_files = sort {
        $a->{'run'}      <=> $b->{'run'}      ||    # use '<=>' for numbers
        $a->{'district'} <=> $b->{'district'} ||
        $a->{'copy'}     <=> $b->{'copy'}     ||
        $a->{'total'}    <=> $b->{'total'}
    } @entries;

    This is a sample but you get the picture, extract the values of each file and then based on the values sort them.
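One way the extraction step might be filled in is sketched below. It assumes the fields sit at positions 1, 7, 8 and 9 after splitting on underscores, which matches the sample names in this thread but is not confirmed for other files; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @files = qw(
    ASR0004520_8960_ETSTexas_EOC052017P_0517_Candidate_RRD_178901_04_04_Spr17_Initial_201705040952_41045.zip
    ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_02_02_Spr17_Initial_201705040952_41044.zip
    ASR0005336_8950_ETSTexas_EOC052017P_0517_Candidate_RRD_178904_01_02_Spr17_Initial_201705040952_41044.zip
);

# Build one record per file. The field positions (1, 7, 8, 9) are an
# assumption based on the sample names in this thread.
my @entries;
for my $i ( 0 .. $#files ) {
    my ( $run, $district, $copy, $total ) =
        ( split /_/, $files[$i] )[ 1, 7, 8, 9 ];
    push @entries, {
        array_position => $i,
        run            => $run,
        district       => $district,
        copy           => $copy,
        total          => $total,
    };
}

# Compare key by key; a later key is consulted only when the earlier ones tie.
my @sorted_entries = sort {
       $a->{run}      <=> $b->{run}
    || $a->{district} <=> $b->{district}
    || $a->{copy}     <=> $b->{copy}
    || $a->{total}    <=> $b->{total}
} @entries;

# Map the sorted records back to file names via array_position.
my @sorted_files = @files[ map { $_->{array_position} } @sorted_entries ];
print "$_\n" for @sorted_files;
```

Keeping array_position in each record means the original list is never modified; the sorted names are recovered with an array slice at the end.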

    Update: Adding array_position.

    Hope this helps.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Sorting files by 3 numbers in the name
by crusty_collins (Friar) on May 26, 2017 at 14:26 UTC
    Thank you all so much! Now it makes sense to me.
