ultibuzz has asked for the wisdom of the Perl Monks concerning the following question:
Hi monks, I have the following list of filenames:
{ID#0000D128}-20060519_090519_00000ZURACF2954.zip
{ID#0000D129}-20060519_091438_00000ZURACF2955.zip
{ID#0000D12C}-20060519_092458_00000ZURACF2957.zip
{ID#0000D12D}-20060519_092911_00000ZURACF2956.zip
My problem is that I need to sort this list so that the numbers at the end are in the right order. As you can see, 56 comes after 57; it should be the other way around.
I've read in a book that there are more complex sort functions, but the book stops at that point and carries on with a cmp b and the numeric a <=> b.
I thought about splitting each filename into parts, sorting on one part, and then reattaching the parts, but when I do something like this I lose the connection between the parts.
Does anyone have a clue about this? As usual, links, tips and any help are kindly welcome ;)
Re: problems with advanced array sort
by blazar (Canon) on May 19, 2006 at 13:02 UTC
As you may easily imagine, this kind of question gets asked so often that it should be easy to locate some info before asking. In particular I recommend you look up the Schwartzian transform and the Guttman-Rosler transform. To stay within The Monastery, check the tutorials section, and in particular
Update: minimal example using Guttman-Rosler transform follows. Note that I used : as a separator, which seems appropriate for this example. You may want to choose something different if needed. I also took for granted that the filenames always end with a sequence of four digits, i.e. that the numbers are possibly padded with zeroes. If this is not the case, then just do it yourself with sprintf.
#!/usr/bin/perl -l
use strict;
use warnings;
chomp(my @file=<DATA>);
@file=map +(split /:/)[1],
sort map +(/(\d+)\.zip/)[0] . ":$_", @file;
print for @file;
__END__
{ID#0000D128}-20060519_090519_00000ZURACF2954.zip
{ID#0000D129}-20060519_091438_00000ZURACF2955.zip
{ID#0000D12C}-20060519_092458_00000ZURACF2957.zip
{ID#0000D12D}-20060519_092911_00000ZURACF2956.zip
Update2: alternative version explicitly using sprintf as hinted above, and a simple substr instead of split on a separator, since we have fixed length "fields" anyway.
# "%06d" yields a six-character key, so strip exactly six characters afterwards
@file=map { substr $_, 6 }
sort map sprintf("%06d", /(\d+)\.zip/) . $_, @file;
Re: problems with advanced array sort
by davorg (Chancellor) on May 19, 2006 at 13:05 UTC
You need to use a sorting subroutine which extracts the number that you want to sort by from the strings and sorts on that. It might look something like this:
#!/usr/bin/perl
use strict;
use warnings;
print sort my_sort <DATA>;
sub my_sort {
my ($a_num) = $a =~ /(\d+)\.zip/;
my ($b_num) = $b =~ /(\d+)\.zip/;
return $a_num <=> $b_num;
}
__DATA__
{ID#0000D128}-20060519_090519_00000ZURACF2954.zip
{ID#0000D129}-20060519_091438_00000ZURACF2955.zip
{ID#0000D12C}-20060519_092458_00000ZURACF2957.zip
{ID#0000D12D}-20060519_092911_00000ZURACF2956.zip
Or there's a Schwartzian Transform version:
print map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, /(\d+)\.zip/ ] } <DATA>;
Or, if you don't care about maintainability :)
print sort
{ ($a =~ /(\d+)\.zip/)[0] <=> ($b =~ /(\d+)\.zip/)[0] }
<DATA>;
--
< http://dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
Re: problems with advanced array sort
by salva (Canon) on May 19, 2006 at 13:17 UTC
my @sorted = sort {
($a =~ /(\d+)\D*$/)[0] <=> ($b =~ /(\d+)\D*$/)[0]
} @filenames;
and something simple and fast:
use Sort::Key qw(ikeysort);
my @sorted = ikeysort { /(\d+)\D*$/; $1 } @filenames;
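If installing a CPAN module is not an option, roughly the same idea (extract each key once, then sort on the cached keys) can be sketched with core Perl only. This is a minimal sketch, not a drop-in replacement for Sort::Key: the %key_for hash and the fallback key of 0 for names without digits are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample names from the thread, deliberately out of order.
my @filenames = (
    '{ID#0000D12C}-20060519_092458_00000ZURACF2957.zip',
    '{ID#0000D128}-20060519_090519_00000ZURACF2954.zip',
    '{ID#0000D12D}-20060519_092911_00000ZURACF2956.zip',
    '{ID#0000D129}-20060519_091438_00000ZURACF2955.zip',
);

# Extract each key exactly once, up front, instead of inside the comparator.
my %key_for;
for my $name (@filenames) {
    my ($num) = $name =~ /(\d+)\D*$/;
    # Assumption: a name with no digits gets key 0 and so sorts first.
    $key_for{$name} = defined $num ? $num : 0;
}

# The comparator is now a plain hash lookup plus a numeric compare.
my @sorted = sort { $key_for{$a} <=> $key_for{$b} } @filenames;
print "$_\n" for @sorted;
```

The hash lookup keeps the comparator cheap, which matters because sort calls it many more times than there are elements.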
Re: problems with advanced array sort
by Limbic~Region (Chancellor) on May 19, 2006 at 13:04 UTC
@file =
# getting back the filename
map $_->[0],
# ascending numerical sort on desired digits
sort {$a->[1] <=> $b->[1]}
# anonymous array 0 => filename, 1 => desired digits
map {[$_, /(\d+)\.zip/]} @file;
Keep in mind this solution is specific to the file name sample you provided. You will likely have to adapt it or, better yet, use File::Basename. Update: I added comments to what the ST was doing in case you weren't familiar with how it worked.
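To make the File::Basename suggestion concrete, here is a hedged sketch that strips the directory and the extension with fileparse before pulling out the trailing digits, so the sort no longer depends on the names ending in ".zip". The directory names and the helper seq_number are invented for illustration, and treating a name without digits as 0 is an assumption.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical paths: the directory parts are made up for this example.
my @paths = (
    '/data/in/{ID#0000D12C}-20060519_092458_00000ZURACF2957.zip',
    '/data/in/{ID#0000D12D}-20060519_092911_00000ZURACF2956.zip',
    '/data/old/{ID#0000D128}-20060519_090519_00000ZURACF2954.zip',
);

# Return the trailing digits of the base name, ignoring directory and suffix.
sub seq_number {
    my ($path) = @_;
    # fileparse returns (name, directories, suffix); only the name is needed.
    my ($name) = fileparse($path, qr/\.[^.]*/);
    my ($num)  = $name =~ /(\d+)$/;
    return defined $num ? $num : 0;   # assumption: "no digits" counts as 0
}

# Same Schwartzian transform as above, with the key coming from seq_number.
my @sorted_paths =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map  { [ $_, seq_number($_) ] } @paths;

print "$_\n" for @sorted_paths;
```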
Re: problems with advanced array sort
by ultibuzz (Monk) on Jun 21, 2006 at 08:29 UTC
hi all, here is the final code
use strict;
open(SORT_FILE, '<', "sort_test.txt")
or die("open failed: $!");
my @to_sort = <SORT_FILE>;
my @from_sort = map { $_->[0] }
sort { $a->[2] <=> $b->[2] }
map {[$_, (split /CF|\./) ]} @to_sort;
foreach my $sorted (@from_sort) {
print $sorted;
}
my $idx=1;
$idx++ while ($idx < @from_sort) and substr($from_sort[$idx],41,4)-substr($from_sort[$idx-1],41,4) <=50;
my @sort_splice = splice @from_sort, 0, $idx;
my @sorted = (@from_sort, @sort_splice);
thx to Corion for the while one liner :D
Re: problems with advanced array sort
by ultibuzz (Monk) on May 29, 2006 at 15:59 UTC
Hi all, sorry for my late response, and thanks for the great help. I really like the short one at the end. I still have a problem with this sort, though: the sequence numbers go from 1-9999 or from 1-400, and after 400 the next number is 1. I don't see how I can manage this with any of the sort subs. The filenames will be 0399, 0400, 0001, 0002, and that is the correct order. With the normal sort function, sort {$a cmp $b}, it works. How can I implement it in the more complex versions?
Kind regards, ultibuzz
|
What? This is either ambiguous or stupid. Your call.
Your example 0399,0400,0001,0002 and the ranges 1-9999;1-400 you mentioned above imply that the full sequence goes 0001, 0002, ..., 0399, 0400, 0001, 0002, ..., 0399, 0400, 0401, ..., 9999. How am I supposed to know whether any number between 1 and 400 falls into the earlier or later range of the full sequence?
|
You're right, this is a problem with the software delivered to us. Normally you don't get a full sequence each day, so you can be quite sure that everything is OK; only after a month can we see whether everything really was OK or not.
You only have a 1-400 sequence or a 1-9999 sequence, not both.
They are two different systems.
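The wrap-around handling discussed here can be sketched on bare sequence numbers. The idea, taken from the final code posted in this thread, is to sort numerically, look for the first jump larger than some threshold, and move everything before that jump to the end. The name unwrap_sequence and its $max_gap parameter are hypothetical, and the approach assumes the list covers only part of the cycle, so that such a jump exists. As the objection above points out, a list that spans a full cycle cannot be disambiguated this way.

```perl
use strict;
use warnings;

# Rotate sequence numbers so that the run from before the wrap-around
# comes first. $max_gap is a tunable guess (the thread's final code uses 50).
sub unwrap_sequence {
    my ($max_gap, @nums) = @_;
    my @sorted = sort { $a <=> $b } @nums;

    # Find the first place where neighbours are more than $max_gap apart.
    my $idx = 1;
    $idx++ while $idx < @sorted
             and $sorted[$idx] - $sorted[$idx - 1] <= $max_gap;

    # No such gap: the counter never wrapped, keep plain numeric order.
    return @sorted if $idx >= @sorted;

    # Everything from the gap onward is the older, pre-wrap run.
    return ( @sorted[ $idx .. $#sorted ], @sorted[ 0 .. $idx - 1 ] );
}

my @order = unwrap_sequence(50, 2, 399, 1, 400);
print "@order\n";   # 399 400 1 2
```

To apply this to the filenames, the same rotation can be done on the sorted filename list, comparing the extracted numbers of neighbouring entries instead of the bare values.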