Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Sort questions

by wube (Acolyte)
on Jan 28, 2005 at 21:23 UTC ( [id://426119]=perlquestion: print w/replies, xml ) Need Help??

wube has asked for the wisdom of the Perl Monks concerning the following question:


Gurus,
I need your help. I have a array that needs to be sorted.
The requirements are :
    weekday (col7)
    time (col8)
    policy (col0)
where
    weekday - format is alphabetic, but order is by day of week
      - Sunday comes first, followed by Monday.....
    time - format is HHH:MM.SS , numeric
    policy - format is alphabethic
I have written a small test program. Code follows
#!/usr/local/bin/perl -w @data = ( "aix-dev,active,aixr03,aixr03,crmd,FULL,12345,Sunday,018:00:00", "w2k-prod,active,w2k-PROD,pub1hi,prod-web-srv1Y,FULL,3,Friday, +021:00:00", "w2k-prod,active,w2k-PROD,pol02pi,prod-web-srv1,CINC,,Monday,0 +09:00:00", "w2k-prod,active,w2k-PROD,pub01pi,prod-web-srv1,CINC,,Monday,0 +23:00:000", "aix-dev-ecmd2,active,aixr03,aixr03,ecmD2,FULL,12345,Monday,00 +1:00:00", "aix-prod-artp,active,aixr04,aixr04,artp-rman,FULL,2345,Sunday +,021:30:00", "w2k-prod,active,w2k-PROD,pol02pi,prod-web-srv1Y,FULL,3,Friday +,023:00:00", "w2k-prod,active,w2k-PROD,pub01pi,prod-web-srv1Y,FULL,3,Friday +,021:00:00" ) ; $count=@data ; print "debug-->data has $count lines \n" ; foreach $line (@data) { chomp $line ; print "$line \n" ; } # Sort on Weekday, time and policy @out1 = sort { @a = (split ',', $a) ; @b = (split ',', $b) ; $a[7] cmp $b[7] || $a[8] cmp $b[8] || $a[0] cmp $b[0] } @data ; $count=@out1 ; print "debug-->out1 has $count lines \n" ; foreach $line (@out1) { chomp $line ; print "$line \n" ; }
1. What do I need to do to make the sort on weekday order by day of week ??
2. How do I split time so I can do numeric compares on HHH, MM,and SS ?
3. How to combine the 3 requirements in one sort statement ?
4. How to make it efficient ( the array could be large ) ?
TIA

Replies are listed 'Best First'.
Re: Sort questions
by Tanktalus (Canon) on Jan 28, 2005 at 21:46 UTC

    This is just screaming out for a Schwartzian transform... ;-) Oh, and for using Text::CSV or something similar.

    @out1 = map { $_->[1] } sort { $a->[0] cmp $b->[0] } map { my @d = split /,/; [ join('\0', @d[7,8,0]), $_ ] } @data;

    That's just to get it to be efficient.

    To get it do anything else, I'm going to concentrate on the last map (which is the first one executed): map { my @d = split /,/; [ join('\0', @d[7,8,0]), $_ ] }.

    To sort on weekday order by day of week, we need to know what the order is. Say you have another array elsewhere (associative array, aka hash):

    my %dow = qw(Sunday 0 Monday 1 Tuesday 2 Wednesday 3 Thursday 4 Frid +ay 5 Saturday 6);
    The numbers give a sort order. You can change the numbers to give any sort order you want. Note that the way this is set up, we're actually sorting alphabetically (using cmp) not numerically, so using letters such as a, b, c, d, e, f, g might be more obvious. Then the map looks something like this:
    map { my @d = split /,/; [ join('\0', $dow{$d[7]}, @d[8,0]), $_ ] +}

    For splitting the time - I'll leave that as an excersise for you to do - I think this should give you way more than enough direction.

      As the first post by Limbic~Region pointed out, the Schwartzian Transform is the answer I was looking for.
      However, I will still look into the workings of Text::CSV. It might come in handy later on.

      Thanks for your insight.
Re: Sort questions
by Limbic~Region (Chancellor) on Jan 28, 2005 at 21:39 UTC
    wube,
    I would recommend breaking your data up into a different structure if you can. If you can't, the following should be close to what you want. In fact, it is temporarily creating an alternative structure used to do the sorting and then disgarding it at the end.
    #!/usr/bin/perl #!/usr/bin/perl use strict; use warnings; my %dow = ( Sunday => 1, Monday => 2, Tuesday => 3, Wednesday => 4, Thurday => 5, Friday => 6, Saturday => 7 ); my @data = map { chomp; $_ } <DATA>; print "$_\n" for map { $_->[0] } sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] || $a->[3] cmp $ +a->[3] } map { my @field = split /,/; $field[8] =~ s/\D//g; [ $_, $dow{$field[7]}, $field[8], $field[0] ] } @data; __DATA__ "aix-dev,active,aixr03,aixr03,crmd,FULL,12345,Sunday,018:00:00" "w2k-prod,active,w2k-PROD,pub1hi,prod-web-srv1Y,FULL,3,Friday,021:00:0 +0" "w2k-prod,active,w2k-PROD,pol02pi,prod-web-srv1,CINC,,Monday,009:00:00 +" "w2k-prod,active,w2k-PROD,pub01pi,prod-web-srv1,CINC,,Monday,023:00:00 +0" "aix-dev-ecmd2,active,aixr03,aixr03,ecmD2,FULL,12345,Monday,001:00:00" "aix-prod-artp,active,aixr04,aixr04,artp-rman,FULL,2345,Sunday,021:30: +00" "w2k-prod,active,w2k-PROD,pol02pi,prod-web-srv1Y,FULL,3,Friday,023:00: +00" "w2k-prod,active,w2k-PROD,pub01pi,prod-web-srv1Y,FULL,3,Friday,021:00: +00"
    As far as the last question - how to make it efficient. You are going to need to give some idea what you mean by "large" and what resource you are looking for to be efficient with.

    Cheers - L~R

    See Schwartzian Transform for more information.
      Works like a charm once the last compare was changed to
      $a->[3] cmp $b->[3]
      1. Is there a way to delineate the end of data block, something line __ENDOFDATA__ ( if it exists ) ?

      2. I was thinking more in terms of packing or caching the sort keys, but now I really don't think that's necessary. The file only contains about 400 lines.

      Thanks for your assistance.
        wube,
        • Is there a way to delineate the end of data block, something line __ENDOFDATA__ ( if it exists )?
        • I don't believe so, but I am more interested in knowing why you would want to. I used __DATA__ because I assumed you were getting the data to populate the array from an external file. In this case, you would only need to open a filehandle for that file and replace the <DATA> with <FILEHANDLE>. If my assumption is wrong, please clarify and I will try and help you accomplish what you want.
        • I was thinking more in terms of packing or caching the sort keys, but now I really don't think that's necessary. The file only contains about 400 lines
        • That's why I recommended using a different data structure (such as an AoH) if you could. You would not have to use the Schwartzian Transform and your data would be more easily accessible. The other side to this is it would consume more memory.

        Cheers - L~R

Re: Sort questions (use a GRT)
by grinder (Bishop) on Jan 28, 2005 at 22:03 UTC

    This is not too shabby to start with, you're on the right track. To sort by the day of the week, use a hash as a lookup, to retrieve the day value:

    my %dow = (qw( Sunday 0 Monday 1 Tuesday 2 Wednesday 3 Thursday 4 Friday 5 Saturday 6 )); # ... $dow{$a[7]} <=> $dow{$b[7]}

    You might want to make Sunday => 7 if you want it to sort higher, to put the weekend at the end. I'm using a qw() quote-word construct, which returns a list and assigns that to a hash. Just a bit less typing.

    For the timestamps, if you can guarantee that they will always follow a ddd:dd:dd format, you can just sort them alphabetically as you are already doing

    To combine them all into an efficient sort statement, use a Guttman-Rosler Transform (GRT)*. The idea is to build a prefix to the string so that you can use a bare sort operation, and then throw away the prefix after the sort and recover the original record.

    my @out = map { (split /%/)[2] } sort map { my @field = split /,/; "$dow{$field[7]}%$field[8]%$_" } @in ;

    Since the first field of the record is also part of the sort criteria, you don't actually have to make it part of the prefix. So you only have to add on a munged day of week and the timestamp to have it sort correctly. Then, after the sort, split it up on % and throw the first two away to recover your record.

    If the array can be really large, say, over 200_000 entries, then you should consider preparing data files so that the operating system sort command can be used. Otherwise I'd just stick with Perl.

    * see Advanced Sorting - GRT - Guttman Rosler Transform for a discussion on the GRT.

    - another intruder with the mooring in the heart of the Perl

Re: Sort questions
by johnnywang (Priest) on Jan 28, 2005 at 21:48 UTC
    You can use the Schwarzian Transform, something like:
    use strict; my @data = ( "aix-dev,active,aixr03,aixr03,crmd,FULL,12345,Sunday,018:00:00", "w2k-prod,active,w2k-PROD,pub1hi,prod-web-srv1Y,FULL,3,Friday, +021:00:00", "w2k-prod,active,w2k-PROD,pol02pi,prod-web-srv1,CINC,,Monday,0 +09:00:00", "w2k-prod,active,w2k-PROD,pub01pi,prod-web-srv1,CINC,,Monday,0 +23:00:000", "aix-dev-ecmd2,active,aixr03,aixr03,ecmD2,FULL,12345,Monday,00 +1:00:00", "aix-prod-artp,active,aixr04,aixr04,artp-rman,FULL,2345,Sunday +,021:30:00", "w2k-prod,active,w2k-PROD,pol02pi,prod-web-srv1Y,FULL,3,Friday +,023:00:00", "w2k-prod,active,w2k-PROD,pub01pi,prod-web-srv1Y,FULL,3,Friday +,021:00:00" ) ; my %hash = (Sunday=>0,Monday=>1,Tuesday=>2,Wednesday=>3, Thursday=>4,Friday=>5,Saturday=>6); print map { $_->[0] } sort { $a->[1] <=> $b->[1] #week || $a->[2] <=> $b->[2] #hour || $a->[3] <=> $b->[3] # minute || $a->[4] <=> $b->[4] # second || $a->[5] cmp $b->[5] #policy } map { [ $_, compute($_)] } @data; sub compute{ my @items = split(/,/,shift); my @time = split(/:/,$items[8]); return ($hash{$items[7]},@time,$items[0]); } __END__ # output: aix-dev,active,aixr03,aixr03,crmd,FULL,12345,Sunday,018:00:00 aix-prod-artp,active,aixr04,aixr04,artp-rman,FULL,2345,Sunday,021:30:0 +0 aix-dev-ecmd2,active,aixr03,aixr03,ecmD2,FULL,12345,Monday,001:00:00 w2k-prod,active,w2k-PROD,pol02pi,prod-web-srv1,CINC,,Monday,009:00:00 w2k-prod,active,w2k-PROD,pub01pi,prod-web-srv1,CINC,,Monday,023:00:000 w2k-prod,active,w2k-PROD,pub1hi,prod-web-srv1Y,FULL,3,Friday,021:00:00 w2k-prod,active,w2k-PROD,pub01pi,prod-web-srv1Y,FULL,3,Friday,021:00:0 +0 w2k-prod,active,w2k-PROD,pol02pi,prod-web-srv1Y,FULL,3,Friday,023:00:0 +0
      The Schwarzian Transform is the way to go, together with using the hash to sort the day of the week did the trick.

      Thanks for your help.
Re: Sort questions
by wube (Acolyte) on Jan 29, 2005 at 16:29 UTC
    Thanks to all that replied.
    I will test each reply and reply to each one individually after the tests.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://426119]
Approved by johnnywang
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (3)
As of 2024-04-19 05:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found