http://qs321.pair.com?node_id=29070


in reply to sorting comma separated value file

The following code automatically detects numeric fields, or even fields of mixed text and numbers, and sorts them properly (one of my favorite tricks). Note that it doesn't handle quoted fields that contain commas, nor data where some lines have a field quoted and some don't and the quotes are not supposed to affect the sort order. If you have that kind of data, replace the simple split/\s*,\s*/ with a use of the CSV module.
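If installing a CSV module is not an option, a rough stand-in for the split can be sketched with the core Text::ParseWords module. This is a swapped-in technique, not the CSV module mentioned above, and split_fields is a hypothetical helper name. As far as I know, parse_line follows shell-like quoting rules (backslash escapes, single quotes also count as quotes), so CSV-style doubled quotes or a stray apostrophe in the data may not come out the way you want.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# split_fields is a hypothetical helper, not part of the original script.
# It splits one line on commas (with optional surrounding whitespace),
# keeps commas that sit inside quotes, and drops the quotes themselves
# (that is what the second argument, 0, asks for).
sub split_fields {
    my ($line) = @_;
    chomp $line;
    return parse_line('\s*,\s*', 0, $line);
}

my @fields = split_fields(qq{"Smith, John", 42 ,"Perl"\n});
print join("|", @fields), "\n";    # prints: Smith, John|42|Perl
```

In the sorting script, (split_fields($_))[@cols] would take the place of (split/\s*,\s*/)[@cols].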

#!/usr/bin/perl -w
use strict;

die "Usage: $0 col[,col[...]] [file[,...]]\n"
    unless @ARGV;
my @cols= map { $_-1 } split/,/,shift;
my @lines= <>;
my @sort= map {
    my $x=join"\0"x5,(split/\s*,\s*/)[@cols];
    $x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg;
    $x
} @lines;
print @lines[ sort { $sort[$a] cmp $sort[$b] } 0..$#sort ];
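To see why the substitution in the script makes a plain cmp sort numbers correctly, here is a small sketch of just that step. sort_key is a hypothetical helper name and the file names are made-up sample data. Each run of digits that does not follow a digit or a dot is replaced by its 4-byte big-endian form, so comparing the keys byte by byte orders those runs by value. Since pack "N" is a 32-bit unsigned format, numbers of 2**32 or more should not be expected to sort correctly this way.

```perl
use strict;
use warnings;

# sort_key is a hypothetical helper, not part of the original script.
# It applies the same substitution as the script: digit runs not preceded
# by a digit or a dot become 4 bytes in big-endian ("network") order.
sub sort_key {
    my ($text) = @_;
    (my $key = $text) =~ s/(^|[^\d.])(\d+)/$1 . pack("N", $2)/eg;
    return $key;
}

my @names  = ("file10", "file9", "file100", "file2");
my @sorted = sort { sort_key($a) cmp sort_key($b) } @names;
print "@sorted\n";    # prints: file2 file9 file10 file100
```

A plain sort of the same list would give file10 file100 file2 file9 instead.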

Note that this code explicitly avoids using nifty nested map tricks because they tend to slow things down. For example, the code above was over twice as fast as the following sexier code in my large-file tests:

die "Usage: $0 col[,col[...]] [file[,...]]\n"
    unless @ARGV;
my @cols= map { $_-1 } split/,/,shift;
print map { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map {
        my $x=join"\0\0\0\0",(split/\s*,\s*/)[@cols];
        # Pack the numbers in the sort key, not in the line itself.
        $x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg;
        [$x,$_]
    } <>;

P.S. The reason that this nested-map version is slow is not that I don't have tilly's illustrious patch (I mention this just to counter tilly's down-playing of how neat his patch is). Those are all 1-to-1 maps. (:

P.P.S. I think that this is a Schwartzian Transform, but I wasn't sure I'd done it right and didn't want to mislabel it. :) Update: While I was typing, an example of a Schwartzian Transform was posted just above and, other than mixing 1 and 0, I did write one.

        - tye (my smileys are ambidextrous!)