sorting comma separated value file

taopunk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: sorting comma separated value file by merlyn (Sage) on Aug 22, 2000 at 20:27 UTC
See Sort::Fields and Text::CSV in the CPAN. -- Randal L. Schwartz, Perl hacker	[reply]
RE: Re: sorting comma separated value file by Anonymous Monk on Aug 22, 2000 at 20:30 UTC
That was the fastest reply ever... Wait, do I have your book sitting in front of me?!? Thanks for the pointer.	[reply]
RE: sorting comma separated value file by BlaisePascal (Monk) on Aug 22, 2000 at 20:50 UTC
While merlyn has suggested the best answer (don't re-invent the wheel), that doesn't help you if your reason for doing this is to learn how the wheel was made to begin with. In this case, I can see one immediate problem which Sort::Field isn't going to help you with: dealing with the string v. numeric form of the data. I don't have a good answer to that problem, either. For something like this, the first-draft of my version of this wheel would look something like so, assuming string-comparison was sufficient: `# $n contains field to sort on # records to sort are in @ARGV my @records; while (<>) { # chomp deleted so we don't have to add \n later push @records, [ split ',' ]; } # now sort them; @records = sort { $$a[$n] cmp $$b[$n] } @records; # now print them foreach $record (@records) { print join (',',@$record); }` [download] The first loop reads in the CSV records, crudely splits them on commas, stores the results in an anonymous array, which it adds to the `@records` array. This is not robust -- it'll fail when the fields of the record contain quoted commas -- but for a first cut, it'll do the job. The `sort` line sorts the `@records` array using the code block given as the comparison function. In this case, since we know that each element of `@records` is a reference to an array, with each sub-array element being one of the original fields, we can access the nth field with `$$a[$n]`, so the comparison function simply compares the nth field of the two records. Finally, we output the now-sorted array.	[reply] [d/l] [select]
Re: sorting comma separated value file by maverick (Curate) on Aug 22, 2000 at 20:35 UTC
you might want to check out the CVS module if you have any quoted fields or the like, but if your data is as simple as your example then you could use something like this: `#!/usr/bin/perl use strict; my $col = $ARGV[1]; my $file = $ARGV[2]; $col--; my @data; open(IN,$file) \|\| die "can't open: $!"; while(<IN>) { s/[\r\n]//g; push(@data,[ split(/,/,$_) ]); } close(IN); foreach (sort {$a->[$col] cmp $b->[$col]} @data) { print join(",",@{$_}),"\n"; }` [download] I just coded this right off the top of my head, there may be errors. But you were looking for practice right? :) Here's how this works. I take the column you want and the filename, decrement the column number (zero origin indexing), read in the file and make it a two dimensional array (or array or arrays, if you wish) then use sort (saying how to sort the elements) and print them back out. I had slipped up and answered someone's homework once before, so I'm purposely being a little cryptic. That and I'm in a bit of a rush :). If you want a clearer explination, I'll come back and fill it later. But this should get you going in the right direction regardless. /\/\averick update:watch as Maverick makes silly typos! It's the CSV (comma seperated values) module not the CVS module which does something completely different altogether :)	[reply] [d/l]
RE: Re: sorting comma separated value file by eduardo (Curate) on Aug 22, 2000 at 21:50 UTC
you might want to check out the CVS module if you have any quoted fields or the like I might even check out the CSV module :) (merlyn said it's Text::CSV) maverick, wanna update your node?	[reply]
RE: Re: sorting comma separated value file by Anonymous Monk on Aug 22, 2000 at 22:24 UTC
My school days are, ahem, LONG over. Thanks for the help!	[reply]
Re: sorting comma separated value file by ZZamboni (Curate) on Aug 22, 2000 at 21:42 UTC
For production use, you might as well use Sort::Fields or Text::CSV as pointed out by merlyn. But if your purpose is learning, apart from the other solutions mentioned in this thread, you may want to take a look at the following. The line with the assignment to @sorted_lines is called a Schwartzian Transform and is named after its inventor, merlyn himself. For good discussions and descriptions of how this works, see: Schwartzian Transform (merlyn) Schwartzian Transform (chromatic) Schwartzian Transform vs. plain Perl How do I sort an array by (anything)? Complex sorting And the code is: `#!/usr/local/bin/perl -w use strict; use vars qw($field @lines @sorted_lines); $field=shift @ARGV; @lines=<>; # This is the Schwartzian transform @sorted_lines= map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [$_, (split(/,\s*/, $_))[$field] ] } @lines; print @sorted_lines;` [download] Once you fully understand the Schwartzian Transform, you'll be on your way to enlightenment :-) --ZZamboni	[reply] [d/l]
RE: sorting comma separated value file by Adam (Vicar) on Aug 22, 2000 at 20:54 UTC
`#!perl use strict; # Always. my $useage = "perl script.pl column# file1 file2 etc.\n"; my $column = shift @ARGV or die $useage; --$column; # zero base instead of one base. my @input; push( @input, [ split /,/ ] ) while(<>); @input = sort {$$a[$column] cmp $$b[$column]} @input; print join( ', ', @$_ ), "\n" for @input;` [download]	[reply] [d/l]
Re: sorting comma separated value file by ferrency (Deacon) on Aug 22, 2000 at 20:43 UTC
This isn't exactly a Perl answer... But if you're on UNIX, the sort command does exactly what you need already: `sort -t, -kn in_file` [download] -t specifies the field separator (a comma in this case) -k specifies the number of the field to sort on (replace n with your favorite number) Update: Oops, I missed the part about "Practice Example." Nevermind. Definitely check out the Schwartzian Transform nodes, in that case. Alan	[reply] [d/l]
numbers OK; Re: sorting comma separated value file by tye (Sage) on Aug 22, 2000 at 22:04 UTC
The following code automatically detects numeric fields or even fields of mixed text and numbers and sorts them properly (one of my favorite tricks). Note that it doesn't handle quoted fields that contain commas (or even where some lines have the field quoted and some don't and where the quotes are not supposed to affect the sort order). Replace the simple `split/\s,\s/` with a use of the CSV module if you have that kind of data. `#!/usr/bin/perl -w use strict; die "Usage: $0 col[,col[...]] [file[,...]]\n" unless @ARGV; my @cols= map { $_-1 } split/,/,shift; my @lines= <>; my @sort= map { my $x=join"\0"x5,(split/\s,\s/)[@cols]; $x =~ s/(^\|[^\d.])(\d+)/$1.pack("N",$2)/eg; $x } @lines; print @lines[ sort { $sort[$a] cmp $sort[$b] } 0..$#sort ];` [download] Note that this code explicitly avoids using nifty nested `map` tricks because they tend to slow things down. For example, the code above was over twice as fast as the following sexier code in my large-file tests: `die "Usage: $0 col[,col[...]] [file[,...]]\n" unless @ARGV; my @cols= map { $_-1 } split/,/,shift; print map { $_->[1] } sort { $a->[0] cmp $b->[0] } map { my $x=join"\0\0\0\0",(split/\s,\s/)[@cols]; s/(^\|[^\d.])(\d+)/$1.pack("N",$2)/eg; [$x,$_] } <>;` [download] P.S. The reason that this nested-map version is slow is not because I don't have tilly's illustrious patch (just to counter tilly's down-playing of how neat his patch is). Those are all 1-to-1 maps. (: P.P.S. I think that this is a Schwartzian Transform, but I wasn't sure I'd done it right and didn't want to mislabel it. :) Update: While I was typing, an example of a Schwartzian Transform was posted just above and, other than mixing 1 and 0, I did write one. - tye (my smileys are ambidextrous!)	[reply] [d/l] [select]
RE: sorting comma separated value file by bastard (Hermit) on Aug 23, 2000 at 01:34 UTC
Update: I thought this was an original idea. Instead of using the other methods everyone else was suggestion I attempted (succesfuly i might add) to create a hash with the column to be sorted as the key, and the whole string as the value for the key. Sort the hash keys and you're done (probably). Oh well, I guess it wan't novel enough (someone didn't like it). --------end of update-------------- Being that you are using this to learn, I will stick mostly to the theory (of how I would do it). I found this to be one novel way attack the problem. I would start by creating a hash with the keys of the hash equivalent to the values of the column you are sorting on. Use the desired field for the hash key (the ".=" covers multiple keys of the same name (tho not sorting them later)) `foreach my $line (@file) { chop ($line); $line =~ s/ //g; my @values = split (/,/, $line); $thehash{$values[$field]} .= "$line\n"; }` [download] Once the hash is created, sort it and print it: `@keys = sort(keys %thehash); foreach (@keys) { print "$thehash{$_}"; }` [download] This will cover having two identical key values, but will not sort them appropriately (at least in my mind) e.x.- (sorted on first field, it doesn't look to the second field for more clarification) 99,bbbc,5fa3aaa 99,ccc,2143 It will only sort the actual field specified. I leave fixing that to the reader. Then again you could just put the data into a database and use SQL.	[reply] [d/l] [select]