taopunk has asked for the wisdom of the Perl Monks concerning the following question:
I want to write a Perl script to sort a file containing multiple comma separated values (and output the results back to standard output)
scriptname -f n in_file
Switch Description
-f select field to sort by
where n = the nth value (in the csv) to sort by.
For example, here is what a in_file might look like...
1, abc, asldfjajfal
15, djsk, 1sljfal
24, aldfj, jfaldjfa
If I issue a 'scriptname -f 1 in_file, the output should be:
1, abc, asldfjajfal
15, djsk, 1sljfal
24, aldfj, jfaldjfa
If I issue a 'scriptname -f 3 in_file, the output should be:
1, abc, asldfjajfal
24, aldfj, jfaldjfa
15, djsk, 1sljfal
Anyone has any ideas? I am a complete Perl newbie stuck on a prctice example I created for myself.
Help!
Re: sorting comma separated value file
by merlyn (Sage) on Aug 22, 2000 at 20:27 UTC
|
| [reply] |
|
That was the fastest reply ever...
Wait, do I have your book sitting in front of me?!?
Thanks for the pointer.
| [reply] |
RE: sorting comma separated value file
by BlaisePascal (Monk) on Aug 22, 2000 at 20:50 UTC
|
While merlyn has suggested the best answer (don't re-invent the wheel), that doesn't help you if your reason for doing this is to learn how the wheel was made to begin with.
In this case, I can see one immediate problem which Sort::Field isn't going to help you with: dealing with the string v. numeric form of the data. I don't have a good answer to that problem, either.
For something like this, the first-draft of my version of this wheel would look something like so, assuming string-comparison was sufficient:
# $n contains field to sort on
# records to sort are in @ARGV
my @records;
while (<>) {
# chomp deleted so we don't have to add \n later
push @records, [ split ',' ];
}
# now sort them;
@records = sort { $$a[$n] cmp $$b[$n] } @records;
# now print them
foreach $record (@records) {
print join (',',@$record);
}
The first loop reads in the CSV records, crudely splits them on commas, stores the results in an anonymous array, which it adds to the @records array. This is not robust -- it'll fail when the fields of the record contain quoted commas -- but for a first cut, it'll do the job.
The sort line sorts the @records array using the code block given as the comparison function. In this case, since we know that each element of @records is a reference to an array, with each sub-array element being one of the original fields, we can access the nth field with $$a[$n], so the comparison function simply compares the nth field of the two records.
Finally, we output the now-sorted array.
| [reply] [d/l] [select] |
Re: sorting comma separated value file
by maverick (Curate) on Aug 22, 2000 at 20:35 UTC
|
you might want to check out the CVS module if you have any
quoted fields or the like, but if your data is as simple as
your example then you could use something like this:
#!/usr/bin/perl
use strict;
my $col = $ARGV[1];
my $file = $ARGV[2];
$col--;
my @data;
open(IN,$file) || die "can't open: $!";
while(<IN>) {
s/[\r\n]//g;
push(@data,[ split(/,/,$_) ]);
}
close(IN);
foreach (sort {$a->[$col] cmp $b->[$col]} @data) {
print join(",",@{$_}),"\n";
}
I just coded this right off the top of my head, there may be
errors. But you were looking for practice right? :)
Here's how this works. I take the column you want and the filename,
decrement the column number (zero origin indexing), read in the
file and make it a two dimensional array (or array or arrays, if you wish)
then use sort (saying how to sort the elements) and print them back out.
I had slipped up and answered someone's homework once before,
so I'm purposely being a little cryptic. That and I'm in a bit
of a rush :). If you want a clearer explination, I'll come
back and fill it later. But this should get you going in
the right direction regardless.
/\/\averick
update:watch as Maverick makes silly typos! It's the CSV (comma seperated values)
module not the CVS module which does something completely different altogether :) | [reply] [d/l] |
|
| [reply] |
|
My school days are, ahem, LONG over.
Thanks for the help!
| [reply] |
Re: sorting comma separated value file
by ZZamboni (Curate) on Aug 22, 2000 at 21:42 UTC
|
For production use, you might as well use Sort::Fields
or Text::CSV as pointed out by merlyn.
But if your purpose is learning, apart from the other solutions
mentioned in this thread, you may want to take a look
at the following. The line with the assignment to @sorted_lines
is called a Schwartzian Transform and is named after its
inventor, merlyn himself. For good discussions and
descriptions of how this works, see:
And the code is:
#!/usr/local/bin/perl -w
use strict;
use vars qw($field @lines @sorted_lines);
$field=shift @ARGV;
@lines=<>;
# This is the Schwartzian transform
@sorted_lines=
map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [$_, (split(/,\s*/, $_))[$field] ] }
@lines;
print @sorted_lines;
Once you fully understand the Schwartzian Transform, you'll
be on your way to enlightenment :-)
--ZZamboni
| [reply] [d/l] |
RE: sorting comma separated value file
by Adam (Vicar) on Aug 22, 2000 at 20:54 UTC
|
#!perl
use strict; # Always.
my $useage = "perl script.pl column# file1 file2 etc.\n";
my $column = shift @ARGV or die $useage;
--$column; # zero base instead of one base.
my @input;
push( @input, [ split /,/ ] ) while(<>);
@input = sort {$$a[$column] cmp $$b[$column]} @input;
print join( ', ', @$_ ), "\n" for @input;
| [reply] [d/l] |
Re: sorting comma separated value file
by ferrency (Deacon) on Aug 22, 2000 at 20:43 UTC
|
This isn't exactly a Perl answer...
But if you're on UNIX, the sort command does exactly what you
need already:
sort -t, -kn in_file
-t specifies the field separator (a comma in this case)
-k specifies the number of the field to sort on (replace n with your favorite number)
Update: Oops, I missed the part about "Practice Example." Nevermind. Definitely check out the Schwartzian Transform nodes, in that case.
Alan | [reply] [d/l] |
numbers OK; Re: sorting comma separated value file
by tye (Sage) on Aug 22, 2000 at 22:04 UTC
|
The following code automatically detects numeric fields
or even fields of mixed text and numbers and sorts them
properly (one of my favorite tricks). Note that it doesn't
handle quoted fields that contain commas (or even where
some lines have the field quoted and some don't and where the
quotes are not supposed to affect the sort order). Replace
the simple split/\s*,\s*/ with a use of the
CSV module if you have that kind of data.
#!/usr/bin/perl -w
use strict;
die "Usage: $0 col[,col[...]] [file[,...]]\n" unless @ARGV;
my @cols= map { $_-1 } split/,/,shift;
my @lines= <>;
my @sort= map { my $x=join"\0"x5,(split/\s*,\s*/)[@cols];
$x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg; $x } @lines;
print @lines[ sort { $sort[$a] cmp $sort[$b] } 0..$#sort ];
Note that this code explicitly avoids using nifty nested
map tricks because they tend to slow things
down. For example, the code above was over twice as fast
as the following sexier code in my large-file tests:
die "Usage: $0 col[,col[...]] [file[,...]]\n" unless @ARGV;
my @cols= map { $_-1 } split/,/,shift;
print map { $_->[1] }
sort { $a->[0] cmp $b->[0] }
map { my $x=join"\0\0\0\0",(split/\s*,\s*/)[@cols];
s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg; [$x,$_] } <>;
P.S. The reason that this nested-map version is slow is
not because I don't have tilly's illustrious
patch (just to counter tilly's down-playing of how
neat his patch is). Those are all 1-to-1 maps. (:
P.P.S. I think that this is a Schwartzian Transform, but I
wasn't sure I'd done it right and didn't want to mislabel
it. :) Update: While I was typing, an example of a
Schwartzian Transform was posted just above and, other than
mixing 1 and 0, I did write one.
-
tye
(my smileys are ambidextrous!) | [reply] [d/l] [select] |
RE: sorting comma separated value file
by bastard (Hermit) on Aug 23, 2000 at 01:34 UTC
|
Update: I thought this was an original idea. Instead of using the other methods everyone else was suggestion I attempted (succesfuly i might add) to create a hash with the column to be sorted as the key, and the whole string as the value for the key. Sort the hash keys and you're done (probably). Oh well, I guess it wan't novel enough (someone didn't like it).
--------end of update--------------
Being that you are using this to learn, I will stick mostly to the theory (of how I would do it).
I found this to be one novel way attack the problem.
I would start by creating a hash with the keys of the hash equivalent to the values of the column you are sorting on.
Use the desired field for the hash key (the ".=" covers multiple keys of the same name (tho not sorting them later))
foreach my $line (@file) {
chop ($line);
$line =~ s/ //g;
my @values = split (/,/, $line);
$thehash{$values[$field]} .= "$line\n";
}
Once the hash is created, sort it and print it:
@keys = sort(keys %thehash);
foreach (@keys) {
print "$thehash{$_}";
}
This will cover having two identical key values, but
will not sort them appropriately (at least in my mind)
e.x.- (sorted on first field, it doesn't look to the second field for more clarification)
99,bbbc,5fa3aaa
99,ccc,2143
It will only sort the actual field specified.
I leave fixing that to the reader.
Then again you could just put the data into a database and use SQL. | [reply] [d/l] [select] |
|
|