Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

sorting comma separated value file

by taopunk (Initiate)
on Aug 22, 2000 at 20:22 UTC ( [id://29039]=perlquestion: print w/replies, xml ) Need Help??

taopunk has asked for the wisdom of the Perl Monks concerning the following question:

I want to write a Perl script to sort a file containing multiple comma separated values (and output the results back to standard output)

scriptname -f n in_file

Switch Description
-f select field to sort by
where n = the nth value (in the csv) to sort by.

For example, here is what a in_file might look like...
1, abc, asldfjajfal
15, djsk, 1sljfal
24, aldfj, jfaldjfa

If I issue a 'scriptname -f 1 in_file, the output should be:
1, abc, asldfjajfal
15, djsk, 1sljfal
24, aldfj, jfaldjfa

If I issue a 'scriptname -f 3 in_file, the output should be:
1, abc, asldfjajfal
24, aldfj, jfaldjfa
15, djsk, 1sljfal

Anyone has any ideas? I am a complete Perl newbie stuck on a prctice example I created for myself.

Help!

Replies are listed 'Best First'.
Re: sorting comma separated value file
by merlyn (Sage) on Aug 22, 2000 at 20:27 UTC
      That was the fastest reply ever... Wait, do I have your book sitting in front of me?!? Thanks for the pointer.
RE: sorting comma separated value file
by BlaisePascal (Monk) on Aug 22, 2000 at 20:50 UTC
    While merlyn has suggested the best answer (don't re-invent the wheel), that doesn't help you if your reason for doing this is to learn how the wheel was made to begin with.

    In this case, I can see one immediate problem which Sort::Field isn't going to help you with: dealing with the string v. numeric form of the data. I don't have a good answer to that problem, either.

    For something like this, the first-draft of my version of this wheel would look something like so, assuming string-comparison was sufficient:

    # $n contains field to sort on # records to sort are in @ARGV my @records; while (<>) { # chomp deleted so we don't have to add \n later push @records, [ split ',' ]; } # now sort them; @records = sort { $$a[$n] cmp $$b[$n] } @records; # now print them foreach $record (@records) { print join (',',@$record); }
    The first loop reads in the CSV records, crudely splits them on commas, stores the results in an anonymous array, which it adds to the @records array. This is not robust -- it'll fail when the fields of the record contain quoted commas -- but for a first cut, it'll do the job.

    The sort line sorts the @records array using the code block given as the comparison function. In this case, since we know that each element of @records is a reference to an array, with each sub-array element being one of the original fields, we can access the nth field with $$a[$n], so the comparison function simply compares the nth field of the two records.

    Finally, we output the now-sorted array.

Re: sorting comma separated value file
by maverick (Curate) on Aug 22, 2000 at 20:35 UTC
    you might want to check out the CVS module if you have any quoted fields or the like, but if your data is as simple as your example then you could use something like this:
    #!/usr/bin/perl use strict; my $col = $ARGV[1]; my $file = $ARGV[2]; $col--; my @data; open(IN,$file) || die "can't open: $!"; while(<IN>) { s/[\r\n]//g; push(@data,[ split(/,/,$_) ]); } close(IN); foreach (sort {$a->[$col] cmp $b->[$col]} @data) { print join(",",@{$_}),"\n"; }
    I just coded this right off the top of my head, there may be errors. But you were looking for practice right? :)

    Here's how this works. I take the column you want and the filename, decrement the column number (zero origin indexing), read in the file and make it a two dimensional array (or array or arrays, if you wish) then use sort (saying how to sort the elements) and print them back out.

    I had slipped up and answered someone's homework once before, so I'm purposely being a little cryptic. That and I'm in a bit of a rush :). If you want a clearer explination, I'll come back and fill it later. But this should get you going in the right direction regardless.

    /\/\averick

    update:watch as Maverick makes silly typos! It's the CSV (comma seperated values) module not the CVS module which does something completely different altogether :)

      you might want to check out the CVS module if you have any quoted fields or the like

      I might even check out the CSV module :) (merlyn said it's Text::CSV) maverick, wanna update your node?

      My school days are, ahem, LONG over. Thanks for the help!
Re: sorting comma separated value file
by ZZamboni (Curate) on Aug 22, 2000 at 21:42 UTC
RE: sorting comma separated value file
by Adam (Vicar) on Aug 22, 2000 at 20:54 UTC
    #!perl use strict; # Always. my $useage = "perl script.pl column# file1 file2 etc.\n"; my $column = shift @ARGV or die $useage; --$column; # zero base instead of one base. my @input; push( @input, [ split /,/ ] ) while(<>); @input = sort {$$a[$column] cmp $$b[$column]} @input; print join( ', ', @$_ ), "\n" for @input;
Re: sorting comma separated value file
by ferrency (Deacon) on Aug 22, 2000 at 20:43 UTC
    This isn't exactly a Perl answer...

    But if you're on UNIX, the sort command does exactly what you need already:

    sort -t, -kn in_file
    -t specifies the field separator (a comma in this case)
    -k specifies the number of the field to sort on (replace n with your favorite number)

    Update: Oops, I missed the part about "Practice Example." Nevermind. Definitely check out the Schwartzian Transform nodes, in that case.

    Alan

numbers OK; Re: sorting comma separated value file
by tye (Sage) on Aug 22, 2000 at 22:04 UTC

    The following code automatically detects numeric fields or even fields of mixed text and numbers and sorts them properly (one of my favorite tricks). Note that it doesn't handle quoted fields that contain commas (or even where some lines have the field quoted and some don't and where the quotes are not supposed to affect the sort order). Replace the simple split/\s*,\s*/ with a use of the CSV module if you have that kind of data.

    #!/usr/bin/perl -w use strict; die "Usage: $0 col[,col[...]] [file[,...]]\n" unless @ARGV; my @cols= map { $_-1 } split/,/,shift; my @lines= <>; my @sort= map { my $x=join"\0"x5,(split/\s*,\s*/)[@cols]; $x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg; $x } @lines; print @lines[ sort { $sort[$a] cmp $sort[$b] } 0..$#sort ];

    Note that this code explicitly avoids using nifty nested map tricks because they tend to slow things down. For example, the code above was over twice as fast as the following sexier code in my large-file tests:

    die "Usage: $0 col[,col[...]] [file[,...]]\n" unless @ARGV; my @cols= map { $_-1 } split/,/,shift; print map { $_->[1] } sort { $a->[0] cmp $b->[0] } map { my $x=join"\0\0\0\0",(split/\s*,\s*/)[@cols]; s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg; [$x,$_] } <>;

    P.S. The reason that this nested-map version is slow is not because I don't have tilly's illustrious patch (just to counter tilly's down-playing of how neat his patch is). Those are all 1-to-1 maps. (:

    P.P.S. I think that this is a Schwartzian Transform, but I wasn't sure I'd done it right and didn't want to mislabel it. :) Update: While I was typing, an example of a Schwartzian Transform was posted just above and, other than mixing 1 and 0, I did write one.

            - tye (my smileys are ambidextrous!)
RE: sorting comma separated value file
by bastard (Hermit) on Aug 23, 2000 at 01:34 UTC
    Update: I thought this was an original idea. Instead of using the other methods everyone else was suggestion I attempted (succesfuly i might add) to create a hash with the column to be sorted as the key, and the whole string as the value for the key. Sort the hash keys and you're done (probably). Oh well, I guess it wan't novel enough (someone didn't like it).
    --------end of update--------------
    Being that you are using this to learn, I will stick mostly to the theory (of how I would do it). I found this to be one novel way attack the problem.

    I would start by creating a hash with the keys of the hash equivalent to the values of the column you are sorting on.
    Use the desired field for the hash key (the ".=" covers multiple keys of the same name (tho not sorting them later))

    foreach my $line (@file) { chop ($line); $line =~ s/ //g; my @values = split (/,/, $line); $thehash{$values[$field]} .= "$line\n"; }

    Once the hash is created, sort it and print it:

    @keys = sort(keys %thehash); foreach (@keys) { print "$thehash{$_}"; }

    This will cover having two identical key values, but will not sort them appropriately (at least in my mind)
    e.x.- (sorted on first field, it doesn't look to the second field for more clarification)
    99,bbbc,5fa3aaa
    99,ccc,2143
    It will only sort the actual field specified.

    I leave fixing that to the reader.

    Then again you could just put the data into a database and use SQL.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://29039]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (8)
As of 2024-04-25 10:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found