PerlMonks  


The following code automatically detects numeric fields, or even fields of mixed text and numbers, and sorts them properly (one of my favorite tricks). Note that it doesn't handle quoted fields that contain commas, nor data where some lines quote a field and others don't and the quotes are not supposed to affect the sort order. If you have that kind of data, replace the simple split/\s*,\s*/ with a CSV module such as Text::CSV.

    #!/usr/bin/perl -w
    use strict;

    die "Usage: $0 col[,col[...]] [file[,...]]\n"
        unless @ARGV;
    my @cols= map { $_-1 } split/,/,shift;
    my @lines= <>;
    my @sort= map {
        my $x=join"\0"x5,(split/\s*,\s*/)[@cols];
        $x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg;
        $x
    } @lines;
    print @lines[ sort { $sort[$a] cmp $sort[$b] } 0..$#sort ];
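To see why the substitution makes mixed text and numbers sort properly, here is a minimal sketch of just the key-building step. The sort_key helper and the sample names are made up for illustration; only the regex comes from the code above. Each run of digits that is not preceded by a digit or a "." becomes a fixed-width 4-byte big-endian integer, so a plain cmp orders those runs numerically. Digits after a "." are left as text, which still compares correctly for decimal fractions. Numbers too large for 32 bits will not pack correctly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a byte-comparable key from one field.
# Digit runs not preceded by a digit or "." are packed as 4-byte
# big-endian integers, so string comparison matches numeric order.
sub sort_key {
    my( $text )= @_;
    ( my $key= $text ) =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg;
    return $key;
}

# Made-up sample data.
my @names= qw( item10 item9 item100 item2 );

# Plain string sort versus sort on the packed keys.
my @plain=   sort @names;
my @natural= map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ sort_key($_), $_ ] } @names;

print "plain:   @plain\n";
print "natural: @natural\n";
```

The plain sort puts item10 and item100 ahead of item2; the keyed sort gives item2, item9, item10, item100.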

Note that this code explicitly avoids using nifty nested map tricks because they tend to slow things down. For example, the code above was over twice as fast as the following sexier code in my large-file tests:

    die "Usage: $0 col[,col[...]] [file[,...]]\n"
        unless @ARGV;
    my @cols= map { $_-1 } split/,/,shift;
    print map { $_->[1] }
        sort { $a->[0] cmp $b->[0] }
        map {
            my $x=join"\0\0\0\0",(split/\s*,\s*/)[@cols];
            # Pack the numbers in the key ($x), not in the line itself ($_).
            $x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg;
            [$x,$_]
        } <>;
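If you want to check the speed difference on your own machine, something like the following rough sketch should do, using the core Benchmark module. The synthetic data, the line count, and the sub names index_sort and nested_map_sort are all my assumptions, not the large-file tests mentioned above, so expect the ratio to vary with your perl version and your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Synthetic input (an assumption): 20_000 lines of "name<N>,<N>".
my @lines= map {
    sprintf( "name%d,%d\n", int(rand(1000)), int(rand(100000)) )
} 1..20_000;
my @cols= ( 0, 1 );    # sort on both columns

# Approach 1: parallel array of keys, then sort the indices.
sub index_sort {
    my @sort= map {
        my $x= join "\0"x5, (split/\s*,\s*/)[@cols];
        $x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg;
        $x
    } @lines;
    return @lines[ sort { $sort[$a] cmp $sort[$b] } 0..$#sort ];
}

# Approach 2: nested maps carrying [ key, line ] pairs.
sub nested_map_sort {
    return map { $_->[1] }
        sort { $a->[0] cmp $b->[0] }
        map {
            my $x= join "\0"x5, (split/\s*,\s*/)[@cols];
            $x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg;
            [ $x, $_ ]
        } @lines;
}

# Run each approach for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    index_sort      => sub { my @out= index_sort() },
    nested_map_sort => sub { my @out= nested_map_sort() },
} );
```

Both subs build the same keys, so they should return the lines in the same order; only the bookkeeping around the sort differs.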

P.S. The reason that this nested-map version is slow is not that I lack tilly's illustrious patch (just to counter tilly's down-playing of how neat his patch is). Those are all 1-to-1 maps. (:

P.P.S. I think that this is a Schwartzian Transform, but I wasn't sure I'd done it right and didn't want to mislabel it. :) Update: While I was typing, an example of a Schwartzian Transform was posted just above and, other than mixing 1 and 0, I did write one.

        - tye (my smileys are ambidextrous!)

In reply to numbers OK; Re: sorting comma separated value file by tye
in thread sorting comma separated value file by taopunk
