PerlMonks  

Re: sorting a file - multilevel

by jethro (Monsignor)
on Jun 14, 2008 at 02:24 UTC


in reply to sorting a file - multilevel

Does the first number always have the same length? If so, you can use unix sort (as runrig suggested) as a first step.

After that, the file is sorted by your first and second criteria. Only lines with the same first and second columns are still unsorted among themselves, but they sit on consecutive lines and are small enough to be sorted in memory.

So your program should now read lines from the presorted file and collect lines with equal first and second columns. Sort each such group with Perl's sort on the 10th column and write it to a new file.

The new file is now sorted by all your criteria.
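
The regrouping step described above could be sketched roughly as follows. This is only an illustration under a few assumptions: it uses awk and a per-group call to unix sort in place of a Perl script with perl sort (purely to keep the sketch in shell), the name sort_groups is made up, and the input is assumed to be whitespace-separated and already presorted on columns 1 and 2.

```shell
# Hypothetical helper: reads a file that is already sorted on columns 1 and 2
# and sorts each run of lines with equal (column 1, column 2) by column 10.
sort_groups() {
    awk '
        # Every line of the current group is piped to one sort process.
        # When the key changes, closing the pipe lets that sort finish and
        # print its group before the next group starts.
        BEGIN { cmd = "sort -k 10,10" }
        {
            key = $1 SUBSEP $2
            if (NR > 1 && key != prev) close(cmd)
            prev = key
            print | cmd
        }
        END { close(cmd) }
    ' "$1"
}
```

It would be called as sort_groups presorted.file > sorted.big.file. Only one group is held by sort at any time, which is the point of presorting first.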

If unix sort doesn't change the ordering of lines that compare equal (that is, if it sorts stably; GNU sort does so only when given the -s option), then you can do the complete sorting with unix sort. Just use sort with -k 10,10 to first sort the file by the 10th column, then with -s -k 1,1n -k 2,2n to sort by the first and second columns (the n asks for numeric comparison).
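
A minimal sketch of that two-pass idea, assuming a sort that supports -s (GNU and BSD sort do; it is not part of POSIX) and whitespace-separated columns. The function name two_pass_sort is made up for illustration.

```shell
# Hypothetical helper: two-pass sort that relies on a stable second pass.
two_pass_sort() {
    # Pass 1: order the whole file by the 10th column only.
    # Pass 2: numeric sort on columns 1 and 2; -s keeps the pass-1 order
    #         among lines whose columns 1 and 2 compare equal.
    sort -k 10,10 "$1" | sort -s -k 1,1n -k 2,2n
}
```

Usage would be two_pass_sort big.file > sorted.big.file. Without -s the second pass would break ties by comparing whole lines, so the order from the first pass would not be guaranteed to survive.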

Replies are listed 'Best First'.
Re^2: sorting a file - multilevel
by graff (Chancellor) on Jun 14, 2008 at 14:33 UTC
    Does the first number always have the same length?

    The length of a numeric field is not an issue. Using unix (or gnu) sort, the OP's problem would be a simple command line:

    sort -k 1,1n -k 2,2n -k 10,10 big.file > sorted.big.file

    That's equivalent to doing something like this in Perl (but the Perl version might take a lot longer, especially if the file, stored in Perl as an AoA (array of arrays), is bigger than available RAM):

    perl -lane 'push @f,[@F]; END{ print join(" ",@$_) for (sort{$$a[0]<=>$$b[0] || $$a[1]<=>$$b[1] || $$a[9] cmp $$b[9]} @f)}' big.file > sorted.big.file
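
To see why field length stops mattering once the key is numeric, here is a small demonstration on made-up data (the sample function and its three lines are invented for this sketch, not taken from the OP's file):

```shell
# Made-up sample: the first column has different widths.
sample() { printf '10 1 x\n9 2 y\n100 1 z\n'; }

# Plain key: compared as text, so "10" and "100" sort before "9".
sample | sort -k 1,1

# Numeric key: compared by value, so 9 comes before 10 and 100.
sample | sort -k 1,1n
```

The text comparison only gives the right order when every number has the same width, which is what the original question was getting at.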
