PerlMonks  

Re^2: sorting very large text files

by salva (Canon)
on Dec 21, 2009 at 11:36 UTC ( [id://813682]=note )


in reply to Re: sorting very large text files
in thread sorting very large text files

"I'd either create an index file of field -> file+line number and sort on that"

Unfortunately, it is not as easy as that. If you sort an index containing just the sorting keys and the offsets, you still need a final step that combines the sorted index with the original file to create the full sorted output file.
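
To make the index idea concrete, here is a minimal sketch in Python of building and sorting such an index. The function name `build_sorted_index` and the `key_of` callback are hypothetical, invented for illustration, and the sketch assumes the whole index fits in memory, which may not hold for a file of this size.

```python
import io

def build_sorted_index(f, key_of):
    """Return (key, byte_offset) pairs for every line of f, sorted by key.

    f must be opened in binary mode so that tell() returns byte offsets.
    key_of is a caller-supplied function that extracts the sort key from a line.
    """
    index = []
    offset = f.tell()
    # iter(f.readline, b"") reads line by line until end of file.
    for line in iter(f.readline, b""):
        index.append((key_of(line), offset))
        offset = f.tell()
    index.sort()
    return index

# Tiny in-memory stand-in for the large file.
data = io.BytesIO(b"pear,1\napple,2\nfig,3\n")
index = build_sorted_index(data, lambda line: line.split(b",")[0])
```

The index only says where each line lives in the original file; producing the sorted file still means fetching every line in that order.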

Doing this step the naive way, just following the sorted index and seeking into the original file to read every line, would be extremely inefficient. Roughly (as estimated by BrowserUk): 165e6 records * 10 ms per seek = 19 days!
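
The estimate is easy to check. The snippet below just redoes the arithmetic with the figures quoted above (165e6 records, 10 ms per seek); the variable names are mine.

```python
records = 165_000_000        # number of lines in the file
seek_seconds = 0.010         # assumed cost of one random seek (10 ms)

total_seconds = records * seek_seconds
days = total_seconds / 86_400    # 86,400 seconds per day

print(f"{total_seconds:,.0f} s is about {days:.1f} days")
```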

A workaround is to create the final file in several passes, reading the original file sequentially and generating one slice of the output in every pass... not so easy!
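
One possible reading of that multi-pass approach, sketched in Python under stated assumptions: `sorted_offsets` is the list of line offsets already in sorted-key order, and `slice_size` is however many lines fit in memory at once. Both names, and the function `merge_in_passes`, are hypothetical. Each pass scans the original file front to back and keeps only the lines that belong to the current slice of the output.

```python
import io

def merge_in_passes(src, sorted_offsets, out, slice_size):
    """Write the lines of src to out in the order given by sorted_offsets.

    Every pass reads src sequentially (one rewind per pass, no per-line seeks)
    and emits the next slice_size lines of the sorted output.
    """
    for start in range(0, len(sorted_offsets), slice_size):
        wanted = sorted_offsets[start:start + slice_size]
        # Map each wanted offset to its position within this output slice.
        rank = {off: i for i, off in enumerate(wanted)}
        held = [None] * len(wanted)

        src.seek(0)
        offset = 0
        for line in iter(src.readline, b""):
            i = rank.get(offset)
            if i is not None:
                held[i] = line
            offset += len(line)

        out.writelines(held)

src = io.BytesIO(b"pear,1\napple,2\nfig,3\n")
out = io.BytesIO()
# Offsets of the lines in sorted-key order: apple, fig, pear.
merge_in_passes(src, [7, 15, 0], out, slice_size=2)
```

The number of passes is the record count divided by `slice_size`, rounded up, so the trade-off is between memory per pass and the number of full sequential reads of the original file.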
