PerlMonks  

Re: sorting very large text files

by salva (Canon)
on Dec 19, 2009 at 09:07 UTC


in reply to sorting very large text files

GNU sort uses the following rule to determine the size of the memory buffer used for the mergesort algorithm:
buffer_size = min(1/8 * physical_RAM, free_memory)
That's somewhat conservative, especially if the machine you are using is not heavily loaded. Increasing that buffer size will make the sorting much faster. For instance:
sort -S 3.5G ...
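If you are driving the sort from Perl, a minimal sketch of passing that option through system() could be (the file names here are hypothetical):

# run GNU sort with a 3.5 GB buffer; "input.txt" and "sorted.txt" are hypothetical
system("sort", "-S", "3.5G", "-o", "sorted.txt", "input.txt") == 0
    or die "sort failed: $?";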
Another way to increase the speed of the operation is to reduce the size of the file by using a more compact encoding. For instance, representing numbers in binary format instead of as ASCII strings will reduce their size to around 1/5; DNA sequences can be reduced to 1/4; enumerations to 1 or 2 bytes, etc.
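For the numeric case, a minimal sketch using pack (assuming the values fit in an unsigned 32-bit integer) could be:

# pack a decimal field into 4 big-endian bytes; byte-wise comparison of the
# packed form still orders the values numerically, which is how sort compares
my $n      = 813497;               # hypothetical numeric field
my $packed = pack("N", $n);        # 4 bytes instead of up to 10 ASCII digits
my $back   = unpack("N", $packed); # recovers the original value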

This kind of compacting can introduce "\n" bytes inside the records, which would break the line-oriented sort, so they have to be escaped. A simple way is to perform the following expansion:

# escape embedded newlines ("\x0A"), using "\x11" as the escape character
my %expand = ("\x0A" => "\x11\x11", "\x11" => "\x11\x12");
s/([\x0A\x11])/$expand{$1}/g;
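After sorting, the expansion can be reversed to recover the original records; a minimal sketch using the same escape bytes as above:

# undo the escaping: "\x11\x11" -> "\x0A", "\x11\x12" -> "\x11"
my %compress = ("\x11\x11" => "\x0A", "\x11\x12" => "\x11");
s/(\x11[\x11\x12])/$compress{$1}/g;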
