Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re^3: sorting type question- space problems

by mbethke (Hermit)
on Sep 14, 2013 at 19:42 UTC ( [id://1054130]=note: print w/replies, xml ) Need Help??


in reply to Re^2: sorting type question- space problems
in thread sorting type question- space problems

First, RickardK's solution of using sort(1) makes sense. If it's there, use it, as it's written in C, highly optimized and well tested.

That said, if your keys tend to be small compared to the rest of the records, in-memory sort may be feasible by recording only the keys and the file offsets of their respective lines:

my @a; open my $fh, "<","input" or die $!; while(1) { last if eof($fh); my $pos = tell($fh); my ($k1,$k2) = split /\s+/, <$fh>; push @a, [$k1, $k2, $pos]; } foreach(sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1]} @a) { seek($fh, $_->[2], 0); print scalar <$fh>; }

Probably not the fastest, but if you want to avoid external sorts both in the sense of shelling out and tempfiles, it may be worth a try. At 5M lines it will likely break the 100 MB limit but without tricks like assuming something about characters that don't occur in the keys and thus encoding all the keys and offsets in one string it's unlikely to get much smaller.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1054130]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (6)
As of 2024-04-23 12:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found