Re: How to sort a large flat file 90mb with PERL-- various ways/tradeoffs/watchouts

by jarich (Curate)
on Jun 30, 2004 at 09:47 UTC (#370718)

in reply to How to sort a large flat file 90mb with PERL-- various ways/tradeoffs/watchouts

First of all, welcome to Perl, maxon11.

You would be greatly helped by reading the various tutorials on this site, and by getting yourself a good book about Perl. Randal Schwartz's "Learning Perl" might be perfect.

I particularly recommend that you read the On asking for help and How (Not) To Ask A Question nodes, as I feel that your question here would have benefited from being briefer. At the very least, you can probably depend on us having access to the sort documentation already. ;) Even if Randal did send it to you.

The uninitialized value warning you saw is probably a minor bug. If that's all it gave you, however, then I wouldn't worry too much; it looks like it otherwise sorted your file. In the second case, yes, you probably ran out of memory. How much memory are your programs allowed to take up? If you're on a Unix-like operating system then you can usually type "ulimit -a" on the command line to find out.

Mind you, if you're using a Unix-like operating system then you should probably use the Unix sort. :)
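
If you would still like to drive that from a Perl script, here is a rough sketch of one way to do it. It assumes a POSIX sort is on your PATH; the helper name sort_with_unix and the file names in the usage comment are placeholders of mine, not anything from your post.

```perl
use strict;
use warnings;

# Hypothetical helper: sort the lines of $input into $output by
# handing the work to the external sort program.
sub sort_with_unix {
    my ($input, $output) = @_;

    # The list form of system() bypasses the shell, so unusual
    # characters in the file names are passed through untouched.
    # -o names the file that sort writes its result to.
    system("sort", "-o", $output, $input) == 0
        or die "sort failed with status $?";
    return;
}

# Usage (placeholder file names):
# sort_with_unix("input.txt", "sorted.txt");
```

The external sort spills to temporary files when the data does not fit in memory, which is why it copes with large files. Note that it honours your locale, so its ordering can differ from Perl's default string comparison unless LC_ALL is set to C.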

The difference between the two sort code listings that you provide is that the first makes several copies of the data in memory whereas the second does not.

To explain how the second program works, I'll reformat it and add in some comments. I've also made some slight changes to make it a better program generally.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $input  = "H2Z_ZDL0.000";
    my $output = "sorted.txt";

    # Open $input for reading.
    open(ORIGFILE, "<", $input)
        or die "Could not open $input: $!";

    # Open $output for writing, destroying current file contents
    open(FINALFILE, ">", $output)
        or die "Could not open $output: $!";

    # This line does several things. It reads all the lines
    # from ORIGFILE into memory (which is done in the <ORIGFILE>
    # bit), sorts them (using sort) and then prints them out
    # to the file in FINALFILE.
    print FINALFILE sort(<ORIGFILE>);

    # close file in FINALFILE, flushes buffer
    close(FINALFILE);

    # close file in ORIGFILE
    close(ORIGFILE);

You ask how Perl knows to default sort the whole record in alphabetical order. This is answered right at the top of the sort documentation:

If SUBNAME or BLOCK is omitted, "sort"s in standard string comparison order.

That is, if you write sort @array then sort will sort alphabetically.

I would presume that you actually want it to sort numerically. You can do this by writing: sort { $a <=> $b} @array just like it says in the documentation.
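
To make the difference concrete, here is a small sketch. The values in @array are made up for illustration and are not taken from your file.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @array = (10, 9, 100, 2);

# Default sort: compares the values as strings, character by character.
my @as_strings = sort @array;

# Numeric sort: compares the values as numbers.
my @as_numbers = sort { $a <=> $b } @array;

print "string order:  @as_strings\n";   # prints: string order:  10 100 2 9
print "numeric order: @as_numbers\n";   # prints: numeric order: 2 9 10 100
```

In string order "100" comes before "2" because the character "1" sorts before "2", which is rarely what you want for numbers.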

Good luck with learning Perl.

Hope this helps

