Re^2: Performance Trap - Opening/Closing Files Inside a Loop

by tmoertel (Chaplain)
on Dec 10, 2004 at 07:04 UTC


in reply to Re: Performance Trap - Opening/Closing Files Inside a Loop
in thread Performance Trap - Opening/Closing Files Inside a Loop

tachyon, your code is likely to be faster not so much because it shaves away Perl cycles but because it greatly reduces disk seeks, which are probably dominating L~R's run time. (See my other post in this thread for more on this.)

L~R: Assuming that you have the RAM, can you compare the run time of tachyon's code against the other implementations? My guess is that tachyon's code will fare well. (If you don't have the RAM, just tweak the code so that it processes, say, 100_000 lines per pass and clears out %fh between passes. Also, you'll need to open the files in append mode.)
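
A minimal sketch of that tweak, assuming %fh maps each output filename to its buffered lines (as in tachyon's approach) and that a get_filename() helper, hypothetical here, picks the destination file for each input line:

    use strict;
    use warnings;

    my $LINES_PER_PASS = 100_000;

    my %fh;         # per-file output buffers for the current pass
    my $count = 0;

    while ( my $line = <STDIN> ) {
        my $file = get_filename($line);    # hypothetical helper
        $fh{$file} .= $line;
        flush_buffers() if ++$count % $LINES_PER_PASS == 0;
    }
    flush_buffers();

    sub flush_buffers {
        for my $file ( keys %fh ) {
            # append mode, since the same file may be hit in later passes
            open my $out, '>>', $file or die "Can't append to $file: $!";
            print $out $fh{$file};
            close $out or die "Can't close $file: $!";
        }
        %fh = ();    # clear between passes to bound memory use
    }

Each pass still pays roughly one open/write/close per unique file touched in that pass, but memory stays bounded regardless of input size.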

Cheers,
Tom


Re^3: Performance Trap - Opening/Closing Files Inside a Loop
by tachyon (Chancellor) on Dec 10, 2004 at 08:26 UTC

    I agree that reducing the number of seeks is vital. Given an average 3 ms seek time, you can perform only about 333 seeks per second, which is of course glacial. Ignoring buffering, the original code effectively needed two (or more) seeks per line, and the improved version still required at least one. In the example I presented, the number of seeks required is a function of the number of files we need to create, not the number of lines in the input file. This is a significant improvement provided that the number of unique files is smaller than the number of input lines.
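
    To make the arithmetic concrete, a quick back-of-envelope script; the 3 ms average seek time and the per-line seek counts are the assumptions from the paragraph above:

        use strict;
        use warnings;

        my $seek_ms       = 3;                    # assumed average seek time
        my $seeks_per_sec = 1000 / $seek_ms;      # ~333 seeks/second

        # Upper bounds on throughput when seeks dominate.
        for my $case ( [ 'open/close each line', 2 ],
                       [ 'cached filehandles',   1 ] ) {
            my ( $label, $seeks_per_line ) = @$case;
            printf "%-22s <= %d lines/second\n",
                $label, $seeks_per_sec / $seeks_per_line;
        }

    With per-file writes batched, the seek cost is paid once per unique file instead, so throughput no longer scales with the line count at all.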

    cheers

    tachyon

Re^3: Performance Trap - Opening/Closing Files Inside a Loop
by Limbic~Region (Chancellor) on Dec 10, 2004 at 15:38 UTC
    tmoertel,
    I had thought about this myself after posting. The reason I didn't give it a lot of initial thought is that the Java developer made it clear I was not welcome in the sandbox. My guess is that some sort of limited buffer would be best, since that's still a whole lot of lines to keep in memory.

    Cheers - L~R

      If that is the case, combine the two methods (a sketch follows the list):
      1. Buffer the strings per file, up to say 10k or more.
      2. Once a buffer hits that size, look for an already-open filehandle, or open and cache a new one.
      3. Print out the buffered string for that file and clear the buffer.
      4. Finish off by flushing any remaining buffers.
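
      Something like this, perhaps. A rough sketch only; get_filename() is a hypothetical helper for choosing each line's destination file:

          use strict;
          use warnings;

          my $MAX_BUF = 10 * 1024;    # step 1: buffer up to ~10k per file
          my ( %buffer, %fh );

          while ( my $line = <STDIN> ) {
              my $file = get_filename($line);    # hypothetical helper
              $buffer{$file} .= $line;
              flush_file($file)
                  if length( $buffer{$file} ) >= $MAX_BUF;    # step 2
          }
          flush_file($_) for keys %buffer;    # step 4: flush the rest
          close $_ for values %fh;

          sub flush_file {
              my $file = shift;
              $fh{$file} ||= do {             # step 2: reuse or cache
                  open my $h, '>>', $file or die "Can't open $file: $!";
                  $h;
              };
              print { $fh{$file} } $buffer{$file};    # step 3: write...
              $buffer{$file} = '';                    # ...and clear
          }

      This keeps memory bounded at roughly 10k per unique file while still paying the open() cost only once per file.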
      my @a=qw(random brilliant braindead); print $a[rand(@a)];
