Re^2: Performance Trap - Opening/Closing Files Inside a Loop

by tmoertel (Chaplain)
on Dec 10, 2004 at 07:04 UTC


in reply to Re: Performance Trap - Opening/Closing Files Inside a Loop
in thread Performance Trap - Opening/Closing Files Inside a Loop

tachyon, your code is likely to be faster not so much because it shaves away Perl cycles but because it greatly reduces disk seeks, which are probably dominating L~R's run time. (See my other post in this thread for more on this.)

L~R: Assuming that you have the RAM, can you compare the run time of tachyon's code against the other implementations? My guess is that tachyon's code will fare well. (If you don't have the RAM, just tweak the code so that it processes, say, 100_000 lines per pass and clears out %fh between passes. Also, you'll need to open the files in append mode.)
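
A minimal sketch of that tweak, assuming %fh maps each output filename to its buffered lines (as in tachyon's approach) and that a get_filename() helper, hypothetical here, picks the destination file for each input line:

    use strict;
    use warnings;

    my $LINES_PER_PASS = 100_000;

    my %fh;         # per-file output buffers for the current pass
    my $count = 0;

    while ( my $line = <STDIN> ) {
        my $file = get_filename($line);    # hypothetical helper
        $fh{$file} .= $line;
        flush_buffers() if ++$count % $LINES_PER_PASS == 0;
    }
    flush_buffers();

    sub flush_buffers {
        for my $file ( keys %fh ) {
            # append mode, since the same file may be hit in later passes
            open my $out, '>>', $file or die "Can't append to $file: $!";
            print $out $fh{$file};
            close $out or die "Can't close $file: $!";
        }
        %fh = ();    # clear between passes to bound memory use
    }

Each pass still pays roughly one open/write/close per unique file touched in that pass, but memory stays bounded regardless of input size.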

Cheers,
Tom


Re^3: Performance Trap - Opening/Closing Files Inside a Loop
by tachyon (Chancellor) on Dec 10, 2004 at 08:26 UTC

    I agree that reducing the number of seeks is vital. Given an average 3 ms seek time, you can perform only about 333 seeks per second, which is of course glacial. Ignoring buffering, the original code effectively needed two (or more) seeks per line, and the improved version still required at least one. In the example I presented, the number of seeks required is a function of the number of files we need to create, not the number of lines in the input file. This is a significant improvement provided that the number of unique files is smaller than the number of input lines.
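
    To make the arithmetic concrete, a quick back-of-envelope script; the 3 ms average seek time and the per-line seek counts are the assumptions from the paragraph above:

        use strict;
        use warnings;

        my $seek_ms       = 3;                    # assumed average seek time
        my $seeks_per_sec = 1000 / $seek_ms;      # ~333 seeks/second

        # Upper bounds on throughput when seeks dominate.
        for my $case ( [ 'open/close each line', 2 ],
                       [ 'cached filehandles',   1 ] ) {
            my ( $label, $seeks_per_line ) = @$case;
            printf "%-22s <= %d lines/second\n",
                $label, $seeks_per_sec / $seeks_per_line;
        }

    With per-file writes batched, the seek cost is paid once per unique file instead, so throughput no longer scales with the line count at all.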

    cheers

    tachyon

Re^3: Performance Trap - Opening/Closing Files Inside a Loop
by Limbic~Region (Chancellor) on Dec 10, 2004 at 15:38 UTC
    tmoertel,
    I had thought about this myself after posting. The reason I didn't give it a lot of initial thought is that the Java developer made it clear I was not welcome in the sandbox. My guess is that some sort of limited buffer would be best, since that's still a whole lot of lines to keep in memory.

    Cheers - L~R

      If that is the case, combine the two methods (a sketch follows the list):
      1. Buffer the strings per file, up to say 10k or more.
      2. Once a buffer hits that size, look for an already-open filehandle, or open and cache a new one.
      3. Print out the buffered string for that file and clear the buffer.
      4. Finish off by flushing any remaining buffers.
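
      Something like this, perhaps. A rough sketch only; get_filename() is a hypothetical helper for choosing each line's destination file:

          use strict;
          use warnings;

          my $MAX_BUF = 10 * 1024;    # step 1: buffer up to ~10k per file
          my ( %buffer, %fh );

          while ( my $line = <STDIN> ) {
              my $file = get_filename($line);    # hypothetical helper
              $buffer{$file} .= $line;
              flush_file($file)
                  if length( $buffer{$file} ) >= $MAX_BUF;    # step 2
          }
          flush_file($_) for keys %buffer;    # step 4: flush the rest
          close $_ for values %fh;

          sub flush_file {
              my $file = shift;
              $fh{$file} ||= do {             # step 2: reuse or cache
                  open my $h, '>>', $file or die "Can't open $file: $!";
                  $h;
              };
              print { $fh{$file} } $buffer{$file};    # step 3: write...
              $buffer{$file} = '';                    # ...and clear
          }

      This keeps memory bounded at roughly 10k per unique file while still paying the open() cost only once per file.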
      my @a=qw(random brilliant braindead); print $a[rand(@a)];
