PerlMonks  

Re^2: Performance Trap - Opening/Closing Files Inside a Loop

by Limbic~Region (Chancellor)
on Dec 10, 2004 at 00:24 UTC ([id://413728])


in reply to Re: Performance Trap - Opening/Closing Files Inside a Loop
in thread Performance Trap - Opening/Closing Files Inside a Loop

Mark,
I am likely being thick, but I don't understand. The value of the column (not a column name) is what is being used as the file name. It is not possible to know the values in advance without going through every line of every file first. Even if you did that, you would still need to store the information in a hash so that you could later look up the filehandle corresponding to each value, so I see this as a slower variation on my proposed solution. What am I missing?

Cheers - L~R

Re^3: Performance Trap - Opening/Closing Files Inside a Loop
by kvale (Monsignor) on Dec 10, 2004 at 02:07 UTC
    Ah, sorry I wasn't clear. I assumed that one knew the (small) set of possible column values to be used as filenames. If you do not know this set of values, my method may still be faster, but prescanning the table will add some time to the execution.

    Once you have established a hashmap from column values to filehandles, you can print to the desired filehandle. I expect a single hash lookup to be much faster than a pair of system calls for opening and closing files. In addition to the OS bookkeeping and disk IO overhead of each open and close, opening and closing inside the loop means each file buffer is flushed (and, depending on the OS and filesystem, the disk is written to) for every line written.
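
    A minimal sketch of that idea, not code from the original thread. The sample rows, the comma delimiter, the choice of the second column, the ".txt" suffix and the temporary directory are all assumptions made for illustration. It prescans to collect the values, opens each file once, and then does one hash lookup per line:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical input: the second comma-separated column names the output file.
my @lines = ("1,red,apple\n", "2,green,pear\n", "3,red,cherry\n");
my $dir   = tempdir(CLEANUP => 1);

# Pass 1: collect the distinct column values (unnecessary if they are known up front).
my %seen;
$seen{ (split /,/, $_)[1] } = 1 for @lines;

# Open each output file exactly once and remember its handle.
my %fh_for;
for my $value (keys %seen) {
    my $path = File::Spec->catfile($dir, "$value.txt");
    open $fh_for{$value}, '>', $path or die "Cannot open $path: $!";
}

# Pass 2: one hash lookup per line instead of an open/close pair.
for my $line (@lines) {
    my $value = (split /,/, $line)[1];
    print { $fh_for{$value} } $line;
}

# Close (and so flush) every handle once, after the loop.
for my $value (keys %fh_for) {
    close($fh_for{$value}) or die "Cannot close file for $value: $!";
}
```

    The buffers are flushed once per file at close time rather than once per line.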

    Another completely different method is to append the lines to different strings, one for each column value. Then write all the strings out to files after the loop.
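
    A rough sketch of the string-buffer variant, under the same illustrative assumptions as before (made-up rows, comma delimiter, second column, ".txt" suffix, temporary directory). No file is touched inside the loop:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical input: the second comma-separated column names the output file.
my @lines = ("1,red,apple\n", "2,green,pear\n", "3,red,cherry\n");
my $dir   = tempdir(CLEANUP => 1);

# Accumulate each line in a per-value string.
my %buffer_for;
for my $line (@lines) {
    my $value = (split /,/, $line)[1];
    $buffer_for{$value} .= $line;
}

# After the loop: a single open/print/close per output file.
for my $value (sort keys %buffer_for) {
    my $path = File::Spec->catfile($dir, "$value.txt");
    open my $fh, '>', $path or die "Cannot open $path: $!";
    print {$fh} $buffer_for{$value};
    close $fh or die "Cannot close $path: $!";
}
```

    The trade-off is memory: all of the output is held in the hash until the loop finishes, so this only suits data that fits comfortably in RAM.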

    -Mark

      kvale,
      "I expect a single hash lookup to be much faster than a pair of system calls for opening and closing files"

      I don't want to sound like I am beating a dead horse here, but that sounds identical to my solution, except that your way seems like it would be slower: instead of figuring it out as you go, you are processing the files twice.
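
      For comparison, "figuring it out as you go" might look something like the sketch below. It is an illustration only, not the code from the root node; the sample rows, comma delimiter, second column, ".txt" suffix and temporary directory are assumptions. Each file is opened the first time its value is seen, so the input is read only once:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical input: the second comma-separated column names the output file.
my @lines = ("1,red,apple\n", "2,green,pear\n", "3,red,cherry\n");
my $dir   = tempdir(CLEANUP => 1);

my %fh_for;
for my $line (@lines) {
    my $value = (split /,/, $line)[1];

    # Open the file only the first time this value is seen.
    unless (exists $fh_for{$value}) {
        my $path = File::Spec->catfile($dir, "$value.txt");
        open $fh_for{$value}, '>', $path or die "Cannot open $path: $!";
    }

    # Every later line with the same value costs just a hash lookup.
    print { $fh_for{$value} } $line;
}

# Close everything once at the end.
for my $value (keys %fh_for) {
    close($fh_for{$value}) or die "Cannot close file for $value: $!";
}
```

      Either way, every handle stays open until the end, so the number of distinct values has to stay under the OS limit on open files.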

      Cheers - L~R
