PerlMonks  

Re: Parse data into large number of output files.

by pg (Canon)
on Sep 29, 2004 at 02:21 UTC ( [id://394823] )


in reply to Parse data into large number of output files.

My solution would be:

  • Keep a list of open file handles, but cap how many may be open at the same time, say at 100.
  • Keep a timestamp for each open file, and update it every time you write to that file.
  • When you read a line and its target file is already open, fine, just write out whatever you want.
  • If the target file is not open, go through the list, close the file that has gone unused the longest, and open the one you need (in append mode if you have already written to it earlier).
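
The steps above could be sketched roughly like this. It is a minimal sketch, not a tested solution for the OP's data: the names write_line, close_all and $MAX_OPEN are made up for illustration, and a counter stands in for the timestamp because time() only has one-second resolution.

```perl
use strict;
use warnings;

my $MAX_OPEN = 100;   # cap suggested above; tune it by experiment
my %fh;               # path => open file handle
my %last_used;        # path => logical timestamp of the last write
my %seen;             # paths opened at least once, so a reopen appends
my $clock = 0;        # counter used as the timestamp

# Write one line to $path, closing the least recently used handle if the
# pool is already full.
sub write_line {
    my ($path, $line) = @_;
    unless ($fh{$path}) {
        if (keys(%fh) >= $MAX_OPEN) {
            # The handle with the smallest timestamp has been idle longest.
            my ($oldest) = sort { $last_used{$a} <=> $last_used{$b} } keys %fh;
            my $old = delete $fh{$oldest};
            delete $last_used{$oldest};
            close $old or die "close $oldest: $!";
        }
        # Truncate on the first open, append on every reopen.
        my $mode = $seen{$path}++ ? '>>' : '>';
        open my $h, $mode, $path or die "open $path: $!";
        $fh{$path} = $h;
    }
    print { $fh{$path} } $line or die "write $path: $!";
    $last_used{$path} = ++$clock;
}

# Close whatever is still open once the input is exhausted.
sub close_all {
    for my $path (keys %fh) {
        my $h = delete $fh{$path};
        close $h or die "close $path: $!";
    }
    %last_used = ();
}
```

Typical use would be to call write_line($file_for_this_record, $line) inside the read loop and close_all() after it. Sorting the whole list on every miss is simple but linear in the cap; with a cap around 100 that is unlikely to matter next to the cost of the open itself.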

Well, you can come up with other ways to rank your files besides the last access time, for example the number of lines written so far. You have to try them out and find the ranking that works best for your data.
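
To make the ranking swappable, the choice of which file to close can be isolated in one small function. The sketch below is only an illustration: pick_victim is a hypothetical helper, and the file names and numbers are made up.

```perl
use strict;
use warnings;

# Return the path with the smallest score; that is the file to close.
# Ties are broken by name so the result is deterministic.
sub pick_victim {
    my ($score) = @_;    # hash ref: path => score
    my ($victim) = sort { $score->{$a} <=> $score->{$b} or $a cmp $b }
                   keys %$score;
    return $victim;
}

# Two possible scores for the same three open files (made-up numbers).
my %last_write = ('a.out' => 17,  'b.out' => 42, 'c.out' => 40);  # logical time of last write
my %lines_out  = ('a.out' => 900, 'b.out' => 3,  'c.out' => 55);  # lines written so far

print "by last access:   ", pick_victim(\%last_write), "\n";   # a.out
print "by lines written: ", pick_victim(\%lines_out),  "\n";   # b.out
```

The two rankings disagree here: a.out has been idle longest but is also the busiest file overall, which is exactly the kind of case you would want to test against real data.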


Re^2: Parse data into large number of output files.
by tachyon (Chancellor) on Sep 29, 2004 at 02:40 UTC

    Why 100? Why not 200, 500, 1000, 10000? What makes you think 100 is a good number? If you can't answer that question then why suggest it? You are buying into an invalid assertion made by the OP about how many filehandles you can really have open.

      I guess there is a bit of a misunderstanding. You thought I was talking about the maximum number of file handles the OS allows, but I was not. The 100 is not a physical limit of any kind. To me, it is not a good idea to create a list with a potentially huge, unknown size, and that is why this solution came in. You don't want to code for the unknown, and the unknown here is the amount of resources used. An easy way to resolve this is to limit the array size.

      I do take your comment positively, and you made a good point: this number must not be greater than the maximum the OS allows. However, I do not suggest going all the way up to that maximum.
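
One way to keep the cap below the OS limit is to ask the system for that limit and clamp against it. This is a sketch under assumptions: handle_cap and the reserve of 16 handles are made up for illustration, and sysconf(_SC_OPEN_MAX) from the POSIX module is only meaningful on POSIX-like systems.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX);

# Pick a handle cap: the wanted value, clamped to the OS limit minus a
# reserve for STDIN/STDOUT/STDERR, the input file and so on.
sub handle_cap {
    my ($wanted, $os_limit, $reserve) = @_;
    return $wanted unless defined $os_limit && $os_limit > 0;   # limit unknown
    my $ceiling = $os_limit - $reserve;
    $ceiling = 1 if $ceiling < 1;
    return $wanted < $ceiling ? $wanted : $ceiling;
}

my $os_limit = sysconf(_SC_OPEN_MAX);   # undef if the system does not report it
my $cap      = handle_cap(100, $os_limit, 16);
printf "OS limit: %s, using cap: %d\n",
    defined $os_limit ? $os_limit : 'unknown', $cap;
```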

      The OP has to find a reasonable number by experimenting. I suggested ranking the files because you want to reduce the number of open/close operations.
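
The experiment does not need real files. A rough sketch like the one below counts how many open calls a least-recently-used pool would make for a given cap. The workload is invented (10,000 lines spread unevenly over 500 files) and count_opens is a hypothetical helper; the real numbers depend entirely on how the OP's data is ordered.

```perl
use strict;
use warnings;

# Count how many open() calls an LRU-limited pool of size $cap would make
# when lines arrive for the given sequence of file keys.
sub count_opens {
    my ($cap, @keys) = @_;
    my %last_used;
    my ($clock, $opens) = (0, 0);
    for my $key (@keys) {
        unless (exists $last_used{$key}) {
            if (keys(%last_used) >= $cap) {
                # Evict the entry that has been idle longest.
                my ($oldest) = sort { $last_used{$a} <=> $last_used{$b} }
                               keys %last_used;
                delete $last_used{$oldest};
            }
            $opens++;
        }
        $last_used{$key} = ++$clock;
    }
    return $opens;
}

# Made-up workload, skewed so that low-numbered files are hit more often.
srand(42);
my @keys = map { int(500 * rand() ** 2) } 1 .. 10_000;
printf "cap %4d: %5d opens\n", $_, count_opens($_, @keys) for 10, 50, 100, 500;
```

Running the same count over a sample of the real input, with the real key for each line, would show where raising the cap stops paying off.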

      "If you can't answer that question then why suggest it?"

      Just to be frank... It is quite okay for you to rush to an assumption, which happened to be wrong in this case. Well, I do the same thing from time to time. But it was quite unnecessary for you to make comments that were not technical. But never mind, no big deal ;-)
