
Re: How do you parallelize STDIN for large file processing?

by samtregar (Abbot)
on Feb 06, 2009 at 03:22 UTC (#741776)

in reply to How do you parallelize STDIN for large file processing?

The easiest way to do this is to divide your input file into N pieces and process the pieces in parallel, starting one copy of your script per piece. Have each process write to its own output file and stitch the files together at the end. If you start one process per CPU and the black magic you do is CPU-bound, you could get something close to a linear speedup.
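Here is a minimal sketch of that split-and-stitch idea using nothing but core Perl and plain fork. Everything named here is made up for illustration: process_line is a stand-in for your black magic (it just uppercases), and chunk_offsets, process_range and process_in_parallel are hypothetical helpers. Instead of physically splitting the file, it picks byte offsets that fall on line boundaries and has each worker seek to its own range.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for the real per-line work.
sub process_line {
    my ($line) = @_;
    return uc $line;
}

# Return N+1 byte offsets splitting $file into N ranges,
# each of which starts at the beginning of a line.
sub chunk_offsets {
    my ($file, $n) = @_;
    my $size = (-s $file) || 0;
    open my $fh, '<', $file or die "open $file: $!";
    my @offsets = (0);
    for my $i (1 .. $n - 1) {
        seek $fh, int($size * $i / $n), 0 or die "seek $file: $!";
        my $partial = <$fh>;         # skip to the end of the current line
        push @offsets, tell $fh;     # the next range starts here
    }
    close $fh;
    return (@offsets, $size);
}

# Process every line that starts inside [$start, $end) and write it to $out.
sub process_range {
    my ($in, $out, $start, $end) = @_;
    open my $ifh, '<', $in  or die "open $in: $!";
    open my $ofh, '>', $out or die "open $out: $!";
    seek $ifh, $start, 0 or die "seek $in: $!";
    while (tell($ifh) < $end) {
        my $line = <$ifh>;
        last unless defined $line;
        print {$ofh} process_line($line);
    }
    close $ofh or die "close $out: $!";
}

# Fork one worker per range, wait for all of them, then stitch the parts.
sub process_in_parallel {
    my ($in, $out, $n) = @_;
    my @off = chunk_offsets($in, $n);
    my (@parts, @pids);
    for my $i (0 .. $n - 1) {
        my $part = "$out.part$i";
        push @parts, $part;
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            my $ok = eval { process_range($in, $part, $off[$i], $off[$i + 1]); 1 };
            POSIX::_exit($ok ? 0 : 1);   # leave without running the parent's cleanup
        }
        push @pids, $pid;
    }
    for my $pid (@pids) {
        waitpid $pid, 0;
        die "worker $pid failed\n" if $? != 0;
    }
    open my $ofh, '>', $out or die "open $out: $!";
    for my $part (@parts) {
        open my $pfh, '<', $part or die "open $part: $!";
        print {$ofh} $_ while <$pfh>;
        close $pfh;
        unlink $part;
    }
    close $ofh or die "close $out: $!";
}
```

Something like process_in_parallel('big.log', 'big.out', 4) would then run four workers, where the file names and the worker count are placeholders. Set N to your CPU count if the work really is CPU-bound.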

That of course means you can't take data strictly from STDIN, but really that's a silly way to process 4.1GB of data!

If you really, really need to read it from a stream and output it in order, then you'll need a parent that reads the stream, forks kids and collects their results in buffers so it can output them in order. Start with Parallel::ForkManager, which will handle giving out the work, and then mix in some IO::Pipe and IO::Select for collecting the results. Be sure you divide the work into sizable chunks; forking a new child for each line isn't going to help very much!
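To give a rough idea of the shape of the streaming version, here is a sketch that sticks to core modules: it uses plain fork where Parallel::ForkManager would normally hand out the work, plus IO::Pipe and IO::Select to collect results. The names run_stream and process_chunk are hypothetical, and process_chunk (uppercasing) is only a placeholder for the real work. Treat it as one possible arrangement under those assumptions, not the definitive one.

```perl
use strict;
use warnings;
use IO::Pipe;
use IO::Select;
use POSIX ();

# Hypothetical stand-in for the real work on one chunk of lines.
sub process_chunk {
    my ($lines) = @_;
    return join '', map { uc } @$lines;
}

# Read $in in chunks of $chunk_lines lines, process up to $max_kids chunks
# at a time in forked children, and print the results to $out in input order.
sub run_stream {
    my ($in, $out, $max_kids, $chunk_lines) = @_;
    my $sel = IO::Select->new;
    my (%job, %buffer, %done);      # %job maps a pipe's fileno to {seq, pid}
    my ($next_seq, $next_out) = (0, 0);

    # Read whatever is ready, reap finished kids, flush in-order results.
    my $drain = sub {
        for my $fh ($sel->can_read) {
            my $key = fileno $fh;
            my $seq = $job{$key}{seq};
            my $data;
            my $n = sysread $fh, $data, 65536;
            die "sysread: $!" unless defined $n;
            if ($n > 0) { $buffer{$seq} .= $data; next; }
            # EOF: the child has closed its end of the pipe.
            $sel->remove($fh);
            my $pid = delete($job{$key})->{pid};
            close $fh;
            waitpid $pid, 0;
            die "child for chunk $seq failed\n" if $? != 0;
            $done{$seq} = 1;
        }
        while (delete $done{$next_out}) {
            print {$out} delete $buffer{$next_out};
            $next_out++;
        }
    };

    while (1) {
        my @lines;
        while (@lines < $chunk_lines) {
            my $line = <$in>;
            last unless defined $line;
            push @lines, $line;
        }
        last unless @lines;

        # Wait for a free slot before forking another child.
        $drain->() while $sel->count >= $max_kids;

        my $pipe = IO::Pipe->new;
        my $pid  = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            $pipe->writer;
            my $ok = eval {
                print {$pipe} process_chunk(\@lines) or die "print: $!";
                close $pipe or die "close: $!";
                1;
            };
            POSIX::_exit($ok ? 0 : 1);
        }
        $pipe->reader;
        my $seq = $next_seq++;
        $buffer{$seq} = '';
        $job{fileno $pipe} = { seq => $seq, pid => $pid };
        $sel->add($pipe);
    }
    $drain->() while $sel->count;   # collect the stragglers
}
```

A call such as run_stream(\*STDIN, \*STDOUT, 4, 10_000) would read STDIN in 10,000-line chunks with at most four kids running; both numbers are guesses to tune. Note that finished chunks sit in memory until every earlier chunk has been printed, so one slow chunk can make the buffers grow.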

Now go give it a try and don't come back until you have some code to post!
