http://qs321.pair.com?node_id=741776


in reply to How do you parallelize STDIN for large file processing?

The very easiest way to do this would be to divide your input file into N pieces and process each piece in parallel by starting your script on each one. Have each process write to its own output file and stitch the files together at the end. If you start one process per CPU and the black magic you do is CPU-bound, you could get something close to a linear speedup.
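
A rough sketch of that approach, assuming a hypothetical ./blackmagic filter that reads lines on STDIN and writes lines to STDOUT (swap in your real processing), could look something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $n_cpus = 4;    # assumption: one piece (and one worker) per CPU
    my $input  = shift or die "usage: $0 bigfile\n";

    # First pass: count lines so each piece gets an even share.
    open my $in, '<', $input or die "open $input: $!";
    my $total = 0;
    $total++ while <$in>;
    seek $in, 0, 0 or die "seek: $!";
    my $per_piece = int($total / $n_cpus) + 1;

    # Second pass: carve the input into contiguous pieces.
    my @pieces;
    for my $i (0 .. $n_cpus - 1) {
        my $piece = "$input.part$i";
        open my $out, '>', $piece or die "open $piece: $!";
        my $n = 0;
        while ($n < $per_piece) {
            my $line = <$in>;
            last unless defined $line;
            print {$out} $line;
            $n++;
        }
        close $out;
        push @pieces, $piece;
    }
    close $in;

    # One worker per piece, each writing its own output file.
    my @kids;
    for my $piece (@pieces) {
        defined(my $pid = fork) or die "fork: $!";
        if ($pid == 0) {
            exec "./blackmagic < $piece > $piece.out" or die "exec: $!";
        }
        push @kids, $pid;
    }
    waitpid $_, 0 for @kids;

    # Stitch the outputs back together in the original order.
    open my $final, '>', "$input.out" or die "open $input.out: $!";
    for my $piece (@pieces) {
        open my $part, '<', "$piece.out" or die "open $piece.out: $!";
        print {$final} $_ while <$part>;
        close $part;
    }
    close $final;

The extra counting pass is wasteful on a 4.1GB file; splitting by byte offsets (seeking to a line boundary) would avoid it, at the cost of fiddlier code.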

That of course means you can't take data strictly from STDIN, but really, that's a silly way to process 4.1GB of data!

If you really, really need to read it from a stream and output the results in order, then you'll need a parent that reads the stream, forks kids, and collects their results in buffers so it can write them out in order. Start with Parallel::ForkManager, which will handle doling out the work, then mix in some IO::Pipe and IO::Select for collecting the results. Be sure you divide the work into sizable chunks; forking a new child for each line isn't going to help very much!
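
Here's a minimal sketch of the buffered, in-order idea. Rather than hand-rolled IO::Pipe/IO::Select plumbing, it leans on Parallel::ForkManager's own data passing (finish() with a reference, collected in run_on_finish, available since 0.7.6), which is simpler to get right. process_line() is a hypothetical stand-in for the black magic:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $max_procs  = 4;        # assumption: one worker per CPU
    my $chunk_size = 10_000;   # lines per chunk; tune to taste

    my $pm = Parallel::ForkManager->new($max_procs);

    my %done;          # finished chunks, keyed by chunk number
    my $next_out = 0;  # next chunk number the output stream needs

    # Runs in the parent as each child is reaped: buffer that child's
    # results, then flush every chunk that is now ready, in order.
    $pm->run_on_finish(sub {
        my ($pid, $exit, $chunk_no, $signal, $core, $data) = @_;
        $done{$chunk_no} = $data;
        while (exists $done{$next_out}) {
            print @{ delete $done{$next_out} };
            $next_out++;
        }
    });

    my $chunk_no = 0;
    while (1) {
        my @lines;
        while (@lines < $chunk_size) {
            my $line = <STDIN>;
            last unless defined $line;
            push @lines, $line;
        }
        last unless @lines;

        # Parent keeps reading; the child falls through to the work.
        $pm->start($chunk_no++) and next;

        my @out = map { process_line($_) } @lines;
        $pm->finish(0, \@out);   # ship the results back to the parent
    }
    $pm->wait_all_children;

    sub process_line { my ($line) = @_; return uc $line }  # stand-in

Note that if one chunk is slow, the parent buffers everything that finishes after it, so memory use isn't strictly bounded; that's the price of strict output ordering.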

Now go give it a try and don't come back until you have some code to post!

-sam
