laziness, impatience, and hubris | |
PerlMonks |
comment on |
( [id://3333]=superdoc: print w/replies, xml ) | Need Help?? |
The very easiest way you could do this would be to divide your input file into N pieces and process each piece in parallel by starting your script on that piece. Have each process output to a file of its own and stitch them together at the end. If you start one process per CPU and the black-magic you do is CPU bound you could get something like a linear speedup.
That of course means you can't take data strictly from STDIN, but really that's a silly way to process 4.1GB of data! If you really, really need to read it from a stream and output it in-order then you'll have to have a parent reading the stream, forking kids and then collecting the results in buffers so you can output in-order. Start with Parallel::ForkManager, which will handle giving out the work and then mix in some IO::Pipe and IO::Select for collecting the results. Be sure you divide the work into sizable chunks, forking a new child for each line isn't going to help very much! Now go give it a try and don't come back until you have some code to post! -sam In reply to Re: How do you parallelize STDIN for large file processing?
by samtregar
|
|