http://qs321.pair.com?node_id=1197607


in reply to Return all the data from child to parent with Parallel::Forkmanager

While your question about child processes sending data to parents is perfectly reasonable, I would challenge the assumption that reading 4 large files in parallel will be any faster than reading them sequentially, unless they are on different physical disks.
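That said, to answer the question as asked: Parallel::ForkManager can return a data structure from each child via the second argument to finish(), which the parent receives in a run_on_finish callback. A minimal sketch (the file names and the per-file "reduce" step are placeholders for your own code):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical file list -- substitute your own paths.
my @files = ('a.log', 'b.log', 'c.log', 'd.log');

my %results;
my $pm = Parallel::ForkManager->new(4);

# Collect each child's data as it exits; the sixth callback
# argument is the reference the child passed to finish().
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
    $results{$ident} = $data if defined $data;
});

for my $file (@files) {
    $pm->start($file) and next;    # parent loops; child falls through
    my %summary = ( lines => 0 );  # stand-in for real per-file work
    # ... open $file, reduce it to %summary ...
    $pm->finish(0, \%summary);     # serialized back to the parent
}
$pm->wait_all_children;
```

Note that the data reference is serialized to disk behind the scenes, so passing back a huge structure costs you the very IO you forked to avoid; pass back a reduced summary, not the raw data.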

Perl's IO is well-optimized, and I've generally found that the disk throughput is the bottleneck.

On another tack, if there are 4 different file types, each needing a different script, then you may find some benefit in reading/reducing/writing out to interim files, which the final script can then read. The interim files should probably be 100 to 1000 times smaller than the originals; otherwise you will find disk or network bandwidth the limiting factor. There are several ways to serialize data so that it can be restored to native Perl data structures.
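For example, Storable (in core Perl) will round-trip a data structure through a file. A sketch, with a made-up summary hash and filename:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical reduced summary produced by one of the reader scripts.
my %summary = ( lines => 42, errors => 3 );

# Write the interim file...
store(\%summary, 'summary.interim');

# ...and later, in the final script, restore it as a hashref.
my $restored = retrieve('summary.interim');
print $restored->{lines}, "\n";   # prints 42
```

JSON or Sereal would work equally well; Storable is just the zero-dependency option.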

Perl Maven has a nice list of serializers to get you started.

Finally, DBM::Deep will store a Perl data structure on disk and let you access it as if it were in memory, though access is much slower. I once used it for a data structure that was larger than virtual memory, and it worked very well.
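Usage is about as simple as it gets: the object behaves like an ordinary (nested) hash, but every write lands on disk. A sketch with a hypothetical filename and keys:

```perl
use strict;
use warnings;
use DBM::Deep;

# Backs a hash with a file on disk; nested structures work transparently.
my $db = DBM::Deep->new('cache.db');
$db->{counts}{widgets} = 10_000;
$db->{counts}{gadgets} = 250;

# Later -- even from a different process -- the data is still there.
print $db->{counts}{widgets}, "\n";
```

Because the file persists between runs, it can also serve as the hand-off point between your per-type scripts and the final one.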

-QM
--
Quantum Mechanics: The dreams stuff is made of
