in reply to Return all the data from child to parent with Parallel::Forkmanager
Perl's IO is well-optimized, and I've generally found that disk throughput is the bottleneck.
On another tack: if there are 4 different file types and a different script is needed for each, you may find some benefit in reading, reducing, and writing each type out to an interim file that the final script can then read. The interim files should probably be 100 to 1000 times smaller than the originals; unless they are much smaller, network bandwidth will be the limiting factor. There are several ways to serialize data so that it can be restored to native Perl data structures.
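A minimal sketch of that workflow, using only core modules: each forked child reduces one large input to a small summary and serializes it with Storable, and the parent (standing in for the final script) retrieves the native Perl structures. Filenames and the summary fields are hypothetical.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

my @inputs = ('a.log', 'b.log');    # hypothetical large input files
my @interim;

for my $in (@inputs) {
    (my $out = $in) =~ s/\.log$/.interim/;
    push @interim, $out;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    next if $pid;                    # parent moves on to the next file

    # --- child: reduce the big input to a much smaller summary ---
    my %summary = (file => $in, lines => 0);
    # (the real read/reduce of $in would happen here)
    store(\%summary, $out);          # compact, restorable interim file
    exit 0;
}
wait() for @inputs;                  # reap all children

# final-script side: restore native Perl structures
my @results = map { retrieve($_) } @interim;
```

The interim files hold frozen Perl structures, so the final script gets hashes back with no re-parsing.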
Perl Maven has a nice list of serializers to get you started.
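For one concrete round-trip, here is the core JSON::PP module (one of the serializers such lists typically cover, alongside Storable, Sereal, and YAML); the data structure is made up for illustration:

```perl
use strict;
use warnings;
use JSON::PP;

my $data = { counts => { err => 3, warn => 10 }, files => [ 'a', 'b' ] };

my $json    = JSON::PP->new->canonical;  # stable key order in the output
my $encoded = $json->encode($data);      # a string you can write to an interim file
my $decoded = $json->decode($encoded);   # back to a native Perl structure
```

JSON is human-readable and cross-language; Storable or Sereal will be faster and more compact if only Perl needs to read the files.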
Finally, DBM::Deep will store a Perl data structure on disk and let you access it as if it were in memory, though access is much slower. I used this for a data structure that would have been larger than virtual memory, and it worked very well.
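The basic usage looks like ordinary hash access; this is a sketch of DBM::Deep's documented interface (the filename is hypothetical, and the module must be installed from CPAN):

```perl
use strict;
use warnings;
use DBM::Deep;

# Ties a hash to a file, so the structure lives on disk, not in RAM.
my $db = DBM::Deep->new('big_structure.db');

$db->{totals}{widgets} = 42;    # each write goes straight to the file
$db->{list} = [ 1, 2, 3 ];      # nested arrays/hashes work transparently
```

Because every access is a disk operation, this is best reserved for structures that genuinely cannot fit in memory.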
-QM
--
Quantum Mechanics: The dreams stuff is made of