Re^3: Need to speed up many regex substitutions and somehow make them a here-doc list (MCE solution)
by marioroy (Prior) on Oct 08, 2022 at 01:52 UTC [id://11147289]
The OP mentioned a large number of text files (thousands to millions at a time, up to a couple of MB each). I think parallelization is better broken down at the file level. Basically, create a list of input files and chunk that list instead of the file contents. Since the list may range from thousands to millions of entries, go with chunk_size 1 or 2. Notice that workers are spawned early, before the large array is created. Then create the array and pass an array reference to MCE, so that no extra copy is made. That is how to tackle a big job while keeping overhead low. Then fasten your seat belt and enjoy the parallelization in top or htop.
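A minimal sketch of this pattern using core MCE is below. The worker count, the glob pattern, and the per-file processing body are placeholder assumptions, not the OP's actual values; the shape (spawn early, then build the list, then pass a reference) follows the description above.

```perl
use strict;
use warnings;
use MCE;

my $mce = MCE->new(
    max_workers => 8,    # assumption: tune to the machine
    chunk_size  => 2,    # small chunks for a very long file list
    user_func   => sub {
        my ($mce, $chunk_ref, $chunk_id) = @_;
        for my $file (@{ $chunk_ref }) {
            # placeholder: open $file, apply the regex
            # substitutions, write results
        }
    },
);

# Spawn workers early, before creating the large array, so each
# child process does not inherit a copy of the big list.
$mce->spawn;

my @files = glob 'data/*.txt';   # hypothetical input location

# Pass a reference; MCE chunks the list without an extra copy.
$mce->process(\@files);

$mce->shutdown;
```

The key ordering is spawn-then-build: forked workers share nothing they do not need, and only small chunks of file names travel over IPC.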
Let's find out the IPC overhead; I'm curious myself.
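The benchmark code itself did not survive in this copy of the post. A hedged sketch of how such a measurement might look is below: the workers are no-ops, so the elapsed time reflects only chunking and IPC cost. The worker count and the 50-thousand-item input are assumptions chosen to match the numbers discussed next.

```perl
use strict;
use warnings;
use Time::HiRes 'time';
use MCE;

my $mce = MCE->new(
    max_workers => 8,          # assumption
    chunk_size  => 1,          # one item per chunk: worst-case IPC
    user_func   => sub { },    # no-op: measure overhead only
);

$mce->spawn;                   # exclude worker start-up from the timing

my @input = (1 .. 50_000);     # 50 thousand chunks at chunk_size 1

my $start = time;
$mce->process(\@input);
printf "50_000 chunks took %.3f seconds\n", time - $start;

$mce->shutdown;
```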
The result is mind-boggling nonetheless: just a fraction of a second of overhead for 50 thousand chunks. Moreover, 2 seconds will hardly be felt when processing 500 thousand files, nor 4 seconds when handling 1 million files.
In Section Seekers of Perl Wisdom