Re^3: Need to speed up many regex substitutions and somehow make them a here-doc list
by bliako (Monsignor) on Oct 04, 2022 at 10:25 UTC ( [id://11147241]=note )
sed can take several substitution regexes at once instead of piping each substitution result to the next:

    sed 's/ need.* / need /gi' | sed 's/ .*meant.* / mean /gi'

can become

    sed 's/ need.* / need /gi;s/ .*meant.* / mean /gi'

This saves one process and one pipe per substitution, which may speed up IO.

For both Perl and bash/sed: IO can be improved by creating a ramdisk and placing the input and output files in there, if you intend to process them multiple times. Better still, if the files are produced by other processes, have them written straight into the ramdisk, process them there, and then transfer them to more permanent storage. In Linux this is as easy as (as root, with the mount point already created):

    mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk1

If all the files already live on a single physical hard disk, then parallelising their processing (which implies parallelising the IO) will show little improvement or, most likely, degradation. However, you may see some improvement by implementing a pipeline: one process copies files into the ramdisk sequentially, while the other processes work in parallel on whatever files they find there. I assume that memory IO can be parallelised better than hard disk IO (but I am a long way behind on what modern OSes and CPUs can do, and it could be that MCE can work some magic with IO, so take this advice with a grain of salt).

Also, in a recent discussion here, the issue came up that a threaded perl interpreter can be 10-30% slower than a non-threaded one. So, if you do not need threads, that is a possible way to speed things up (check your perl interpreter's compilation flags with perl -V and look for useithreads=define).

This is an interesting problem to optimise because even small optimisations can lead to huge savings over your thousands to millions of files. So I would start by benchmarking a few options on something like 20 files: sed, sed+ramdisk, perl+ramdisk, pipeline, etc. Then you will be more confident about where to place your programming efforts, and whether it is worth investing in learning new skills like MCE.

bw, bliako
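
Before benchmarking, it may be worth confirming that the combined sed expression really gives the same output as the piped version. The following is only a minimal sketch: the sample text is made up for illustration, and it assumes a sed that accepts the case-insensitive flag on s/// (GNU sed does).

```shell
#!/bin/sh
# Sketch: check that one sed process with two ";"-separated substitutions
# produces the same output as two sed processes joined by a pipe.
# The sample lines are hypothetical test data, not taken from real input.

sample='I need some help here
what he meant was this
nothing to change'

# Original form: two processes, one pipe between them.
piped=$(printf '%s\n' "$sample" \
    | sed 's/ need.* / need /gi' \
    | sed 's/ .*meant.* / mean /gi')

# Combined form: a single process applies both substitutions to each line.
combined=$(printf '%s\n' "$sample" \
    | sed 's/ need.* / need /gi;s/ .*meant.* / mean /gi')

if [ "$piped" = "$combined" ]; then
    echo "outputs match"
else
    echo "outputs differ"
fi
```

With the sample above this should print "outputs match". The same comparison can be repeated on a handful of real files before timing the two forms.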