Re^4: Need to speed up many regex substitutions and somehow make them a here-doc list
by xnous (Sexton) on Oct 02, 2022 at 19:42 UTC
hippo> I am intrigued by some of your s/// operations - perhaps you could confirm that these give your intended outputs?
Yes, you're right; the actual matches/substitutions are non-greedy. I just wanted to post a simpler, tidied-up version of my ugly script, but the code structure is exactly the same.

Corion> Regardless of the performance problems, you may be interested in using a proper stemmer to create a search index. See Lingua::Stem.
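For anyone puzzled by hippo's question: whether a quantifier is greedy or non-greedy can completely change what s/// replaces. A minimal sketch with a made-up input line (not taken from my script):

```perl
use strict;
use warnings;

# Hypothetical input line containing two quoted words.
my $text = 'say "foo" and "bar"';

# Greedy: ".*" runs on to the LAST closing quote in the line.
(my $greedy = $text) =~ s/".*"/X/;

# Non-greedy: ".*?" stops at the FIRST closing quote.
(my $lazy = $text) =~ s/".*?"/X/;

print "$greedy\n";    # say X
print "$lazy\n";      # say X and "bar"
```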
I don't need a full stemming solution (yet), and it might not be the ideal tool anyway, since I'd have to override numerous substitutions.
hv: Your hash lookup implementation runs about twice as fast (34 s vs. 1 min 05 s for my here-doc regexes). Another difference is that it runs faster when operating on whole lines than on individual words. sed still seems unbeatable at 6 seconds.
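To make that comparison concrete, here is a minimal sketch of the two approaches side by side, timed with the core Benchmark module's `cmpthese`. It assumes every rule is a plain whole-word replacement (that is what makes a hash lookup possible at all); the rule list, the sample line, and the names `via_regex_list` and `via_hash` are hypothetical, not my actual script or hv's code.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical rules, one "pattern replacement" pair per line, kept in a
# here-doc as in the thread title. The real rule list is not shown here.
my $rules_text = <<'RULES';
\bcats\b      cat
\brunning\b   run
\bbetter\b    good
RULES

# Compile each pattern once, up front.
my @rules = map {
    my ($pat, $rep) = split /\s+/, $_, 2;
    [ qr/$pat/, $rep ];
} split /\n/, $rules_text;

# Approach 1: apply every s/// from the list to the line, one after another.
sub via_regex_list {
    my ($s) = @_;
    for my $rule (@rules) {
        my ($re, $rep) = @$rule;
        $s =~ s/$re/$rep/g;
    }
    return $s;
}

# Approach 2: a single s///ge pass per line; each word is looked up in a hash.
# Only equivalent when every rule is a whole-word replacement.
my %map = (cats => 'cat', running => 'run', better => 'good');

sub via_hash {
    my ($s) = @_;
    $s =~ s/(\w+)/exists $map{$1} ? $map{$1} : $1/ge;
    return $s;
}

my $line = 'the cats were running better than the dogs ' x 20;
die "outputs differ\n" unless via_regex_list($line) eq via_hash($line);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex_list  => sub { via_regex_list($line) },
    hash_lookup => sub { via_hash($line) },
});
```

The numbers it prints will depend heavily on the real rule set and input; the sketch only checks that both variants agree on its sample line before timing them.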
AnomalousMonk> Here's something that may address your needs more closely. As always, the fine details of regex definition are critical. I still have no idea as to relative speed :)
I tested your solution last, but unfortunately it took 2 min 23 s to complete. I'll run more tests over the next few days and report back on any progress. Thank you all for your wisdom.