PerlMonks  

Re^5: Need to speed up many regex substitutions and somehow make them a here-doc list

by hv (Prior)
on Oct 02, 2022 at 22:05 UTC ([id://11147225])


in reply to Re^4: Need to speed up many regex substitutions and somehow make them a here-doc list
in thread Need to speed up many regex substitutions and somehow make them a here-doc list

hv: Your hash lookup implementation runs twice as fast (34" vs 1'05" for my here-doc regexes). Another difference is it runs faster when operating on lines compared to words. sed seems unbeatable at 6 seconds.

Glad it's making some progress, at least. :)

It occurs to me now that since you do not need the /.*/ "to end of line" behaviour, you also do not actually need to split the text on newlines: you could work directly on the full text. That would substantially reduce the number of ops you execute, which should give a further speedup.
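As a rough sketch of what "work directly on the full text" could look like: the %w1 contents and the input string below are made-up stand-ins (the real tables from earlier in the thread are not reproduced here), and the keys are assumed to be stored in lower case.

```perl
use strict;
use warnings;

# Hypothetical lookup table, standing in for the real %w1.
my %w1 = (
    colour => 'color',
    centre => 'center',
    grey   => 'gray',
);

# reverse sort puts a key ahead of any shorter key that is its prefix.
my $alt = join '|', map quotemeta, reverse sort keys %w1;
my $re1 = qr{\b($alt)\b}i;

# Hypothetical input; in practice this would be the whole file read in one go.
my $text = "the colour of the centre\nis a grey colour\n";

# A single s///g over the full text, rather than one substitution per line.
# lc() is needed because /i can match upper-case text but the keys are lower case.
(my $out = $text) =~ s/$re1/$w1{lc $1}/g;

print $out;
```

The newlines are simply carried through, since nothing in the pattern depends on where a line starts or ends.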

The next step beyond that would be to combine the three substitutions into a single one, with a single hash. The idea here would be to concatenate the three regexps from the previous iteration, wrapping the whole in (?|...) so that each of the three distinct captures gets saved as $1, and to make a single "master" lookup hash combining %w1, %w2 and %w3. If we can combine "was/were" in there as well, I think we'd be starting to get properly competitive with the sed scripts.
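Something along these lines, as a sketch only: the three hashes here are invented placeholders, I am assuming their keys do not collide, and I have given all three regexps the same \b(...)\b shape purely for illustration (with identical anchoring you could merge the key lists directly; the branch reset matters when the three differ in their surrounding context).

```perl
use strict;
use warnings;

# Hypothetical stand-ins for %w1, %w2, %w3.
my %w1 = (colour => 'color');
my %w2 = (grey   => 'gray');
my %w3 = (centre => 'center');

# One "master" lookup; assumes no key appears in more than one hash.
my %master = (%w1, %w2, %w3);

# Build one regexp per hash, each with exactly one capture group.
my @res = map {
    my $alt = join '|', map quotemeta, reverse sort keys %$_;
    qr{\b($alt)\b}i;
} \%w1, \%w2, \%w3;

# (?|...) is the branch-reset group: capture numbering restarts in each
# alternative, so whichever branch matches, its capture lands in $1.
my $joined   = join '|', @res;
my $combined = qr{(?|$joined)};

my $text = "the colour grey is at the centre\n";
(my $out = $text) =~ s/$combined/$master{lc $1}/g;

print $out;
```

That replaces three passes over the text with one, at the cost of a somewhat larger regexp.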

It is also worth considering whether you need Unicode support (I have no idea whether your sed supports it). If you do not need Unicode, you should also be able to get further speed by adding the /aa modifier to the regexp flags, for example: my $re1 = qr{\b(@{[ join '|', reverse sort keys %w1 ]})\b}iaa;
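To show what /aa changes in matching terms (this says nothing about timing, which you would need to measure on your own data), here is a small sketch. The %w1 contents are again made up; the KELVIN SIGN case is the standard example of a non-ASCII character that case-folds to an ASCII letter.

```perl
use strict;
use warnings;

# Hypothetical lookup table, standing in for the real %w1.
my %w1 = (colour => 'color');

my $alt = join '|', map quotemeta, reverse sort keys %w1;
my $re1 = qr{\b($alt)\b}iaa;

# ASCII case variations still match under /iaa.
my $ascii_match = "COLOUR" =~ $re1 ? 1 : 0;

# U+212A KELVIN SIGN case-folds to "k" under Unicode rules ...
my $kelvin      = "\x{212A}";
my $unicode_hit = $kelvin =~ /k/i   ? 1 : 0;
# ... but /aa forbids ASCII/non-ASCII matches under /i.
my $aa_hit      = $kelvin =~ /k/iaa ? 1 : 0;

print "ascii=$ascii_match unicode=$unicode_hit aa=$aa_hit\n";
```

If your input can contain such characters and you do want them to match, /aa is not appropriate.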
