Re: Speed of regex on compiled perl under windows

by chipmunk (Parson)
on Nov 21, 2001 at 01:24 UTC ( [id://126647]=note )


in reply to Speed of regex on compiled perl under windows

The slowness is actually an effect of the regex. The way your regex is constructed, the regex engine potentially has to do a lot of backtracking to try to find a match.

 

Here's a similar example that demonstrates the same problem:

    qq{"The quick brown fox jumps over the lazy dog\n"} =~ /("(\w+| )*")/

The (\w+| )* part can match the word 'jumps' in many ways: ('jumps'), or ('jump', 's'), or ('jum', 'ps'), or ('jum', 'p', 's'), or... Each time the regex engine gets to the newline and fails to match the second quote, it backtracks and tries another way of matching the words. It's the nested quantifiers that get you.
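
To get a feel for how many alternatives there are, here is a rough sketch (not from the original post) that counts the ways (\w+| )* can carve each word into pieces. The helper name count_splits is made up for illustration. The count is the number of distinct splits, not a measurement of what any particular perl actually tries.

```perl
use strict;
use warnings;

# Number of ways to cut a word of n characters into one or more
# non-empty pieces: each of the n - 1 gaps is either cut or not.
sub count_splits {
    my ($word) = @_;
    return 2 ** (length($word) - 1);
}

my $text  = 'The quick brown fox jumps over the lazy dog';
my $total = 1;

for my $word (split / /, $text) {
    my $ways = count_splits($word);
    printf "%-6s %2d ways\n", $word, $ways;
    $total *= $ways;    # choices for each word are independent
}

print "whole sentence: $total ways\n";
```

For 'jumps' that is 16 splits, and for the whole sentence the product is 2**26 = 67,108,864, which is why a naive backtracking engine can take so long before it gives up.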

The solution is to restructure the regex so that it can only match a part of the string in a limited number of ways, to eliminate all the useless backtracking. (Very easy in this case, since the regex is so simple.)

    qq{"The quick brown fox jumps over the lazy dog\n"} =~ /("[\w ]*")/
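
A small sketch of the restructured regex in use. The second input, without the newline, is not in the post; it is added here only to show that the regex still matches when it should.

```perl
use strict;
use warnings;

my $fixed = qr/("[\w ]*")/;

# Same input as in the post: the newline sits before the closing quote.
my $with_newline    = qq{"The quick brown fox jumps over the lazy dog\n"};
# Variant without the newline, added here only for contrast.
my $without_newline = qq{"The quick brown fox jumps over the lazy dog"};

for my $input ($with_newline, $without_newline) {
    if ($input =~ $fixed) {
        print "matched: $1\n";
    } else {
        print "no match\n";
    }
}
```

The character class can only consume the run of word characters and spaces in one way, so the failure on the first input is found after a single pass plus some simple give-back, with no combinatorial retrying.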

 

This is what you did when you moved the space inside the character class and removed the nested quantifiers. Here's one way to fix your regex, without changing the semantics:

    (?:\w[\.\w\-\'\!\(\)\/]* +)*\w[\.\w\-\'\!\(\)\/]*

Each iteration of (?:\w[\.\w\-\'\!\(\)\/]* +)* has to start with a word character and end with at least one space, so the iterations can't overlap or be split differently. There's only one way for this regex to match a given string.
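
Here is one way to try that regex out, as a sketch. The variable names, the test strings, and the \A ... \z anchors are my own additions, not part of the original regex; the anchors are there only so that the whole string has to match.

```perl
use strict;
use warnings;

# One "word" as defined in the post: a word character followed by any
# number of word characters or the listed punctuation.
my $word = qr/\w[\.\w\-\'\!\(\)\/]*/;

# The restructured pattern from the post, built from that piece.
my $unrolled = qr/(?:$word +)*$word/;

# \A and \z are an assumption added here to force a whole-string match.
for my $input ("don't stop! it's 24/7", "hello  world\n") {
    my $ok = $input =~ /\A$unrolled\z/ ? 'matches' : 'does not match';
    (my $shown = $input) =~ s/\n/\\n/g;    # make the newline visible
    print "'$shown' $ok\n";
}
```

The second string fails because of the trailing newline, and it fails quickly: every word must be followed by a space before the next word can start, so there is no alternative way to divide up the text.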

 

As perl's regex engine has been improved, various optimizations have been added to avoid this exponential backtracking problem. That's probably why your code ran so much faster on Unix; I expect you were using 5.6.0 or 5.6.1 there. My simple example shows the same behavior, returning immediately in 5.6.1 and taking a loooong time to finish in 5.005_03.
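
If you want to check how your own perl behaves, a timing harness along these lines may help. It is only a sketch: the input is deliberately shortened to 18 word characters (my choice, not from the post) so that even an engine without the optimizations finishes in reasonable time, and Time::HiRes is assumed to be available.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Short input: quote, 18 word characters, newline, quote.
my $input = '"' . ('a' x 18) . "\n\"";

my %regex = (
    nested => qr/("(\w+| )*")/,    # nested quantifiers, as in the example
    fixed  => qr/("[\w ]*")/,      # restructured version
);

my %matched;
for my $name (sort keys %regex) {
    my $start = time;
    $matched{$name} = $input =~ $regex{$name} ? 1 : 0;
    printf "perl %s %-6s matched=%d %.4fs\n",
        $], $name, $matched{$name}, time - $start;
}
```

Neither regex should match this input; what differs, depending on the perl version, is how long the nested one takes to find that out. Lengthen the run of 'a' characters with care, since each extra character can double the work on an engine that lacks the optimization.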

Jeffrey Friedl discusses this restructuring technique, which he calls "unrolling the loop", in Mastering Regular Expressions.
