Re: Inefficient regex

This:
(.*?)\s*(.*?)\s*
can be problematic. Since LABEL and PARAMETER must not contain spaces, try replacing that portion with something like:
([^\s]+)\s+([^\s]+)?

I'd also suggest that you change *? to simply *, since * denotes 0 or more occurances.

Probably the biggest problem is the:
(.+?)
which occurs just before:
<%end_\1%>
and matches to the end of the string, which then causes the regex engine to backtrack each time it cannot find each "end" tag.

You may also want to take a look at the book "Mastering Regular Expressions" by Jeffrey E. Friedl. See Mastering Regular Expressions for a review.

Comment on Re: Inefficient regex Select or Download Code

Replies are listed 'Best First'.
Re^2: Inefficient regex (death to dot star) by tye (Sage) on Mar 05, 2003 at 19:45 UTC
Exactly. In more detail, what the original regex ends up doing goes like this: <%start_(.?)\s(.?)\s%>\n?(.+?)<%end_\1%>\n? $1- -A- $2- -B- $3- C- Quickly matches <%start_ Tries $1 matching "" Tries A matching "" Tries $2 matching "wibble XXX" Tries $3 matches to just before "<%end_flub%>" Is forced to backtrack by C Tries $3 matches to just before "<%end_wibble%>" Is forced to backtrack by C Tries $1 matching "w" Tries A matching "" Tries $2 matching "ibble XXX" Tries $3 matches to just before "<%end_flub%>" Is forced to backtrack by C Tries $3 matches to just before "<%end_wibble%>" Is forced to backtrack by C Tries $1 matching "wi" Tries A matching "" Tries $2 matching "bble XXX" Tries $3 matches to just before "<%end_flub%>" Is forced to backtrack by C Tries $3 matches to just before "<%end_wibble%>" Is forced to backtrack by C ... Tries $1 matching "wibble" Tries A matching " " Tries $2 matching "XXX" Tries $3 matching to just before "<%end_flub%>" Is forced to backtrack by C Tries $3 matches to just before "<%end_wibble%>" Succeeds finding one match [download] which you can see wastes a lot of time. That is one reason why Death to dot star! suggests you use character class instead of . whenever possible in a regex. - tye	[reply] [d/l]
Re: Re^2: Inefficient regex (death to dot star) by Wibble (Beadle) on Mar 05, 2003 at 21:07 UTC
This is great and very clear - thank you!! I've taken on board what you said, experimented by making LABEL long then short to prove the point and sure enough execution times vary wildy. I've altered the regex to now use (\S+) to pick up the LABEL and it now completes in sub-second time. Thanks again!!!	[reply]
Re: Re: Inefficient regex by Thelonius (Priest) on Mar 05, 2003 at 19:40 UTC
Note:`\S+` is the same as `[^\s+]`	[reply] [d/l] [select]
Re: Re: Re: Inefficient regex by sutch (Curate) on Mar 05, 2003 at 20:14 UTC
Thelonius, your note is not correct: \s+ - matches one or more space characters [^\s+] - matches one character that is not a space nor a '+' [^\s]+ - (what you may have meant) matches one or more non-space characters.	[reply]


Perl Monk, Perl Meditation
	PerlMonks