Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^2: This regex seems to have splattered non-greedy everywhere

by fizbin (Chaplain)
on Aug 11, 2005 at 13:31 UTC ( [id://482953]=note: print w/replies, xml ) Need Help??


in reply to Re: This regex seems to have splattered non-greedy everywhere
in thread This regex seems to have splattered non-greedy everywhere

b) trying real hard to remove exponential behaviour from regexps when you can.
Yes, well, there's the problem of recognizing this before it bites me in the ass in production. I mean, I guess this particular exponential behavior might be something I'll recognize in the future, but is there some automated tool that can recognize when regular expressions have the potential for exponential behavior? (Or is this one of those problems that get as hard as the decideability problem?)
-- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/

Replies are listed 'Best First'.
Re^3: This regex seems to have splattered non-greedy everywhere
by hv (Prior) on Aug 11, 2005 at 14:52 UTC

    The general question is: given a regexp fragment like

    ( prefix (meat)* suffix )*
    is is possible for the prefix and suffix to be simultaneously (and repeatedly) empty? If so, then anything that matches /(meat){n}/ will be able to match at least 2^n different ways.

    I'm sure that at the least a subset of exponential regexps should be discernible. I imagine the first stage of such an approach would be to iteratively strip out anything that can be zero-width, in which case it would be rather harder to detect that eg:

    /( (foo)* (?=b) (bar)? )* baz )/x
    is linear. Ask me again when Perl6 has a full grammar implementation - it might be an interesting project to write such a detector targetting a pluggable regexp grammar.

    Note that one generic approach to fixing such regexps is to use the 'cut' operator (?> ...), but as far as I am aware that operator is available only in perl itself and the ever-faithful PCRE. For example:

    /( (?> [^']* ) ('[^']*')? )* /x
    avoids the exponential behaviour by disallowing backtracking into the [^']*.

    Hugo

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://482953]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (6)
As of 2024-04-25 11:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found