Good question. I suspect that it is an optimization. Trying this:
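use strict;
use warnings;
use feature 'say';

my $str = "bea" x 100;
my $re = qr/(?:be|ea|a)/;
# Despite the name, this one uses the plain greedy quantifier $re+
sub atomic {
    use re 'debug';
    say 'Matched $re+\d' if $str =~ m/$re+\d/;
}
# This one uses the possessive quantifier $re++
sub possessive {
    use re 'debug';
    say 'Matched $re++\d' if $str =~ m/$re++\d/;
}
atomic();
possessive();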
The regexes compile to
# atomic
Compiling REx "(?-xism:(?:be|ea|a))+\d"
Final program:
1: CURLYX[0] {1,32767} (14)
3: TRIE-EXACT[abe] (13)
<be>
<ea>
<a>
13: WHILEM[1/1] (0)
14: NOTHING (15)
15: DIGIT (16)
16: END (0)
minlen 2
#possessive
Compiling REx "(?-xism:(?:be|ea|a))++\d"
Final program:
1: SUSPEND (19)
3: CURLYX[0] {1,32767} (16)
5: TRIE-EXACT[abe] (15)
<be>
<ea>
<a>
15: WHILEM[1/1] (0)
16: NOTHING (17)
17: SUCCEED (0)
18: TAIL (19)
19: DIGIT (20)
20: END (0)
minlen 2
# regex.ato and regex.pos contain the output including
# the Compiling REx message
michael@lnx-main:working> wc regex.ato regex.pos
10650 70471 1837158 regex.ato
229368 1989603 47988305 regex.pos
As you can see, the possessive version produces far more debug output (229368 lines against 10650), so it is doing a lot more work. I don't know why, but it will be interesting to find out ;)
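Counting lines of debug output is only a proxy for the work done. Here is a rough timing sketch, assuming the same $str and $re as above and using the core Time::HiRes module. The helper names and the iteration count are made up for illustration, and I have left out any numbers because they will vary with the perl version:

```perl
use strict;
use warnings;
use Time::HiRes ();

my $str = "bea" x 100;
my $re  = qr/(?:be|ea|a)/;

# Hypothetical helpers: each returns 1 if the pattern matches, 0 otherwise.
sub match_plain      { my ($s) = @_; return $s =~ m/$re+\d/  ? 1 : 0 }
sub match_possessive { my ($s) = @_; return $s =~ m/$re++\d/ ? 1 : 0 }

# Run a code ref $n times and return the elapsed wall-clock seconds.
sub time_it {
    my ($n, $code) = @_;
    my $start = Time::HiRes::time();
    $code->() for 1 .. $n;
    return Time::HiRes::time() - $start;
}

my $n = 200;    # arbitrary iteration count
printf "plain:      %.4fs\n", time_it($n, sub { match_plain($str) });
printf "possessive: %.4fs\n", time_it($n, sub { match_possessive($str) });
```

Neither pattern matches this string (there is no digit in it), so both runs time a failing match, which is the case the debug output above was showing.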
Update 1: It may be something to do with caching. Looking in the re debug output I find
michael@lnx-main:working> grep Detected regex.ato
whilem: Detected a super-linear match, switching on caching...
michael@lnx-main:working> grep Detected regex.pos
whilem: Detected a super-linear match, switching on caching...
# Now look for this line
# whilem: (cache) already tried at this position...
michael@lnx-main:working> grep '(cache)' regex.ato | wc
298 2086 26426
michael@lnx-main:working> grep '(cache)' regex.pos | wc
0 0 0
So both runs switch caching on, but only the regex.ato run ever gets a cache hit (298 against 0). At this point I am prepared to confess that I am struggling a bit to dig any deeper ;-)
Update 2: The SUSPEND/TAIL pair looks like the place where any interested monk should start. In perldebguts I found this:
# Do nothing
NOTHING no Match empty string.
# A variant of above which delimits a group, thus stops optimizations
TAIL no Match empty string. Can jump here from outside.
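For context, perlre documents the possessive form PAT++ as equivalent to the atomic group (?>PAT+), which is presumably why the possessive program gets wrapped in SUSPEND. A small sketch of the behavioural difference, reusing $re from above (the helper name and the test string are only for illustration):

```perl
use strict;
use warnings;

my $re = qr/(?:be|ea|a)/;

# Hypothetical helper: report whether each form matches $s
# when the quantified group must be followed by a literal 'a'.
sub compare_forms {
    my ($s) = @_;
    return (
        plain      => ($s =~ m/$re+a/     ? 1 : 0),
        possessive => ($s =~ m/$re++a/    ? 1 : 0),
        atomic     => ($s =~ m/(?>$re+)a/ ? 1 : 0),
    );
}

my %r = compare_forms("bea");
print "$_: $r{$_}\n" for qw(plain possessive atomic);
# plain: 1       (the + gives back the final 'a' so the literal 'a' can match)
# possessive: 0  (nothing is given back once the group has matched)
# atomic: 0      (same behaviour as the possessive form)
```

The plain quantifier can backtrack into the group and the other two cannot, so any extra cost in the possessive case is not coming from extra backtracking inside the group itself.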