Performance of possessive quantifiers

moritz has asked for the wisdom of the Perl Monks concerning the following question:

Somehow I thought that the possessive quantifiers in perl 5.10 could only boost up performance, and never degrade it (neglecting a bit of overhead, perhaps). It seems that I was wrong:

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Benchmark qw(cmpthese);

my $str = "bea" x 100;
my $re = qr/(?:be|ea|a)/;
cmpthese(-2, {
    atomic1   => sub { die if $str =~ m/(?>$re+)\d/ },
    atomic2   => sub { die if $str =~ m/(?>$re)+\d/ },
    normal    => sub { die if $str =~ m/$re+\d/ },
    posessive => sub { die if $str =~ m/$re++\d/ },
});
__END__
            Rate posessive   atomic1   atomic2    normal
posessive 93.3/s        --        0%      -96%      -97%
atomic1   93.3/s        0%        --      -96%      -97%
atomic2   2545/s     2628%     2628%        --       -8%
normal    2764/s     2862%     2862%        9%        --
[download]

You can see that the normal quantifier is the fastest, more than 20 times faster than the possessive quantifier.

But why?

I tried to decipher the -Mre=debug output, and it seems that, with a normal quantifier, the regex engine does some backtracking, so it can't be a magic optimization that immediately proves that the match fails.

Or is my benchmark flawed in some way?

Update 1: I talked a bit on IRC with demerphq, and as he wrote the problem seems to be some caching problem.

So far it's not clear if 5.8.8 is caching too eagerly, or 5.10.0 too little.

It's also clear that the pattern isn't very useful, in most cases such a pattern would be anchored, in which case it's blazingly fast ;-)

I also informed our formidable porters.

Comment on Performance of possessive quantifiers Select or Download Code

Replies are listed 'Best First'.
Re: Performance of possessive quantifiers by hipowls (Curate) on Jan 27, 2008 at 19:26 UTC
The possessive form, PAT++ is short hand for (?>PAT+), from perlre `Possessive quantifiers are equivalent to putting the item they are applied to inside of one of these constructs. The following equivalences apply: Quantifier Form Bracketing Form --------------- --------------- PAT+ (?>PAT) PAT++ (?>PAT+) PAT?+ (?>PAT?) PAT{min,max}+ (?>PAT{min,max})` [download] In your case m/$re++\d/ == m/(?>$re+)\d/, just prettier;)	[reply] [d/l]
Re^2: Performance of possessive quantifiers by moritz (Cardinal) on Jan 27, 2008 at 22:02 UTC
Yes, that's right, and it explains why the `posessive` and `atomic1` have the same performance impact. (The parse tree that `perl5.10.0 -cMre=debug -we 'm/(?:be\|ea\|a)++\d/'` and `perl5.10.0 -cMre=debug -we 'm/(?>(?:be\|ea\|a)+)\d/'` are indeed identical). But it doesn't explain the performance compared to the non-possessive non-atomic form.	[reply] [d/l] [select]
Re^3: Performance of possessive quantifiers by hipowls (Curate) on Jan 27, 2008 at 22:55 UTC
Good question, I suspect that it is an optimization. Trying `my $str = "bea" x 100; my $re = qr/(?:be\|ea\|a)/; sub atomic { use re 'debug'; say 'Matched $re+\d' if $str =~ m/$re+\d/; } sub possessive { use re 'debug'; say 'Matched $re++\d' if $str =~ m/$re++\d/; } atomic(); possessive();` [download] The regexes compile to # atomic Compiling REx "(?-xism:(?:be\|ea\|a))+\d" Final program: 1: CURLYX[0] {1,32767} (14) 3: TRIE-EXACT[abe] (13) <be> <ea> <a> 13: WHILEM[1/1] (0) 14: NOTHING (15) 15: DIGIT (16) 16: END (0) minlen 2 #possessive Compiling REx "(?-xism:(?:be\|ea\|a))++\d" Final program: 1: SUSPEND (19) 3: CURLYX[0] {1,32767} (16) 5: TRIE-EXACT[abe] (15) <be> <ea> <a> 15: WHILEM[1/1] (0) 16: NOTHING (17) 17: SUCCEED (0) 18: TAIL (19) 19: DIGIT (20) 20: END (0) minlen 2 # regex.ato and regex.pos contain the output including # the Compiling REx message michael@lnx-main:working> wc regex.ato regex.pos 10650 70471 1837158 regex.ato 229368 1989603 47988305 regex.pos [download] As you can see it is doing a lot more. I don't know why but it will be interesting to find out;) Update 1: It may be something to do with caching. Looking in the re debug output I find `michael@lnx-main:working> grep Detected regex.ato whilem: Detected a super-linear match, switching on caching... michael@lnx-main:working> grep Detected regex.pos whilem: Detected a super-linear match, switching on caching... # Now look for this line # whilem: (cache) already tried at this position... michael@lnx-main:working> grep '(cache)' regex.ato \| wc 298 2086 26426 michael@lnx-main:working> grep '(cache)' regex.pos \| wc 0 0 0` [download] At this point I am prepared to confess that I will struggling a bit to dig in deeper;-) Update 2: The SUSPEND/TAIL pair look like where any interested monk should start. In perldebguts I found this `# Do nothing NOTHING no Match empty string. # A variant of above which delimits a group, thus stops optimizations TAIL no Match empty string. Can jump here from outside.` [download]	[reply] [d/l] [select]
Re: Performance of possessive quantifiers by tsee (Curate) on Jan 28, 2008 at 11:07 UTC
Seemed odd to me, so I ran it on 5.8.8 and the current bleadperl 5.11. The absolute numbers for my computer are, of course, different, but the general observation holds for 5.11. The really disturbing part is that 5.8.8 executes the atomic1/2 cases at about the same speed as "normal". Oh, and on a side note: The difference in execution speed of 5.11's "normal" and 5.8.8's "normal" may be due to 5.11 being a debugging build and all, so it's not necessarily significant. #!/usr/bin/perl use strict; use warnings; use version; use Benchmark qw(cmpthese); my $str = "bea" x 100; my $re = qr/(?:be\|ea\|a)/; cmpthese(-2, { atomic1 => sub { die if $str =~ m/(?>$re+)\d/ }, atomic2 => sub { die if $str =~ m/(?>$re)+\d/ }, normal => sub { die if $str =~ m/$re+\d/ }, ( version->new($]) >= version->new("5.10.0") ? (posessive => sub { die if $str =~ m/$re++\d/ }) : () ), }); __END__ 5.8.8: Rate normal atomic2 atomic1 normal 5000/s -- -4% -18% atomic2 5208/s 4% -- -15% atomic1 6132/s 23% 18% -- 5.11: Rate posessive atomic1 atomic2 normal posessive 59.9/s -- -1% -97% -97% atomic1 60.4/s 1% -- -97% -97% atomic2 1953/s 3159% 3134% -- -9% normal 2156/s 3500% 3472% 10% -- [download] I'm hoping for demerphq to chime in with an explanation. :) Cheers, Steffen	[reply] [d/l]
Re^2: Performance of possessive quantifiers by demerphq (Chancellor) on Jan 28, 2008 at 14:09 UTC
try `m/(?>$re)+(*SKIP)\d/` [download] on 5.11 And it looks like something broke the superlinear cache. Even with atomic matching this pattern/string combination goes quadratic, as nothing anchors the start of the pattern. Without the atomic matching its worse. Im not sure what the deal is with $re++ either. I would have expected that to have similar performance to (?>$re)+, its possible thats actually a bug. Ill have to think about it. Oh, and dont use benchmarks from a debugging perl. Depending on the code path you go through variable amounts of debugging goo, so using a debugging perl, especially a 5.9.x or later version will not give useful results even comparing features. --- $world=~s/war/peace/g	[reply] [d/l]
Re^3: Performance of possessive quantifiers by tsee (Curate) on Jan 28, 2008 at 16:14 UTC
Ah, thanks for that last remark, I'll keep it in mind for future testing. However, I double-checked and while I thought I remembered that -Dusedevel implied -DDEBUGGING, it doesn't seem so. Hence, the 5.11 I used for the above results was no debugging build. Sorry for the confusion! Cheers, Steffen I did "./Configure -Dusedevel -de 2>&1 \| grep -i debug" and "make -j3 \| grep -i debug" without any mention of DEBUGGING.	[reply]
Re^4: Performance of possessive quantifiers by moritz (Cardinal) on Jan 28, 2008 at 16:51 UTC

Back to Seekers of Perl Wisdom