Re: Performance penalty of using qr//

in reply to Performance penalty of using qr//

The problem is actually to do with (potential) captures. Currently, capture information (the information that is used to create the values of $1 et al when accessed) is stored as part of the regex object. This model doesn't work well when the same regex can be used in multiple places. This code:

use strict;
use warnings;

my $r = qr/(.)/;
"a" =~ $r;
print "$1\n";
{ "b" =~ $r; print "$1\n"; }
print "$1\n";

outputs:

a
b
a

Thus a single regex object has to be associated with multiple capture sets, and which set is current changes as scopes are exited.
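The same effect shows up across a subroutine call, not just a bare block. Below is a minimal sketch, where `inner_match` is a hypothetical helper: the caller's capture from the shared qr// object survives while the sub uses that same object for its own match.

```perl
use strict;
use warnings;

my $r = qr/(\w)/;    # one regex object, used by both match ops below

# Hypothetical helper: matches the same object inside its own scope.
sub inner_match {
    "z" =~ $r;       # this match sets $1 for the sub's scope only
    return $1;       # "z"
}

"a" =~ $r;                     # outer match: $1 is "a"
my $from_sub = inner_match();  # the sub sees its own capture set
my $after    = $1;             # back in the outer scope, $1 is "a" again

print "inside sub: $from_sub, after return: $after\n";
# expected: inside sub: z, after return: a
```

Both capture sets exist at once here: the outer "a" must be kept somewhere while the sub's "z" is live, even though only one regex object was ever created.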

The current workaround for this is to duplicate the qr// object each time it's executed, which is sub-optimal because every match pays for the copy. Unfortunately, fixing this properly is non-trivial.
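One way to observe that per-match cost is to time a match through a qr// object against the same pattern written inline. This is a minimal sketch using the core Benchmark module; the input string, the pattern and the variant names are arbitrary choices for illustration. No result is claimed here, since the relative numbers depend on the perl version and the machine.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $text = 'abc123';
my $re   = qr/(\d+)/;    # precompiled regex object

my %variant = (
    # match through the qr// object on every call
    qr_object => sub { $text =~ $re;     return $1 },
    # the same pattern written inline in the match op
    literal   => sub { $text =~ /(\d+)/; return $1 },
);

# Sanity check: both variants must capture the same text before timing.
my @got = map { $variant{$_}->() } sort keys %variant;
die "variants disagree: @got\n" unless $got[0] eq $got[1];

cmpthese(-1, \%variant);    # run each variant for at least 1 CPU second
```

Keeping the pattern tiny makes compilation cheap, so any gap between the two rows mostly reflects the overhead of using the object rather than the cost of compiling the pattern.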

Dave.

Re^2: Performance penalty of using qr// by Anonymous Monk on Dec 21, 2017 at 13:22 UTC
Um ... so ... when is the sometime the optimization kicks in? Can it be measured?
Re^3: Performance penalty of using qr// by dave_the_m (Monsignor) on Dec 21, 2017 at 13:29 UTC
> Um ... so ... when is the sometime the optimization kicks in
What optimisation are you referring to?
Dave.
Re^4: Performance penalty of using qr// by Eily (Monsignor) on Dec 21, 2017 at 14:56 UTC
I'm guessing it's the optimization of using qr// over a plain string (i.e. "Since Perl may compile the pattern at the moment of execution of the qr() operator, using qr() may have speed advantages in some situations ..."). From your previous post, I would say this happens when there is a big compilation overhead, so I tested that idea using this list of words:

use strict;
use warnings;
use Benchmark qw( cmpthese timethese );

open my $words, "<", "linuxwords.txt" or die "$!";
my @words = <$words>;
chomp @words;
my @search = @words[0..10];
$" = "\|";
my $re  = qr/^(?:@words)$/;
my $str = "^(?:@words)\$";

my $r = timethese ( -5, {
    use_qr  => sub { map /$re/, @search },
    use_str => sub { map /$str/, @search },
    use_re  => sub { map /^(?:@words)$/, @search },
} );
cmpthese $r;

Benchmark: running use_qr, use_re, use_str for at least 5 CPU seconds...
    use_qr:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 98736.51/s (n=515997)
    use_re:  5 wallclock secs ( 5.33 usr +  0.00 sys =  5.33 CPU) @ 23.99/s (n=128)
   use_str:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 2268.47/s (n=11855)
            Rate  use_re use_str  use_qr
use_re    24.0/s      --    -99%   -100%
use_str   2268/s   9355%      --    -98%
use_qr   98737/s 411431%   4253%      --

The use_re case is pretty bad because of the systematic interpolation, but the difference between use_qr and use_str is so large that I can't help feeling I might be missing something. If this is correct, though, qr is a clear winner for dictionary search.
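Since the gap looks surprising, it may be worth checking that the variants actually agree before trusting the timings. Below is a minimal, self-contained sketch of the same comparison under stated assumptions: the generated `word1` .. `word20000` list is a hypothetical stand-in for linuxwords.txt, the use_re variant is left out, quotemeta is added to guard against metacharacters, and matches are counted with grep in scalar context instead of map. No timing results are claimed; they will vary with the machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical stand-in for linuxwords.txt, so the script runs anywhere.
my @words  = map { "word$_" } 1 .. 20_000;
my @search = @words[0 .. 10];
my $want   = @search;    # every searched word is in the list

# Build the alternation once; quotemeta protects any regex metacharacters.
my $alt = join '|', map { quotemeta($_) } @words;
my $re  = qr/^(?:$alt)$/;    # compiled once, reused as an object
my $str = "^(?:$alt)\$";     # plain string, used as the pattern at each match

my %variant = (
    use_qr  => sub { scalar grep { /$re/ }  @search },
    use_str => sub { scalar grep { /$str/ } @search },
);

# Before timing, confirm that every variant finds the same number of matches.
my %count;
for my $name (sort keys %variant) {
    $count{$name} = $variant{$name}->();
    die "$name matched $count{$name} of $want\n" unless $count{$name} == $want;
}

cmpthese(-2, \%variant);    # run each variant for at least 2 CPU seconds
```

If the counts agree and the gap persists, the difference comes from how each form of the pattern is handled at match time rather than from one variant silently matching less.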
Re^5: Performance penalty of using qr// by dave_the_m (Monsignor) on Dec 21, 2017 at 20:01 UTC
Re^6: Performance penalty of using qr// by vr (Curate) on Dec 22, 2017 at 19:26 UTC

In Section Seekers of Perl Wisdom