http://qs321.pair.com?node_id=1205977


in reply to Performance penalty of using qr//

The problem is actually to do with (potential) captures. Currently, capture information (the information that is used to create the values of $1 et al when accessed) is stored as part of the regex object. This model doesn't work well when the same regex can be used in multiple places. This code:
$r = qr/(.)/; "a" =~ $r; print "$1\n"; { "b" =~ $r; print "$1\n"; } print "$1\n";
outputs:
a b a
Thus a single regex object has to be associated with multiple capture sets, which change as scopes are exited.

The current workaround for this is to duplicate the qr// object each time it's executed, which is sub-optimal. Unfortunately fixing this properly is non-trivial.

Dave.

Replies are listed 'Best First'.
Re^2: Performance penalty of using qr//
by Anonymous Monk on Dec 21, 2017 at 13:22 UTC
    Um ... so ... when is the sometime the optimization kicks in? Can it be measured?
      Um ... so ... when is the sometime the optimization kicks in
      What optimisation are you referring to?

      Dave.

        I'm guessing the optimization of using qr// over a plain string (ie: "Since Perl may compile the pattern at the moment of execution of the qr() operator, using qr() may have speed advantages in some situations ...")

        From your previous post I would say that this happens when there is a big compilation overhead, so I thought about this, and used this list of words for testing:

        use strict; use warnings; use Benchmark qw( cmpthese timethese ); open my $words, "<", "linuxwords.txt" or die "$!"; my @words = <$words>; chomp @words; my @search = @words[0..10]; $" = "|"; my $re = qr/^(?:@words)$/; my $str = "^(?:@words)\$"; my $r = timethese ( -5, { use_qr => sub { map /$re/, @search }, use_str => sub { map /$str/, @search }, use_re => sub { map /^(?:@words)$/, @search }, } ); cmpthese $r;
        Benchmark: running use_qr, use_re, use_str for at least 5 CPU seconds. +.. use_qr: 5 wallclock secs ( 5.23 usr + 0.00 sys = 5.23 CPU) @ 98 +736.51/s (n=515997) use_re: 5 wallclock secs ( 5.33 usr + 0.00 sys = 5.33 CPU) @ 23 +.99/s (n=128) use_str: 5 wallclock secs ( 5.23 usr + 0.00 sys = 5.23 CPU) @ 22 +68.47/s (n=11855) Rate use_re use_str use_qr use_re 24.0/s -- -99% -100% use_str 2268/s 9355% -- -98% use_qr 98737/s 411431% 4253% --
        The re case is pretty bad because of the systematic interpolation, but I can't help but feel like I might be missing something because of how absurd the difference between qr and str is? But if this is correct, then qr is a clear winner for dictionary search.