perlquestion
Athanasius
<p>Hello all, and Merry Christmas!</p>
<p>I recently came across some old code in which I used a longish regular
expression twice within a loop. So I thought, “Aha! Here’s
an opportunity for optimisation using <c>qr//</c>.” After all, the documentation
(<b><c>qr/STRING/msixpodualn</c></b> under “Regexp Quote-Like Operators” in [doc://perlop]) says:</p>
<blockquote>
Since Perl may compile the pattern at the moment of execution of the <c>qr()</c>
operator, using <c>qr()</c> may have speed advantages in some situations ...
</blockquote>
<p>But the result was more than disappointing:</p>
<readmore>
<code>
use strict;
use warnings;
use Benchmark qw( cmpthese timethese );

use constant TARGET => 1_389_019_170;

my $r = timethese
(
    5,
    {
        use_re  => sub { my $ans1 = use_re();
                         $ans1 == TARGET or die $ans1; },
        use_qr  => sub { my $ans2 = use_qr();
                         $ans2 == TARGET or die $ans2; },
        use_str => sub { my $ans3 = use_str();
                         $ans3 == TARGET or die $ans3; }
    }
);
cmpthese $r;

sub use_re
{
    for (my $n = 1_010_101_030; $n <= 1_389_026_623; )
    {
        my $s = $n * $n;
        return $n if $s =~ /^1\d2\d3\d4\d5\d6\d7\d8\d900$/;
        $n += 40;
        $s = $n * $n;
        return $n if $s =~ /^1\d2\d3\d4\d5\d6\d7\d8\d900$/;
        $n += 60;
    }
    die;
}

sub use_qr
{
    my $re = qr/^1\d2\d3\d4\d5\d6\d7\d8\d900$/;
    for (my $n = 1_010_101_030; $n <= 1_389_026_623; )
    {
        my $s = $n * $n;
        return $n if $s =~ $re;
        $n += 40;
        $s = $n * $n;
        return $n if $s =~ $re;
        $n += 60;
    }
    die;
}

sub use_str
{
    my $str = '^1\d2\d3\d4\d5\d6\d7\d8\d900$';
    for (my $n = 1_010_101_030; $n <= 1_389_026_623; )
    {
        my $s = $n * $n;
        return $n if $s =~ /$str/;
        $n += 40;
        $s = $n * $n;
        return $n if $s =~ /$str/;
        $n += 60;
    }
    die;
}
</code>
<p>Typical output:</p>
<code>
12:50 >perl 1846_SoPW.pl
Benchmark: timing 5 iterations of use_qr, use_re, use_str...
use_qr: 57 wallclock secs (53.19 usr + 0.06 sys = 53.25 CPU) @ 0.09/s (n=5)
use_re: 22 wallclock secs (22.03 usr + 0.00 sys = 22.03 CPU) @ 0.23/s (n=5)
use_str: 26 wallclock secs (25.81 usr + 0.00 sys = 25.81 CPU) @ 0.19/s (n=5)
        s/iter  use_qr use_str  use_re
use_qr    10.7      --    -52%    -59%
use_str   5.16    106%      --    -15%
use_re    4.41    142%     17%      --
12:54 >
</code>
<p>(I obtained similar results across my various 64-bit Strawberry Perl versions: 5.18.2,
5.20.2, 5.22.2, 5.24.0, and 5.26.0.)</p>
</readmore>
<p>I note in the documentation that the string returned by <c>qr//</c>
“magically differs from a string containing the same characters”,
so I’m guessing the additional overhead is somehow due to this “magic”, but I still find the result surprising.
So, my questions:</p>
<ul>
<li>Is this a known issue? Is it documented? (Yes, I looked.)</li>
<li>Can anyone explain why <c>qr//</c> incurs such a significant performance penalty in my example?</li>
<li>Is there an alternative (say, a CPAN module) that can provide the
functionality of <c>qr//</c> without the overhead?</li>
</ul>
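<p>To illustrate the “magic” mentioned above, here is a minimal sketch (not part of the benchmark; the shortened pattern and the variable names are purely illustrative) of how the value returned by <c>qr//</c> differs from a plain string containing the same characters. The stringified form shown assumes Perl 5.14 or later, which covers all the versions I tested:</p>
<code>
use strict;
use warnings;

my $str = '^1\d2\d3$';      # plain string holding a (shortened) pattern
my $re  = qr/^1\d2\d3$/;    # object returned by qr// for the same pattern

print ref($str) || 'plain scalar', "\n";    # prints: plain scalar
print ref($re), "\n";                       # prints: Regexp
print "$re\n";                              # prints: (?^:^1\d2\d3$)

# Both forms can still be used on the right-hand side of =~
print "qr matches\n"  if '10203' =~ $re;
print "str matches\n" if '10203' =~ /$str/;
</code>
<p>This only shows that the two values are different kinds of scalar; it does not by itself explain the timing difference.</p>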
<p>Thanks,</p>
<div class="pmsig"><div class="pmsig-968231">
<p>
<table width="100%">
<tr>
<td align="left">
Athanasius <font color="#008000">&lt;</font>[href://http://www.biblegateway.com/passage/?search=John%203:16&version=NLV|<font color="#008000">°</font>]<font color="#008000">(((&gt;&lt;</font> <i>contra mundum</i>
</td>
<td align="right">
[href://http://translate.google.com.au/#la/en/Iustus%20alius%20egestas%20vitae%2C%20eros%20Piratica%2C|<b>Iustus alius egestas vitae, eros Piratica,</b>]
</td>
</tr>
</table>
</p>
</div></div>