I know it looks trivial once you see it, but I'm really impressed by your approach of using 1+index(...). It had not occurred to me to use index that way in an expression to check for presence. I'll add that to my set of idioms, just like if( system(...) == 0 ) { for successful execution of subprocesses.
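For anyone reading along who had not seen the idiom either, here is a minimal sketch of why it works. index returns the 0-based offset of the first occurrence, or -1 if the substring is absent, so adding 1 turns "not found" into 0 (false) and every real offset, including 0, into a true value. The helper name contains is made up for this illustration, and the subprocess line just runs the current perl via $^X as a stand-in command.

```perl
use strict;
use warnings;

my $s = 'the quick brown fox jumps over the lazy dog';

# Hypothetical helper, only for illustration: true if $needle occurs in $haystack.
# index() gives -1 when the substring is absent, so 1+index() is 0 (false) then,
# and at least 1 (true) for any real offset, including offset 0.
sub contains {
    my ($haystack, $needle) = @_;
    return 1 + index($haystack, $needle) ? 1 : 0;
}

print contains($s, 'lazy') ? "found\n" : "missing\n";   # substring at offset 35
print contains($s, 'the')  ? "found\n" : "missing\n";   # offset 0 is still true
print contains($s, 'cat')  ? "found\n" : "missing\n";   # index() returns -1

# The same spirit for subprocesses: system() returns 0 when the child exits cleanly.
# $^X (the running perl) is used here only as a stand-in command.
if ( system($^X, '-e', 'exit 0') == 0 ) {
    print "child succeeded\n";
}
```

The more explicit spelling index($s, 'lazy') >= 0 does the same thing; the 1+ form is just shorter to type.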
Update: I wondered how much the capturing parentheses cost, and it seems they cost roughly half of the performance attainable when using the regex engine. Maybe the two additional steps executed in the regex engine (OPEN1 and CLOSE1) are to blame for that, as they effectively double the number of steps the regex engine has to execute for a successful match.
Not invoking the regex engine at all is still much faster, about twice as fast as the plain regex in this benchmark, even though I had thought there once was an optimization that turned constant regular expressions without anchors or quantifiers into an index lookup...
# a: if( $s =~ m[(lazy)] ){ $found=$1 }
Compiling REx "(lazy)"
Final program:
1: OPEN1 (3)
3: EXACT <lazy> (5)
5: CLOSE1 (7)
7: END (0)
anchored "lazy" at 0 (checking anchored) minlen 4
Matching REx "(lazy)" against "the quick brown fox jumps over the lazy dog"
Intuit: trying to determine minimum start position...
Found anchored substr "lazy" at offset 35...
(multiline anchor test skipped)
try at offset...
Intuit: Successfully guessed: match at offset 35
35 < the > <lazy dog> | 1:OPEN1(3)
35 < the > <lazy dog> | 3:EXACT <lazy>(5)
39 <the lazy> < dog> | 5:CLOSE1(7)
39 <the lazy> < dog> | 7:END(0)
Match successful!
Freeing REx: "(lazy)"
# b: $found = 'lazy' if 1+index( $s, 'lazy' );
# c: if( $s =~ m[lazy] ){ $found=$& }
Compiling REx "lazy"
Final program:
1: EXACT <lazy> (3)
3: END (0)
anchored "lazy" at 0 (checking anchored isall) minlen 4
Matching REx "lazy" against "the quick brown fox jumps over the lazy dog"
Intuit: trying to determine minimum start position...
Found anchored substr "lazy" at offset 35...
(multiline anchor test skipped)
try at offset...
Intuit: Successfully guessed: match at offset 35
Freeing REx: "lazy"
       Rate    a    c    b
a 2038631/s   -- -50% -75%
c 4089154/s 101%   -- -49%
b 8013601/s 293%  96%   --
The program I used:
use strict;
use Benchmark 'cmpthese';
use vars '$s';
$s='the quick brown fox jumps over the lazy dog';
my $found;
my %benchmarks = (
    a => q[ if( $s =~ m[(lazy)] ){ $found=$1 } ],
    b => q[ $found = 'lazy' if 1+index( $s, 'lazy' ); ],
    c => q[ if( $s =~ m[lazy] ){ $found=$& } ],
);
{
    use re 'debug';
    for (sort keys %benchmarks) {
        print "# $_: $benchmarks{$_}\n";
        undef $found;
        my $code = eval qq{sub { $benchmarks{$_} } }
            or die "Couldn't compile benchmark $_: $@";
        $code->();
        $found eq 'lazy'
            or die "Unexpected results: [$found] vs. 'lazy'";
    };
};
cmpthese( -1, \%benchmarks);