I know it looks trivial once you see it, but I'm astonished by your approach of using 1+index(...). It had not occurred to me to use index that way in an expression to check for presence. I'll add that to my set of idiosyncratic phrases, just like if( system(...) == 0 ) { for successful execution of subprocesses.
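
For anyone else who had not seen the idiom: index returns -1 when the substring is absent and a zero-based offset otherwise, so adding 1 turns "not found" into 0 (false) and every hit, including one at offset 0, into a true value. A minimal sketch; the helper name contains() is my own and does not appear in the thread:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread) wrapping the 1+index idiom.
sub contains {
    my ($haystack, $needle) = @_;
    # index() gives -1 when $needle is absent, so 1+index() is 0 (false);
    # a hit at offset 0 gives 1, which is still true.
    return 1 + index($haystack, $needle) ? 1 : 0;
}

my $s = 'the quick brown fox jumps over the lazy dog';
print contains($s, 'lazy') ? "found\n" : "missing\n";   # found
print contains($s, 'the')  ? "found\n" : "missing\n";   # found (offset 0)
print contains($s, 'cat')  ? "found\n" : "missing\n";   # missing
```

The system(...) == 0 check works in the same spirit: system returns the wait status, which is 0 when the child process ran and exited successfully.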

Update: I wondered how much the capturing parentheses cost, and it seems they cost roughly half of the throughput attainable when using the regex engine: in the benchmark below, variant a (with capture) runs at about half the rate of variant c (without). Maybe the two additional nodes in the regex program (OPEN1 and CLOSE1) are to blame for that, as they effectively double the number of steps the regex engine has to execute for a successful match.

Not invoking the regex engine at all is still much faster, even though I had thought there once was an optimization that turned constant regular expressions without anchors or quantifiers into an index lookup...
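
If the capture group is only there to get at the matched text, the offsets in @- and @+ give the same string without adding OPEN1/CLOSE1 to the regex program and without touching $&. This is only a sketch and is not part of the benchmark in this post, so I make no claim about its speed; the helper name matched_text() is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the whole match without
# capture groups and without $&.
sub matched_text {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    return substr($string, $-[0], $+[0] - $-[0]);
}

my $s = 'the quick brown fox jumps over the lazy dog';
print matched_text($s, qr/lazy/), "\n";   # lazy
```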

# a:  if( $s =~ m[(lazy)] ){ $found=$1 }
Compiling REx "(lazy)"
Final program:
   1: OPEN1 (3)
   3:   EXACT <lazy> (5)
   5: CLOSE1 (7)
   7: END (0)
anchored "lazy" at 0 (checking anchored) minlen 4
Matching REx "(lazy)" against "the quick brown fox jumps over the lazy dog"
Intuit: trying to determine minimum start position...
  Found anchored substr "lazy" at offset 35...
  (multiline anchor test skipped)
  try at offset...
Intuit: Successfully guessed: match at offset 35
  35 < the > <lazy dog>      |  1:OPEN1(3)
  35 < the > <lazy dog>      |  3:EXACT <lazy>(5)
  39 <the lazy> < dog>       |  5:CLOSE1(7)
  39 <the lazy> < dog>       |  7:END(0)
Match successful!
Freeing REx: "(lazy)"
# b:  $found = 'lazy' if 1+index( $s, 'lazy' );
# c:  if( $s =~ m[lazy] ){ $found=$& }
Compiling REx "lazy"
Final program:
   1: EXACT <lazy> (3)
   3: END (0)
anchored "lazy" at 0 (checking anchored isall) minlen 4
Matching REx "lazy" against "the quick brown fox jumps over the lazy dog"
Intuit: trying to determine minimum start position...
  Found anchored substr "lazy" at offset 35...
  (multiline anchor test skipped)
  try at offset...
Intuit: Successfully guessed: match at offset 35
Freeing REx: "lazy"
       Rate    a    c    b
a 2038631/s   -- -50% -75%
c 4089154/s 101%   -- -49%
b 8013601/s 293%  96%   --

The program I used:

use strict;
use Benchmark 'cmpthese';
use vars '$s';
$s = 'the quick brown fox jumps over the lazy dog';
my $found;

my %benchmarks = (
    a => q[ if( $s =~ m[(lazy)] ){ $found=$1 } ],
    b => q[ $found = 'lazy' if 1+index( $s, 'lazy' ); ],
    c => q[ if( $s =~ m[lazy] ){ $found=$& } ],
);

{
    use re 'debug';
    for (sort keys %benchmarks) {
        print "# $_: $benchmarks{$_}\n";
        undef $found;
        my $code = eval qq{sub { $benchmarks{$_} } }
            or die "Couldn't compile benchmark $_: $@";
        $code->();
        $found eq 'lazy'
            or die "Unexpected results: [$found] vs. 'lazy'";
    };
};

cmpthese( -1, \%benchmarks);

In reply to Re^3: Get a known substring from a string by Corion
in thread Get a known substring from a string by jake7176