Re: Regex simplification

Hmm, isn't substr usually faster than a regex? If so, how about the following approach:

Use rindex to find the indices of the last and second-to-last spaces, as the OP requires.
The difference between those indices, minus one, is the length of the desired string. Pass that length, along with the index just past the second-to-last space, to substr to get the required data.
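
To make the index arithmetic concrete, here is a minimal sketch of that idea as a standalone sub. The name last_token is made up for illustration, and it assumes the wanted token is delimited by single spaces and followed by a trailing ' -->', as in the OP's sample line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): returns the text
# between the second-to-last and the last space of $str.
sub last_token {
    my ($str) = @_;

    my $end = rindex($str, ' ');              # index of the last space
    return if $end < 1;                       # no usable last space

    my $start = rindex($str, ' ', $end - 1);  # index of the second-to-last space
    return if $start < 0;                     # only one space in the string

    # The token sits strictly between the two spaces, so its length
    # is the gap between the indices minus one.
    return substr($str, $start + 1, $end - $start - 1);
}

print last_token('<!-- USER 20 - donkey_pusher_6 -->'), "\n";
```

For the sample line the last space is at index 30 and the second-to-last at index 14, so the call works out to substr($str, 15, 15).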

Well, I'm sure that it would work, but would it be faster? I'll probably benchmark this myself sometime when I have the time to create data and code to test.

Anyway, that's my (Not-So-)Good Idea for the day.

Update: I just ran some benchmarks on a few of the methods suggested. Here's my code and results:

use Benchmark;

my $str = '<!-- USER 20 - donkey_pusher_6 -->';
my $data;
my $re = qr/--\s*USER\s+\d+\s*-\s*(\w+)/;
my ($start, $end);

sub by_re_noback {
  ($data) = ($str =~ / ^ (?>\s*) <!-- (?>\s+) USER (?>\s+) (?>\d+) (?>\s+)
                       - (?>\s+) (\S+?) (?>\s+) --> (?>\s*) $ /ix);
}

sub by_re {
  ($data) = ($str =~ m/<!-- USER \d+ - (\S+)/i);
}

sub by_re_comp {
  ($data) = ($str =~ $re);
}

sub by_substr {
  $end = rindex($str, ' ');              # last space
  $start = rindex($str, ' ', $end - 1);  # second-to-last space
  # length excludes both spaces, hence the extra - 1
  $data = substr($str, $start + 1, $end - $start - 1);
}

timethese(100000, {
  subst     => \&by_substr,
  re_comp   => \&by_re_comp,
  re        => \&by_re,
  re_noback => \&by_re_noback,
});
--results--
Benchmark: timing 100000 iterations of re, re_comp, re_noback, subst...
        re:  1 wallclock secs ( 0.46 usr +  0.00 sys =  0.46 CPU) @ 217391.30/s (n=100000)
   re_comp:  4 wallclock secs ( 4.35 usr +  0.00 sys =  4.35 CPU) @ 22988.51/s (n=100000)
 re_noback:  6 wallclock secs ( 6.27 usr +  0.00 sys =  6.27 CPU) @ 15948.96/s (n=100000)
     subst:  1 wallclock secs ( 1.40 usr +  0.00 sys =  1.40 CPU) @ 71428.57/s (n=100000)

There are 10 kinds of people -- those that understand binary, and those that don't.
