Hmm, isn't substr usually faster than a regex? If so, how about the following approach:
- Use rindex to find the indices of the last and second-to-last spaces, as the OP requires.
- Take the difference between those indices, minus one, to get the length of the desired string, and use that value (along with the index just past the second-to-last space) in a substr call to get the required data.
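The two steps above can be sketched as a small function. This is only an illustration: the name between_last_spaces and the choice to return nothing when there are fewer than two spaces are my assumptions, not something the OP asked for.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the last two spaces of $str,
# or an empty return (undef in scalar context) if there are fewer than two spaces.
sub between_last_spaces {
    my ($str) = @_;

    my $end = rindex($str, ' ');              # index of the last space
    return if $end < 1;                       # no space, or nothing before it

    my $start = rindex($str, ' ', $end - 1);  # index of the second-to-last space
    return if $start < 0;

    # Start one past the first delimiter; the length excludes both spaces.
    return substr($str, $start + 1, $end - $start - 1);
}

print between_last_spaces('<!-- USER 20 - donkey_pusher_6 -->'), "\n";
```

The guards matter because rindex returns -1 when it finds nothing, and feeding that straight into substr would quietly return the wrong slice.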
Well, I'm sure that it would work, but would it be faster? I'll probably benchmark this myself sometime when I have the time to create data and code to test.
Anyway, that's my (Not-So-)Good Idea for the day.
Update: I just ran some benchmarks on a few of the methods suggested. Here's my code and results:
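use Benchmark;

my $str = '<!-- USER 20 - donkey_pusher_6 -->';
my $data;
my $re = qr/--\s*USER\s+\d+\s*-\s*(\w+)/;
my ($start, $end);

# Anchored regex using (?>...) atomic groups to rule out backtracking.
sub by_re_noback {
    ($data) = ($str =~ /
        ^ (?>\s*) <!-- (?>\s+) USER (?>\s+) (?>\d+) (?>\s+)
        - (?>\s+) (\S+?) (?>\s+) --> (?>\s*) $
    /ix);
}

# Plain inline regex. As posted, this matched against $line, which is never
# defined, so the match failed at once; the "re" figure in the results was
# most likely measured that way and should not be compared with the others.
sub by_re {
    ($data) = ($str =~ m/<!-- USER \d+ - (\S+)/i);
}

# Regex precompiled with qr//.
sub by_re_comp {
    ($data) = ($str =~ $re);
}

# rindex/substr: take the text between the last two spaces.
sub by_substr {
    $end = rindex($str, ' ');
    $start = rindex($str, ' ', $end - 1);
    $data = substr($str, $start + 1, $end - $start - 1);  # length excludes the trailing space
}

timethese (100000, {
    subst => \&by_substr,
    re_comp => \&by_re_comp,
    re => \&by_re,
    re_noback => \&by_re_noback,
});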
--results--
Benchmark: timing 100000 iterations of re, re_comp, re_noback, subst...
       re:  1 wallclock secs ( 0.46 usr +  0.00 sys =  0.46 CPU) @ 217391.30/s (n=100000)
  re_comp:  4 wallclock secs ( 4.35 usr +  0.00 sys =  4.35 CPU) @ 22988.51/s (n=100000)
re_noback:  6 wallclock secs ( 6.27 usr +  0.00 sys =  6.27 CPU) @ 15948.96/s (n=100000)
    subst:  1 wallclock secs ( 1.40 usr +  0.00 sys =  1.40 CPU) @ 71428.57/s (n=100000)
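
Timings like these only mean something if every candidate really extracts the right value, so it can be worth checking that before timing anything. Here is a rough sketch of one way to do it with cmpthese from the same Benchmark module; the labels by_substr and by_regex and the expected value are just illustrative choices on my part, and the numbers it prints will depend on your machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = '<!-- USER 20 - donkey_pusher_6 -->';

# Each candidate returns its result instead of writing to a shared global,
# so the same subs can be verified first and timed afterwards.
my %candidates = (
    by_substr => sub {
        my $end   = rindex($str, ' ');
        my $start = rindex($str, ' ', $end - 1);
        return substr($str, $start + 1, $end - $start - 1);
    },
    by_regex => sub {
        my ($name) = $str =~ /<!-- USER \d+ - (\S+)/i;
        return $name;
    },
);

# Sanity check: refuse to benchmark a candidate that extracts the wrong thing.
for my $label (sort keys %candidates) {
    my $got = $candidates{$label}->();
    die "$label returned the wrong value\n"
        unless defined $got && $got eq 'donkey_pusher_6';
}

# A negative count tells Benchmark to run each sub for at least that many
# CPU seconds, then cmpthese prints a relative comparison table.
cmpthese(-1, \%candidates);
```

A candidate that fails to match will usually look very fast, which is exactly the kind of result the check is there to catch.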
--
There are 10 kinds of people -- those that understand binary, and those that don't.