I use index when I need the match position and a regex otherwise. In my tests, though, index is NOT faster.
Even code like

    my $pos;
    if ( $line =~ $regex ) {
        $pos = length $`;
    }

which gets the match position from a regex, is slightly faster (though much uglier, of course).
Update: For better ways of getting the match position, see "How do I retrieve the position of the first occurrence of a match?"
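One such alternative, as far as I know, is the built-in @- array: after a successful match, $-[0] holds the offset where the match starts, so there is no need for length $`. Below is a minimal sketch, not part of the benchmark; the helper name match_pos and the sample line are made up for illustration. On Perls older than 5.20, using $` anywhere in a program can also slow down the other regex matches in it, which is one more reason to prefer @-.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the benchmark): returns the 0-based offset
# of the first match of $regex in $line, or -1 if there is no match,
# mirroring what index() returns for a literal substring.
sub match_pos {
    my ($line, $regex) = @_;
    return $line =~ $regex ? $-[0] : -1;
}

my $DELIMITER = 'GGAGAGGG';
my $regex     = qr{\Q$DELIMITER};

# Made-up sample line; the delimiter starts at offset 4.
my $line = 'ACGTGGAGAGGGTT';
printf "regex: %d, index: %d\n",
    match_pos($line, $regex), index($line, $DELIMITER);
# prints "regex: 4, index: 4"
```

Both calls agree on the offset, so $-[0] can stand in for index when the pattern is a regex rather than a fixed string.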
Benchmark code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(:all);

    my $count     = 5000;
    my $filename  = 'TEST.dat';
    my $DELIMITER = 'GGAGAGGG';
    #my $DELIMITER = 'TTTTCATGAAGAAGATGAGAGACAAGATGAGAAAATAGTATCAGAGA';

    # Precompiled pattern; \Q quotes any regex metacharacters in the delimiter.
    my $regex = qr{\Q$DELIMITER}o;

    # Each sub re-reads the whole file and counts the lines that contain the delimiter.
    cmpthese($count, {
        'index' => sub {
            open my $FH, '<', $filename or die "Cannot open $filename: $!";
            my $i;
            while (my $line = <$FH>) {
                my $pos = index $line, $DELIMITER;
                if ( $pos >= 0 ) {
                    $i++;
                }
            }
            close $FH;
        },
        # Precompiled regex, also recording the match position.
        'regex_compiled_pos' => sub {
            open my $FH, '<', $filename or die "Cannot open $filename: $!";
            my $i;
            while (my $line = <$FH>) {
                my $pos;
                if ( $line =~ $regex ) {
                    $i++;
                    $pos = length $`;
                }
            }
            close $FH;
        },
        # Precompiled regex, match test only.
        'regex_compiled' => sub {
            open my $FH, '<', $filename or die "Cannot open $filename: $!";
            my $i;
            while (my $line = <$FH>) {
                if ( $line =~ $regex ) {
                    $i++;
                }
            }
            close $FH;
        },
        # Inline regex, also recording the match position.
        'regex_pos' => sub {
            open my $FH, '<', $filename or die "Cannot open $filename: $!";
            my $i;
            while (my $line = <$FH>) {
                my $pos;
                if ( $line =~ /\Q$DELIMITER/ ) {
                    $i++;
                    $pos = length $`;
                }
            }
            close $FH;
        },
        # Inline regex, match test only.
        'regex' => sub {
            open my $FH, '<', $filename or die "Cannot open $filename: $!";
            my $i;
            while (my $line = <$FH>) {
                if ( $line =~ /\Q$DELIMITER/ ) {
                    $i++;
                }
            }
            close $FH;
        },
    });
Benchmark results:
                        Rate index regex_pos regex regex_compiled_pos regex_compiled
    index              450/s    --      -38%  -39%               -40%           -41%
    regex_pos          728/s   62%        --   -2%                -3%            -5%
    regex              741/s   65%        2%    --                -1%            -3%
    regex_compiled_pos 749/s   66%        3%    1%                 --            -2%
    regex_compiled     763/s   70%        5%    3%                 2%             --