Re: Style question: regex versus string builtin function

I use index when I need the match position, and a regex otherwise. But it seems that index is NOT faster. Even code like
my $pos; if ( $line =~ $regex ) { $pos = length $`; }
which gets the match position with a regex, is slightly faster (but much uglier, of course).

Update: For better ways of getting the match position, see How do I retrieve the position of the first occurrence of a match?.
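
As a minimal sketch of one such way (my illustration, not taken from the linked thread): after a successful match, the built-in array @- holds the match start offsets, so $-[0] is the position that index would have returned, and $` is never touched. The sample line is hypothetical and is not from TEST.dat.

```perl
use strict;
use warnings;

my $DELIMITER = 'GGAGAGGG';
my $regex     = qr{\Q$DELIMITER};

# Hypothetical sample line, made up for this example.
my $line = 'ACGTGGAGAGGGTTT';

# Position via the string builtin (-1 if not found).
my $pos_index = index $line, $DELIMITER;

# Position via the regex; start at -1 to mimic index's "not found" value.
my $pos_regex = -1;
if ( $line =~ $regex ) {
    $pos_regex = $-[0];    # offset of the start of the whole match
}

print "index: $pos_index, regex: $pos_regex\n";    # prints "index: 4, regex: 4"
```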

Benchmark code:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all) ;

my $count = 5000;

my $filename = 'TEST.dat';
my $DELIMITER = 'GGAGAGGG';
#my $DELIMITER = 'TTTTCATGAAGAAGATGAGAGACAAGATGAGAAAATAGTATCAGAGA';
my $regex = qr{\Q$DELIMITER}o;

 cmpthese($count, {
               'index' => sub { 
                   open my $FH, '<', $filename or die "Cannot open $filename: $!";
                   my $i;
                   while (my $line = <$FH>) {
                       my $pos = index $line, $DELIMITER; 
                       if ( $pos >= 0 ) {
                           $i++;
                       }    
                   } 
                   close $FH;
               },
               'regex_compiled_pos' => sub {
                   open my $FH, '<', $filename or die "Cannot open $filename: $!";
                   my $i;
                   while (my $line = <$FH>) {
                       my $pos;
                       if ( $line =~ $regex ) {
                           $i++;
                           $pos = length $`;
                       }    
                   }  
                   close $FH;
               },
               'regex_compiled' => sub {
                   open my $FH, '<', $filename or die "Cannot open $filename: $!";
                   my $i;
                   while (my $line = <$FH>) {
                       if ( $line =~ $regex ) {
                           $i++; 
                       }    
                   }       
                   close $FH;
               },  
               'regex_pos' => sub {
                   open my $FH, '<', $filename or die "Cannot open $filename: $!";
                   my $i;
                   while (my $line = <$FH>) {
                       my $pos;
                       if ( $line =~ /\Q$DELIMITER/ ) {
                           $i++;
                           $pos = length $`;
                       }
                   }
                   close $FH;
               },
               'regex' => sub {
                   open my $FH, '<', $filename or die "Cannot open $filename: $!";
                   my $i;
                   while (my $line = <$FH>) {
                       if ( $line =~ /\Q$DELIMITER/ ) {
                           $i++;
                       }
                   }
                   close $FH;
               },
           });

Benchmark results:

                    Rate index regex_pos regex regex_compiled_pos regex_compiled
index              450/s    --      -38%  -39%               -40%           -41%
regex_pos          728/s   62%        --   -2%                -3%            -5%
regex              741/s   65%        2%    --                -1%            -3%
regex_compiled_pos 749/s   66%        3%    1%                 --            -2%
regex_compiled     763/s   70%        5%    3%                 2%             --


Replies are listed 'Best First'.
Re^2: Style question: regex versus string builtin function by oha (Friar) on Oct 02, 2007 at 12:27 UTC
There are some issues with using $`; check perlre. What you want is m// followed by pos, which will be faster. Oha

Update: see tye's note below.
Re^3: Style question: regex versus string builtin function (pos) by tye (Sage) on Oct 02, 2007 at 13:53 UTC
Make that `m//g` (note the 'g') in a scalar context and then pos. - tye
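
A minimal sketch of that suggestion, assuming a literal delimiter and a made-up sample line: with /g in scalar context, a successful match records where it ended, and pos returns that offset. Note that pos is the end of the match, so the start is derived here by subtracting the delimiter length, which only works because the pattern is a fixed-length literal.

```perl
use strict;
use warnings;

my $DELIMITER = 'GGAGAGGG';

# Hypothetical sample line, made up for this example.
my $line = 'ACGTGGAGAGGGTTT';

my ( $start, $end );
if ( $line =~ /\Q$DELIMITER/g ) {    # "if" gives scalar context; /g makes the match set pos
    $end   = pos $line;                  # offset just past the end of the match
    $start = $end - length $DELIMITER;   # valid only for a fixed-length literal pattern
}

# Reset pos so that a later m//g on the same variable starts from the beginning.
pos($line) = undef;

print "start: $start, end: $end\n";    # prints "start: 4, end: 12"
```

In the benchmark loop each $line is a fresh lexical, so the reset would not be needed there; it matters when the same variable is matched again.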
Re^3: Style question: regex versus string builtin function by lima1 (Curate) on Oct 02, 2007 at 13:17 UTC
Well, you must be careful when you use match variables, especially when you work with big strings. But they aren't slow per se.

Update: Thank you all for your comments and suggestions (here and in the CB)! See How do I get what is to the left of my match? for an updated benchmark and better explanations.
Re^4: Style question: regex versus string builtin function by ikegami (Patriarch) on Oct 02, 2007 at 14:15 UTC
You seem to have missed the point. The problem with $`, $& and $' is not that they slow down one regex, it's that they slow down all regexes that have no captures, like the one in `matchcontext`. Your Benchmark is useless.
Re^5: Style question: regex versus string builtin function by lima1 (Curate) on Oct 03, 2007 at 09:17 UTC
Re^4: Style question: regex versus string builtin function by eyepopslikeamosquito (Archbishop) on Oct 02, 2007 at 13:38 UTC
"Well, you must be careful when you use match variables, especially when you work with big strings. But they aren't slow per se"

From the Devel::SawAmpersand docs:

"There's a global variable in the perl source, called PL_sawampersand. It gets set to true in that moment in which the parser sees one of $`, $', and $&. It never can be set to false again. Trying to set it to false breaks the handling of the $`, $&, and $' completely.

If the global variable PL_sawampersand is set to true, all subsequent RE operations will be accompanied by massive in-memory copying, because there is nobody in the perl source who could predict, when the (necessary) copy for the ampersand family will be needed. So all subsequent REs are considerable slower than necessary."
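
For completeness, a sketch of how the same three pieces can be obtained without mentioning $`, $& or $' anywhere in the program, using the offset arrays @- and @+ together with substr. The sample line and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample line, made up for this example.
my $line = 'ACGTGGAGAGGGTTT';

my ( $pre, $match, $post );
if ( $line =~ /GGAGAGGG/ ) {
    # $-[0] is the start offset of the match, $+[0] the offset just past its end.
    $pre   = substr $line, 0, $-[0];                 # what $` would hold
    $match = substr $line, $-[0], $+[0] - $-[0];     # what $& would hold
    $post  = substr $line, $+[0];                    # what $' would hold
}

print "pre=$pre match=$match post=$post\n";    # prints "pre=ACGT match=GGAGAGGG post=TTT"
```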

