PerlMonks  

Re: Style question: regex versus string builtin function

by lima1 (Curate)
on Oct 02, 2007 at 11:57 UTC ( [id://642082]=note )


in reply to Style question: regex versus string builtin function

I use index when I need the match position, and a regex otherwise. But it seems that index is NOT faster. Even code like
my $pos; if ( $line =~ $regex ) { $pos = length $`; }
which gets the match position with a regex, comes out faster in the benchmark below (but it is much uglier, of course).

Update: For better ways of getting the match position, see How do I retrieve the position of the first occurrence of a match?.
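For comparison, here is a minimal sketch of two ways to get the start offset of a literal delimiter without touching $`. The helper names pos_by_index and pos_by_regex and the sample string are made up for illustration; the regex variant relies on $-[0], which perlvar documents as the start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: offset of the first occurrence via index.
# Returns -1 when the delimiter is not found.
sub pos_by_index {
    my ($line, $needle) = @_;
    return index $line, $needle;
}

# Hypothetical helper: same offset via a regex.
# $-[0] is the start offset of the last successful match, so $` is not needed.
sub pos_by_regex {
    my ($line, $needle) = @_;
    return $line =~ /\Q$needle\E/ ? $-[0] : -1;
}

# Sample data (an assumption, not taken from TEST.dat).
my $line = 'AAAGGAGAGGGTTT';
print pos_by_index($line, 'GGAGAGGG'), "\n";
print pos_by_regex($line, 'GGAGAGGG'), "\n";
```

Both helpers should agree on every input, so either can be dropped into the benchmark loop in place of length $`.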

Benchmark code:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);

my $count     = 5000;
my $filename  = 'TEST.dat';
my $DELIMITER = 'GGAGAGGG';
#my $DELIMITER = 'TTTTCATGAAGAAGATGAGAGACAAGATGAGAAAATAGTATCAGAGA';
my $regex = qr{\Q$DELIMITER}o;

cmpthese($count, {
    'index' => sub {
        open my $FH, '<', $filename;
        my $i;
        while (my $line = <$FH>) {
            my $pos = index $line, $DELIMITER;
            if ( $pos >= 0 ) {
                $i++;
            }
        }
        close $FH;
    },
    'regex_compiled_pos' => sub {
        open my $FH, '<', $filename;
        my $i;
        while (my $line = <$FH>) {
            my $pos;
            if ( $line =~ $regex ) {
                $i++;
                $pos = length $`;
            }
        }
        close $FH;
    },
    'regex_compiled' => sub {
        open my $FH, '<', $filename;
        my $i;
        while (my $line = <$FH>) {
            if ( $line =~ $regex ) {
                $i++;
            }
        }
        close $FH;
    },
    'regex_pos' => sub {
        open my $FH, '<', $filename;
        my $i;
        while (my $line = <$FH>) {
            my $pos;
            if ( $line =~ /\Q$DELIMITER/ ) {
                $i++;
                $pos = length $`;
            }
        }
        close $FH;
    },
    'regex' => sub {
        open my $FH, '<', $filename;
        my $i;
        while (my $line = <$FH>) {
            if ( $line =~ /\Q$DELIMITER/ ) {
                $i++;
            }
        }
        close $FH;
    },
});

Benchmark results:
                    Rate index regex_pos regex regex_compiled_pos regex_compiled
index              450/s    --      -38%  -39%               -40%           -41%
regex_pos          728/s   62%        --   -2%                -3%            -5%
regex              741/s   65%        2%    --                -1%            -3%
regex_compiled_pos 749/s   66%        3%    1%                 --            -2%
regex_compiled     763/s   70%        5%    3%                 2%             --

Replies are listed 'Best First'.
Re^2: Style question: regex versus string builtin function
by oha (Friar) on Oct 02, 2007 at 12:27 UTC
    There are some issues with using $`; check perlre.
    What you want is m// followed by pos, which will be faster.

    Oha

    Update: check tye's note below.

      Make that m//g (note the 'g') in a scalar context and then pos.

      - tye        
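A minimal sketch of the m//g-then-pos approach described above, assuming a literal delimiter. The helper name pos_by_m_g and the sample string are invented for illustration. Note that pos reports the offset where the match ended, so the match length has to be subtracted to get the start.

```perl
use strict;
use warnings;

# Hypothetical helper: start offset of a literal delimiter using m//g and pos.
# Returns -1 when the delimiter is not found.
sub pos_by_m_g {
    my ($line, $needle) = @_;

    # m//g in scalar context records where the match ended in pos($line).
    if ( $line =~ /\Q$needle\E/g ) {
        # pos is the end offset; for a literal needle the match length is known.
        return pos($line) - length $needle;
    }
    return -1;
}

# Sample data (an assumption, not taken from the thread).
print pos_by_m_g('AAAGGAGAGGGTTT', 'GGAGAGGG'), "\n";
```

Because $line is a fresh copy inside the sub, its pos starts out undefined. When matching the same variable repeatedly in a loop, a leftover pos from an earlier m//g would change where the next match starts.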

      Well, you must be careful when you use match variables, especially when you work with big strings. But they aren't slow per se:

      Update: Thank you all for your comments and suggestions (here and in the CB)! See How do I get what is to the left of my match? for an updated benchmark and better explanations.

        You seem to have missed the point. The problem with $`, $&, $' is not that they slow down one regex; it's that they slow down *all* regexes that have no captures, like the one in matchcontext. Your benchmark is useless.

        Well, you must be careful when you use match variables, especially when you work with big strings. But they aren't slow per se
        From the Devel::SawAmpersand docs:

        There's a global variable in the perl source, called PL_sawampersand. It gets set to true in that moment in which the parser sees one of $`, $', and $&. It never can be set to false again. Trying to set it to false breaks the handling of the $`, $&, and $' completely.

        If the global variable PL_sawampersand is set to true, all subsequent RE operations will be accompanied by massive in-memory copying, because there is nobody in the perl source who could predict, when the (necessary) copy for the ampersand family will be needed. So all subsequent REs are considerable slower than necessary.
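One way to sidestep the problem described above is to rebuild the three pieces from the offset arrays @- and @+ instead of mentioning $`, $& or $' anywhere in the program. The sketch below is only an illustration: the helper name match_context and the sample string are assumptions, while the substr equivalents are the ones perlvar gives for these variables.

```perl
use strict;
use warnings;

# Hypothetical helper: return (prematch, match, postmatch) for the first match,
# or an empty list when there is no match. No $`, $& or $' appears here,
# so the parser never sees the ampersand family.
sub match_context {
    my ($line, $regex) = @_;
    return unless $line =~ $regex;

    my $pre   = substr $line, 0, $-[0];              # equivalent of $`
    my $match = substr $line, $-[0], $+[0] - $-[0];  # equivalent of $&
    my $post  = substr $line, $+[0];                 # equivalent of $'
    return ($pre, $match, $post);
}

# Sample data (an assumption, not taken from the thread).
my ($pre, $match, $post) = match_context('AAAGGAGAGGGTTT', qr/GGAGAGGG/);
print "pre=$pre match=$match post=$post\n";
```

The offsets must be read right after the match, before any other successful regex overwrites @- and @+.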
