PerlMonks  

Why this code run faster?

by Gangabass (Vicar)
on Nov 08, 2007 at 16:38 UTC ( [id://649749]=perlquestion )

Gangabass has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I need your help again.

I have some code (not mine) that uses the g and o matching modifiers. I don't think they are needed, but the code shows a strange result: it runs faster with these modifiers than without them.

Can you explain why?

Here is the code:

#!/usr/bin/perl -W
use warnings;
use strict;
use Benchmark;

my $phrase1 = 'network';
my $phrase2 = 'networK';
my ($t1, $t2);

##########################################
$t1 = new Benchmark;
for (1..10000000) {
    if ($phrase1 =~ /^network$/go) {}
}
$t2 = new Benchmark;
print timestr (timediff ($t2, $t1)), "\n";

##########################################
$t1 = new Benchmark;
for (1..10000000) {
    if ($phrase2 =~ /^networK$/) {}
}
$t2 = new Benchmark;
print timestr (timediff ($t2, $t1)), "\n";

And here is the result:

7 wallclock secs ( 6.40 usr + 0.01 sys = 6.41 CPU)
9 wallclock secs ( 9.06 usr + 0.00 sys = 9.06 CPU)

Replies are listed 'Best First'.
Re: Why this code run faster?
by shmem (Chancellor) on Nov 08, 2007 at 17:57 UTC
    #!/usr/bin/perl
    #
    use Benchmark qw( cmpthese );
    cmpthese( -2, {
        g => sub { 'network' =~ /^network$/g },
        c => sub { 'network' =~ /^network$/gc },
        s => sub { 'networK' =~ /^networK$/ },
    } );
    __END__
           Rate    s    g    c
    s 2106781/s   -- -28% -68%
    g 2931188/s  39%   -- -56%
    c 6602248/s 213% 125%   --

    The /c modifier gives a real boost! Why? It doesn't reset the search position on a failed match while /g is in effect (see perlop). So the match is tested once from the beginning, and at each further invocation of the same match it is tested starting at the end of the string, failing quickly. With /g alone, every other match fails.

    --shmem
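The difference between /g and /gc described above can be checked without a benchmark. The following is only a minimal sketch: the helper names run_g and run_gc are made up for illustration and do not appear in the thread. Each helper repeats the same scalar-context match against one string and records whether it succeeded.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the thread): repeat the same
# scalar-context match $n times against one string and record
# 1 for success, 0 for failure.
sub run_g {
    my ($n) = @_;
    my $str = 'network';
    my @results;
    # Plain /g: a failed match resets pos($str), so the next try succeeds.
    push @results, ( $str =~ /^network$/g ? 1 : 0 ) for 1 .. $n;
    return @results;
}

sub run_gc {
    my ($n) = @_;
    my $str = 'network';
    my @results;
    # /gc: a failed match keeps pos($str) at the end of the string,
    # so every later attempt starts there and fails at once.
    push @results, ( $str =~ /^network$/gc ? 1 : 0 ) for 1 .. $n;
    return @results;
}

print "g : @{[ run_g(6) ]}\n";     # g : 1 0 1 0 1 0
print "gc: @{[ run_gc(6) ]}\n";    # gc: 1 0 0 0 0 0
```

If this sketch reflects what happens in the benchmark, the "c" case is fast simply because almost all of its iterations are cheap failures, not because /c makes a successful match cheaper.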

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Why this code run faster?
by kyle (Abbot) on Nov 08, 2007 at 17:19 UTC

    The /o modifier isn't doing anything. Try running with and without it; when I did, there wasn't any difference. That modifier only affects patterns that have a variable interpolated in them, and your patterns don't.

    The /g modifier seems to be the one that's making the difference, but I don't see why. I'd actually expect it to work the opposite way from what it does.

    You might also want to see No More Meaningless Benchmarks! The operations involved are ridiculously fast, so I'm not sure how useful (or accurate) it is to compare them. Consider:

    use Benchmark qw( cmpthese );
    cmpthese( 10_000_000, {
        lower => sub { 'network' =~ /^network$/ },
        upper => sub { 'networK' =~ /^networK$/ }
    } );
    __END__
               Rate lower upper
    lower 3174603/s    --   -9%
    upper 3496503/s   10%    --

    Matching uppercase is faster than lowercase? Seriously?

    I tried comparing literally the same subs, and there was still a 1% difference.

    Adding an explicit scalar context brought the difference down a bit, but I'm not sure because the results aren't very consistent. In fact, I'd call them downright erratic.

    All that being said, if someone can explain why a /g would make the pattern faster, I'd be very interested to hear. As it stands, I think there isn't a meaningful or consistent difference.

      /g in scalar context would make every second match fail.
      sub f { 'network' =~ /^network$/g }
      print f()?1:0, "\n" for 1..6;

      1
      0
      1
      0
      1
      0

      The failing match should be faster than the succeeding match.

        Good eyes. When I changed the test from if ($phrase2 =~ /^network$/g) {} to while ($phrase2 =~ /^network$/g) {}, the time went from 2.70s to 5.97s. This is in line with what I'd expect -- if 5E6 successful matches and 5E6 failed matches take 2.7s on my CPU, the while loop, consisting of 1E7 successes and 1E7 failures, should take about twice as long.
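The success and failure counts in that estimate can be confirmed on a small scale. This is a sketch only; count_if and count_while are hypothetical helper names introduced here, and 10 iterations stand in for the 1E7 used in the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the thread): count how often the /g
# match succeeds and fails when used as an `if` condition versus a
# `while` condition.
sub count_if {
    my ($iterations) = @_;
    my $phrase = 'network';
    my ( $ok, $fail ) = ( 0, 0 );
    for ( 1 .. $iterations ) {
        # One evaluation per iteration; successes and failures alternate.
        if ( $phrase =~ /^network$/g ) { $ok++ } else { $fail++ }
    }
    return ( $ok, $fail );
}

sub count_while {
    my ($iterations) = @_;
    my $phrase = 'network';
    my ( $ok, $fail ) = ( 0, 0 );
    for ( 1 .. $iterations ) {
        # The condition is evaluated until it fails: one success,
        # then one failure that ends the loop and resets pos().
        $ok++ while $phrase =~ /^network$/g;
        $fail++;
    }
    return ( $ok, $fail );
}

printf "if:    %d ok, %d failed\n", count_if(10);       # if:    5 ok, 5 failed
printf "while: %d ok, %d failed\n", count_while(10);    # while: 10 ok, 10 failed
```

The while version performs twice as many match evaluations of each kind, which is consistent with the roughly doubled run time reported above.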

        Thank you, guys! You have really helped me again.

        And of course special thanks to ikegami.

      When I tested with and without /g, the regex with /g would complete its run in 2.75 +/- 0.1 seconds, and without /g completed in 3.70 +/- 0.1 seconds on the CPU. This result was consistent across runs, and did not depend on the case of the regex.
Re: Why this code run faster?
by gamache (Friar) on Nov 08, 2007 at 16:42 UTC
    First guess, thankfully shot down by chromatic: The o modifier tells Perl to only compile the regex once, so I believe the two seconds you saved were Perl not compiling the same regex ten million times. Recompiling a constant didn't make sense to me so I am glad I was wrong here.

    Second: Retesting, it seems that it's /g that speeds things up, not /o. Perhaps someone more familiar with Perl internals can shed some light.

    Finally: Ikegami has it.

      The regex is constant. Why would Perl have to compile it more than once?

      You are incorrect. /o is only operative when there is interpolation in the pattern. Since there is none, the pattern is compiled only once, and that happens at normal Perl compile time, that is, at the same time as the surrounding code. Compilation of interpolated patterns is deferred to runtime. If /o were present, the first interpolated pattern would be baked in.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      /o tells the compiler that the variables mentioned in the regex won't change while the program is running, so it only needs to be compiled once. If your regex contains no variables, then /o doesn't do anything.


      We're not surrounded, we're in a target-rich environment!
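To see the effect of /o that the replies above describe, it helps to use a pattern that really does interpolate a variable. The sketch below is an illustration under that assumption; the helper names match_without_o and match_with_o are invented here and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the thread): match the fixed string
# 'foo' against a pattern built from each word in turn and return one
# 1/0 flag per word.
sub match_without_o {
    my @flags;
    for my $word (@_) {
        # The pattern follows the current value of $word.
        push @flags, ( 'foo' =~ /^$word$/ ? 1 : 0 );
    }
    return @flags;
}

sub match_with_o {
    my @flags;
    for my $word (@_) {
        # /o freezes the pattern at the first value of $word it sees,
        # so later values are ignored.
        push @flags, ( 'foo' =~ /^$word$/o ? 1 : 0 );
    }
    return @flags;
}

my @plain = match_without_o(qw(foo bar));
my @baked = match_with_o(qw(foo bar));

print "without /o: @plain\n";    # without /o: 1 0
print "with /o:    @baked\n";    # with /o:    1 1
```

With /o the second test still uses the pattern built from 'foo', which is why 'bar' appears to match. In the original question the patterns contain no variables, so there is nothing for /o to freeze.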
