PerlMonks  

Re^6: Common Perl Pitfalls

by Jenda (Abbot)
on Apr 11, 2012 at 16:23 UTC


in reply to Re^5: Common Perl Pitfalls
in thread Common Perl Pitfalls

qr// is more readable because it clearly tells the reader that the thing inside is meant to be used as a regexp later on. It is also more readable in some cases because the contents of qr// are treated as a regexp, not as a single-quoted string, with regard to escaping special characters. There is a huge difference between $part = qr/\\d/; and $part = q/\\d/; ! The first will eventually match a backslash followed by the letter d; the second will match a digit. Are you sure you will remember to quadruple your backslashes if you want to match a literal backslash?
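To make the escaping difference concrete, here is a minimal sketch. The variable names and test strings are made up for illustration; only the two quoting forms come from the post.

```perl
use strict;
use warnings;

my $as_regex  = qr/\\d/;   # regex syntax: an escaped backslash, then the letter d
my $as_string = q/\\d/;    # single-quoted string: \\ collapses to \, leaving \d

# The string form only becomes a regex when it is interpolated into a match.
my $string_on_digit = ('7'   =~ /$as_string/) ? 1 : 0;   # \d matches a digit
my $regex_on_digit  = ('7'   =~ /$as_regex/)  ? 1 : 0;   # no backslash in '7'
my $regex_on_bs_d   = ('\\d' =~ /$as_regex/)  ? 1 : 0;   # '\\d' is backslash + d

# One literal backslash: two backslashes inside qr//, four inside q//.
my $bs_regex  = qr/\\/;
my $bs_string = q/\\\\/;
my $bs_regex_hit  = ('a\\b' =~ /$bs_regex/)  ? 1 : 0;
my $bs_string_hit = ('a\\b' =~ /$bs_string/) ? 1 : 0;

print "q form matches a digit:         $string_on_digit\n";
print "qr form matches a digit:        $regex_on_digit\n";
print "qr form matches backslash + d:  $regex_on_bs_d\n";
print "literal backslash via qr and q: $bs_regex_hit $bs_string_hit\n";
```

Both backslash patterns end up matching the same thing; the q// version just needs twice as many backslashes to get there.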

And regarding the speed of constructed pattern ... maybe you stopped one qr// too soon. Instead of building the ultimate pattern at the point it was used, you should have built it just once at the same place you've defined the parts and then used just if ($var =~ $built_regexp) or $var =~ s/$built_regexp/replacement/;. Maybe. I haven't seen your code.

Jenda
Enoch was right!
Enjoy the last years of Rome.

Replies are listed 'Best First'.
Re^7: Common Perl Pitfalls
by JavaFan (Canon) on Apr 11, 2012 at 17:23 UTC
    Are you sure you will remember to quadruple your backslashes if you want to match a literal backslash?
    Sure. But how's that relevant? If I were to write a subpattern that matches a backslash, I may write that as qr/\\/ -- but that doesn't mean that's enough reason to always use qr, even if it's intended to match something different from a backslash.
    There is a huge difference between $part = qr/\\d/; and $part = q/\\d/; !
    I know. Often, both are wrong.
    $pat1 = '[0-9]'; $pat2 = qr/[0-9]/;
    is what's usually intended.
    And regarding the speed of constructed pattern ... maybe you stopped one qr// too soon. Instead of building the ultimate pattern at the point it was used, you should have built it just once at the same place you've defined the parts and then used just if ($var =~ $built_regexp) or $var =~ s/$built_regexp/replacement/;
    I've no clue what you're trying to say.
    I haven't seen your code.
    Indeed.

      Don't get me started on \d! Whoever decided to include "something that might be understood as a digit in a language/charset I've never ever heard of" in \d made a huge, huge mistake. Out of ten thousand uses of \d, there's maybe one where this nonsense is what was meant. I do believe that even now it's not too late to fix this insanity. The change would fix many times more scripts/modules than it would break.
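A small sketch of the behaviour being complained about, assuming perl 5.14 or later for the /a modifier. The sample character (ARABIC-INDIC DIGIT THREE) is only an illustration, not something from the thread.

```perl
use strict;
use warnings;

my $ascii_digit  = "7";
my $arabic_digit = "\x{0663}";   # ARABIC-INDIC DIGIT THREE

# On a character string, \d follows Unicode rules and accepts any decimal digit.
my $plain_ascii  = ($ascii_digit  =~ /^\d$/)    ? 1 : 0;
my $plain_arabic = ($arabic_digit =~ /^\d$/)    ? 1 : 0;

# /a (perl 5.14+) restricts \d to 0-9; an explicit class does the same everywhere.
my $a_arabic     = ($arabic_digit =~ /^\d$/a)   ? 1 : 0;
my $class_arabic = ($arabic_digit =~ /^[0-9]$/) ? 1 : 0;

print "\\d    on '7':      $plain_ascii\n";
print "\\d    on U+0663:   $plain_arabic\n";
print "\\d /a on U+0663:   $a_arabic\n";
print "[0-9] on U+0663:   $class_arabic\n";
```

On perls without /a, the explicit [0-9] class from the earlier reply is the portable way to say "ASCII digit".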

      And what I meant regarding the speed is the difference between

      my $foo = qr/.../;
      my $bar = qr/..../;
      ...
      while (<>) {
          ...
          if (/$foo(?:$bar)+/) {
              ...
      and
      my $foo = qr/.../;
      my $bar = qr/..../;
      my $foobar = qr/$foo(?:$bar)+/;
      ...
      while (<>) {
          ...
          if (/$foobar/) {
              ...
      In the latter case the stringification and the compilation of the longer regexp happen just once.
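A runnable version of the second form might look like the sketch below. The two sub-patterns, the anchors, and the sample lines are hypothetical stand-ins for the elided parts above; only the shape (combine once, match many times) is taken from the post.

```perl
use strict;
use warnings;

# Hypothetical sub-patterns standing in for the elided qr/.../ parts.
my $foo = qr/[a-z]+/;
my $bar = qr/-\d+/;

# Combined and compiled once, outside the loop.
my $foobar = qr/\A$foo(?:$bar)+\z/;

# Sample input in place of reading from <>.
my @lines   = ('abc-1-22', 'abc', 'x-9', '-5');
my @matched = grep { /$foobar/ } @lines;

print "$_\n" for @matched;
```

With these stand-ins, only the lines that have at least one letter followed by one or more -digits groups get through.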

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

        Oh, sure. And I don't give a damn about the difference in compilation speed of trivial small regexes.

        But when the regexes get large, and the difference in compiling the patterns is a few seconds vs a few minutes, I do care.

        But still, even in your simple example, it's three compilations + two stringifications vs a single compile.

        Here's a benchmark, 1 compilation vs 12 compilations and 20 stringifications:

        use Benchmark 'cmpthese';

        cmpthese -1, {
            qq => 'my $p = qq{a}; $p = qq{$p$p} for 1 .. 10; qr/$p/',
            qr => 'my $p = qr{a}; $p = qr{$p$p} for 1 .. 10; qr/$p/',
        };
        __END__
                Rate     qr     qq
        qr     914/s     --  -100%
        qq  283880/s 30949%     --
        That's with 5.15.9 (on OSX). With 5.12.3 (same box), I get:
                Rate     qr     qq
        qr     857/s     --  -100%
        qq  324588/s 37769%     --
        And, for kicks, with 5.8.9 (again, same box):
                Rate     qr     qq
        qr     508/s     --  -100%
        qq  301810/s 59290%     --
        The resulting patterns, while equivalent in what they match, also differ significantly in size: the one built with repeated qr constructs is 19 times the size of the one built with qq.
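The size difference can be checked by stringifying both results, as in the sketch below. It assumes "size" here means the length of the stringified pattern, which the post does not spell out. Each qr// stringifies with its own group wrapper, and the exact wrapper text depends on the perl version, so the ratio you see may differ from the figure quoted above.

```perl
use strict;
use warnings;

# Built from plain strings, compiled once at the end.
my $from_qq = qq{a};
$from_qq = qq{$from_qq$from_qq} for 1 .. 10;
my $re_qq = qr/$from_qq/;

# Built from nested qr objects, recompiled at every step.
my $from_qr = qr{a};
$from_qr = qr{$from_qr$from_qr} for 1 .. 10;
my $re_qr = qr/$from_qr/;

my $len_qq = length "$re_qq";
my $len_qr = length "$re_qr";

# Both should still accept the same input: exactly 1024 'a' characters.
my $subject  = 'a' x 1024;
my $qq_match = ($subject =~ /\A$re_qq\z/) ? 1 : 0;
my $qr_match = ($subject =~ /\A$re_qr\z/) ? 1 : 0;

printf "qq-built pattern: %d chars\n", $len_qq;
printf "qr-built pattern: %d chars\n", $len_qr;
printf "ratio: %.1f\n", $len_qr / $len_qq;
```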

        I'm usually not a stickler for speed. But I make an exception when it comes to qr.
