PerlMonks  

Naughty match variables in CPAN?

by tall_man (Parson)
on Jul 22, 2003 at 00:15 UTC

tall_man has asked for the wisdom of the Perl Monks concerning the following question:

A co-worker recently added some modules to a large perl program that used $&, $' and $` (a.k.a. the "naughty match variables"). I know these add a large performance penalty for all regular expressions in the program, so I removed all the uses. Then I tried all three of the methods in Mastering Regular Expressions, second edition, p. 358, "How to Check Whether Your Code is Tainted by $&".

Only the last method on that page works for perl 5.8.0. The "-Mre=debug" does not show either 'Enabling $`, $&, $' support' or 'Omitting $`, $&, $' support' any more.

The Devel::SawAmpersand doesn't work either. It gives false positives on trivial programs that don't have the "naughty match variables".

Here is the subroutine that actually worked:

    use strict;
    use Time::HiRes;

    sub CheckNaughtiness () {
        my $text = 'x' x 10_000;

        # Time an empty loop to estimate the loop overhead.
        my $start = Time::HiRes::time();
        for (my $i = 0; $i < 5_000; $i++) { }
        my $overhead = Time::HiRes::time() - $start;

        # Time the same loop with a trivial match against a long string.
        $start = Time::HiRes::time();
        for (my $i = 0; $i < 5_000; $i++) { $text =~ m/^/ }
        my $delta = Time::HiRes::time() - $start;

        printf "It seems your code is %s (overhead=%.2f, delta=%.2f)\n",
            ($delta > $overhead*5) ? "naughty" : "clean", $overhead, $delta;
    }

To my great surprise, when I traced it out I found two CPAN modules (so far) that we use are also tainted in this way: Printer and Math::MatrixReal. I have sent mail to the maintainers of these modules pointing out the issue.

It makes me wonder how many other CPAN modules are tainted with "naughty match variables". Another way to get tainted is to do:

    # Don't do this:
    use English;

    # Do this instead:
    use English qw( -no_match_vars );

Has anyone else noticed this problem? Should there be a general check for "naughty match variables" for code submitted to CPAN?

Replies are listed 'Best First'.
Re: Naughty match variables in CPAN?
by Dog and Pony (Priest) on Jul 22, 2003 at 02:30 UTC
    Curious question: exactly how big an impact do those variables actually have? I never use them, since it's been hammered into me that I shouldn't because of performance issues. This makes sense, and one rarely needs them anyway. But I'm curious what size of performance hit we are talking about here: microseconds, seconds, minutes?

    Also, perlre says: once you've used them once, use them at will, because you've already paid the price. If I read that right, it means that the performance hit is only triggered once, the first time one uses them.

    I could make a point here about coding for simplicity instead of (unnecessary) performance, but mainly, I am just curious. Is this advice something that is given as a knee-jerk response, and because we all want to have great performance, or is the impact really so large that it matters in usual cases?

    All that aside, I do agree that it is a bad idea to use them in any module that might be used by someone else - there is no telling what the performance considerations might be for that script. If you are using English, performance is probably not what you are looking for. But there are probably other examples.


    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
      If I read that right, it means that the performance hit is only triggered once, the first time one uses them.

      No. What that means is that once you use them, all matches will incur the overhead of using them whether or not you actually do. It's not a one-time hit but it is all-or-nothing.

      (This is my 1000th post! :-)

      -sauoq
      "My two cents aren't worth a dime.";
      
        Gotcha! I guess it is too late over here to read documentation. :)

        But I still wonder how much of a penalty there is.


      How big is the impact? Better than 10x in a simple test (5.8.0 on Win2K).

      Just for fun, here's the benchmark. It took some guesswork to get it to run the subs in the right order - clean first, then use English;, then naughty. If anyone uncomments the print statements to test the order, use 1 as an argument so you don't have to wait forever. Here are the results:

      use strict;
      use Benchmark qw/cmpthese/;

      my $time = shift || -5;
      my $text = 'x' x 10_000;

      sub clean {
          # print "clean";
          $text =~ m/^x/;
      }

      sub make_dirty {
          # print "md";
          eval "use English;";
      }

      sub naughty {
          # print "naughty";
          $text =~ m/^x/;
      }

      my %hash = (
          clean     => 'clean',
          naughtify => 'make_dirty',
          sawamp    => 'naughty',
      );

      cmpthese ( $time, {
          clean     => 'clean',
          naughtify => 'make_dirty',
          sawamp    => 'naughty',
      });

      __END__
      results:
      C:\s\pldir>naughty.pl -5
                    Rate naughtify    sawamp     clean
      naughtify    433/s        --      -98%     -100%
      sawamp     24153/s     5481%        --      -92%
      clean     300603/s    69366%     1145%        --

      Someone with more benchmark-fu may correct me on this, but it looks right to me.

Re: Naughty match variables in CPAN?
by waswas-fng (Curate) on Jul 22, 2003 at 04:34 UTC
    What does this buy you over a quick recursive egrep of the modules you are using? For instance, Math::MatrixReal returns a line such as:
    $string = $';

    You know if there is a match for one of those vars you will see the problem.

    -Waswas

      You can't do that because the code $money =  '$'.$money if ($currency eq 'USD'); also matches. (Only perl can parse Perl.)
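To make the false-positive problem concrete, here is a rough sketch of a line scanner. The sub name find_suspect_lines and the sample lines are made up for illustration. It skips a `$'` that directly follows a single quote, so it passes over the `'$'.$money` case, but it is still only a heuristic: it knows nothing about quoting in general, heredocs, or POD, so every hit needs a human look.

```perl
use strict;
use warnings;

# Heuristic scan (hypothetical helper, not from any module): report lines
# that *may* use $&, $` or $', or that load English without -no_match_vars.
# Returns a list of [line number, reason, line text] array refs.
sub find_suspect_lines {
    my @lines = @_;
    my @hits;
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        next if $line =~ /^\s*#/;    # skip whole-line comments

        # A match variable not directly preceded by a single quote or backslash.
        if ($line =~ /(?<!['\\])\$[&`']/) {
            push @hits, [ $i + 1, 'match variable', $line ];
        }
        elsif ($line =~ /^\s*use\s+English\b(?!.*-no_match_vars)/) {
            push @hits, [ $i + 1, 'use English without -no_match_vars', $line ];
        }
    }
    return @hits;
}

my @sample = (
    q{$string = $';},
    q{$money = '$'.$money if ($currency eq 'USD');},
    q{use English;},
    q{use English qw( -no_match_vars );},
);
printf "line %d: %s: %s\n", @$_ for find_suspect_lines(@sample);
```

On the four sample lines this flags the first and third only. A line such as `print 'price: $& each'` would still be flagged even though nothing is interpolated there, which is exactly the kind of case a plain text search cannot settle.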


      Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Naughty match variables in CPAN?
by TomDLux (Vicar) on Jul 22, 2003 at 07:46 UTC

    They no longer invoke the horrible penalty they used to, but in any case I prefer using parentheses to isolate subexpressions I want to remember.
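A minimal sketch of the parentheses approach, for anyone who wants the prematch, match and postmatch without touching the match variables. The helper name split_around is invented here, and it assumes the pattern passed in has no capturing groups of its own (otherwise the group numbers shift).

```perl
use strict;
use warnings;

# Split a string around the first match of $re using only capture groups.
# Returns (prematch, match, postmatch), or an empty list if $re does not match.
# Assumption: $re contains no capturing groups of its own.
sub split_around {
    my ($string, $re) = @_;
    return $string =~ /\A(.*?)($re)(.*)\z/s ? ($1, $2, $3) : ();
}

my ($pre, $match, $post) = split_around('found two CPAN modules', qr/CPAN/);
print join('|', $pre, $match, $post), "\n";    # prints "found two |CPAN| modules"
```

The extra groups copy parts of the string for this one match only, so the cost stays local to the code that asked for it.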

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Naughty match variables in CPAN?
by zakzebrowski (Curate) on Jul 22, 2003 at 12:20 UTC
    Should there be a general check for "naughty match variables" for code submitted to CPAN?
    Maybe. But there may be cases where the author chooses to make life simpler by using those variables. A simple example is the following code, which extracts the surrounding context from a full-text match.

    while (<DATA>){
        while ($_ =~/(^|\W)CPAN(\W)/gi){
            print substr($`,length($`)-10) . $& . substr($',0,10) . "\n";
        }
    }
    __DATA__
    To my great surprise, when I traced it out I found two CPAN modules (so far) that we use are also tainted in this way: Printer and Math::MatrixReal. I have sent mail to the maintainers of these modules pointing out the issue. It makes me wonder how many other CPAN modules are tainted with "naughty match variables".

    Output:
    ZAZ@localhost ~
    $ perl sample.pl
     found two CPAN modules
    
    many other CPAN modules ar
    
    Standard untested code caveat...
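For comparison, the same kind of context extraction can be sketched with the match offset arrays @- and @+ (available since perl 5.6) and substr, which avoids the three match variables. This is an adaptation rather than a drop-in replacement: the sub name context_snippets and the sample sentence are made up, and it matches on \b word boundaries instead of capturing the neighbouring characters.

```perl
use strict;
use warnings;

# Return up to $width characters of context on each side of every match of
# $re in $text, using @- and @+ (match offsets) instead of $`, $& and $'.
sub context_snippets {
    my ($text, $re, $width) = @_;
    my @snippets;
    while ($text =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
        my $from = $start > $width ? $start - $width : 0;
        push @snippets,
            substr($text, $from, $end - $from)    # left context plus the match
          . substr($text, $end, $width);          # right context
    }
    return @snippets;
}

my $line = 'I found two CPAN modules, and wonder how many other CPAN modules exist.';
print "$_\n" for context_snippets($line, qr/\bCPAN\b/i, 10);
```

Clamping $from at 0 also sidesteps the negative offset that substr($`,length($`)-10) produces when fewer than ten characters precede the match.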

    ----
    Zak
    Pluralitas non est ponenda sine necessitate - mysql's philosophy
