http://qs321.pair.com?node_id=311603

hello fellow monks,

some days ago I was playing with an "advanced search"-like form, one of these where you can fill one or more search fields. something like:

# example of form: # Name: __________ # Address: __________ # Telephone: __________ # etc. # [Search]
of course, some of the fields could be blank. the searching code, reduced to its minimum, was something like:
if( $name =~ /\Q$pattern_name\E/i and $address =~ /\Q$pattern_address\E/i and $telephone =~ /\Q$pattern_telephone\E/i) { print "matched\n"; }
the match mysteriously failed when filling just the "Name" field, and I was puzzled for a few hours before finding out what was wrong.

the bug is really subtle indeed, and if you can spot it at a first glance, ++ to you.

what happens here is that, if one of the $pattern variables are blank, the regexp pattern evaluates to the empty string, and guess what this means? the last successfully matched regular expression is used instead. and more, if no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match). (the italic is taken from perlop).

so, if I put, for example "da" in the "Name" field, my if statement tries to match name, address AND telephone from my database with "da". and this obviously fails. even putting debug statements outside the if, just to check what I thought was matched (eg. print the pattern, print the result of $data =~ /pattern/) didn't help, because it's the order of the matches that screw things up.

I ended up changing my code to something like:

if( (not $pattern_name or $name =~ /\Q$pattern_name\E/i) and (not $pattern_address or $address =~ /\Q$pattern_address\E/i) and (not $pattern_telephone or $telephone =~ /\Q$pattern_telephone\E/i +) ) { print "matched\n"; }
(I know, I could I've just used index instead of a regexp in this case, but the principle still applies).

and what I've learned is: something like /$pattern/ (I mean, just a variable inside a regexp and nothing else) is probably a bad idea, unless you're really, really sure that $pattern will actually contain something.

I hope this word of warning could save some headache to someone else in the future :-)

cheers,
Aldo

King of Laziness, Wizard of Impatience, Lord of Hubris

Replies are listed 'Best First'.
Re: a word of warning about /$pattern/
by demerphq (Chancellor) on Dec 02, 2003 at 16:39 UTC

    Heh. Funny just today I used this property to my advantage:

    my $filter = @$custs ? join('|', map quotemeta($_),@$custs) : ""; $filter=qr/^(?!_)(?:$filter)/i; ( my $year_mon = $mon_year ) =~ s/(\d+)_(\d+)/20${2}_${1}/; my @dirs = map { if (-d $_) { warn "Considering: $_\n"; s/.*?([^\\\/]+)$/$1/; if (/$filter/) { warn "Processing: $_\n"; $_ } else { warn "Skipping: $_\n"; () } } else { (); } } glob File::Spec->catfile( $root, '*' );

    This code scans a directory and either includes all directories that dont begin with an _ or only the customers specified in the @$custs array (that dont start with _). I just thought it was funny reading this only minutes after I wrote the above code. (Actually modified some ugly old code to handle a customer filter. Obviously I didnt bother de-uglifying it ;-)

    Anyways, just thought i'd share the irony with yall.


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


Re: a word of warning about /$pattern/
by diotalevi (Canon) on Dec 02, 2003 at 14:48 UTC
    my $address_re = length( $address ) ? quotemeta( $address ) : "(?#)";

    No wait, even better - put the comment inside the regex code.

    if ( $foo =~ /(?#)\Q$bar\E/ ...
Re: a word of warning about /$pattern/
by Theo (Priest) on Dec 02, 2003 at 15:42 UTC
    To my limited understanding, it seems like an empty pattern will return true. If all patterns are empty, is the whole expression true? Did you test with all of the patterns being empty?

    -Theo-
    (so many nodes and so little time ... )

      To my limited understanding, it seems like an empty pattern will return true.

      that's not true (or not accurate, at least). read carefully the italic lines in my previous message, in particular: if no match has previously succeded, this will (silently) act instead as a genuine empty pattern (which will always match).

      If all patterns are empty, is the whole expression true? Did you test with all of the patterns being empty?

      yes, and yes. and that was even worse, because it helped to hide the bug from my eyes. when all the patterns were empty, all the records did match. but this happened only because no match has previously succeded, so this is something subject to an action-at-a-distance (eg. a succesful pattern match 1000 lines before may very well alter the behaviour of the search).

      cheers,
      Aldo

      King of Laziness, Wizard of Impatience, Lord of Hubris

        (eg. a succesful pattern match 1000 lines before may very well alter the behaviour of the search)

        That's horrible! I'm sure I'll be hit with that one sooner or later. (Hope I remember this fact at that time).

        -theo-
        (so many nodes and so little time ... )

Re: a word of warning about /$pattern/
by vacant (Pilgrim) on Dec 03, 2003 at 04:49 UTC
    I have once or twice been bewildered by this feature, too. In addition to perlop, there is Freidl, who says in the Owls Book: "If no regex is given, such as with m// ... Perl reuses the regular expression most recently used successfully with in the enclosing dynamic scope. This used to be useful for efficiency reasons, but is now obsolete with the advent of regex objects."

    This kind of thing keeps me humble, because I know that somewhere there is someone who understands exactly why this was a good idea.

      I think this "feature" is for sed compatibility. (By compatibility, I mean that the people using sed can get used to perl easier.)

      Maybe it would be better if this magic applied only if the regexp is empty before interpolation (and before overload qr transformation).

      This kind of thing keeps me humble, because I know that somewhere there is someone who understands exactly why this was a good idea

      Its still a good idea:

      if (/foo/ or /bop/ or /fnorble/) { s//baz/; }

      IOW, it means you can have several regexes try to match, and then without having to worry about which one did you can then replace the thing that was matched. I only recently came accross this feature, but sice I have its found its way into more code than I thought it would.


      ---
      demerphq

        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi


        Huh?
        s/foo|bop|fnorble/baz/;

        Makeshifts last the longest.

Re: a word of warning about /$pattern/
by hardburn (Abbot) on Dec 02, 2003 at 14:48 UTC

    Shouldn't you anchor it, e.g.:

    my $pattern = qr//; $_ = 'foo'; /\A \Q$pattern\E \z/x and print "Matched\n";

    Then you don't have to worry about an empty pattern at all. You also don't have to worry about only part of the string matching the pattern.

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

      But he needed matching parts of the string. Actually he needed the empty pattern to match.

      I can see your answer as an automatic matching process: problem with matching empty strings => answer anchore the pattern. But this time this heuristic did not work.

      I must admit I am guilty of some automatic answers as well.

Re: a word of warning about /$pattern/
by Roger (Parson) on Dec 03, 2003 at 02:51 UTC
    You would definitely want more than just the pattern you are searching for, especially when processing a form. I will normally also search for some anchors in the form fields.

    Example 1 - With anchor fields

    Ok, another example without the anchors -

    Example 2 - Without anchor fields