PerlMonks  

Regular expression help: why does this not match?

by lokiloki (Beadle)
on Jan 18, 2007 at 22:50 UTC ( [id://595362]=perlquestion )

lokiloki has asked for the wisdom of the Perl Monks concerning the following question:

!/usr/bin/perl
$a = "http://www.myurl.com/e.php?a";
$b = "http://www.myurl.com/e.php?a";
if ($a =~ /$b/) {
    print "FOUND\n";
} else {
    print "NOT FOUND\n";
}

$a = "http://www.myurl.com/e.php";
$b = "http://www.myurl.com/e.php";
if ($a =~ /$b/) {
    print "FOUND\n";
} else {
    print "NOT FOUND\n";
}
This results in
NOT FOUND
FOUND
I suppose I am completely stupid... I suppose the ? is being interpreted as a regular expression character applying to the preceding p? Why does the second match, but the first does not? How can I remedy this?

Replies are listed 'Best First'.
Re: Regular expression help: why does this not match?
by Zaxo (Archbishop) on Jan 18, 2007 at 22:57 UTC

    '?' is a regex metacharacter for optional. It doesn't match a literal '?'.

    '.' is also a metacharacter, but it will only surprise you later: it matches any character, and that includes a literal '.'.

    You can write,

    if ($a =~ /\Q$b\E/) {
        # . . .
    }
    to get all characters in $b taken as literal.
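
To see what \Q...\E actually does to the pattern, here is a minimal sketch using the quotemeta builtin, which applies the same escaping outside a regex. The names $url, $pattern and $quoted are hypothetical stand-ins for the thread's $a and $b, chosen to avoid the sort variables.

```perl
use strict;
use warnings;

# Hypothetical names: $url and $pattern stand in for the thread's $a and $b.
my $url     = "http://www.myurl.com/e.php?a";
my $pattern = "http://www.myurl.com/e.php?a";

# quotemeta backslashes every ASCII character that is not a letter, digit
# or underscore, so '?', '.', ':' and '/' all become literal in a regex.
my $quoted = quotemeta($pattern);
print "$quoted\n";    # http\:\/\/www\.myurl\.com\/e\.php\?a

# The unquoted pattern fails; \Q...\E and the pre-quoted string both match.
print "unquoted:  ", ($url =~ /$pattern/     ? "FOUND" : "NOT FOUND"), "\n";
print "\\Q...\\E:   ", ($url =~ /\Q$pattern\E/ ? "FOUND" : "NOT FOUND"), "\n";
print "quotemeta: ", ($url =~ /$quoted/      ? "FOUND" : "NOT FOUND"), "\n";
```

The variable is interpolated first and the quoting is applied to the interpolated text, which is why \Q$pattern\E still sees the contents of $pattern.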

    BTW, $a and $b are special to sort, so shouldn't be chosen as user variable names.

    After Compline,
    Zaxo

      Yes, I figured that... so if I have a very long string that contains multiple possible regex metacharacters, how can I do a match and tell that match to "ignore" any such metacharacters? Or do I have to process the string first and backslash all of them?

        Or don't use a regexp at all.

        $a =~ /^\Q$b\E\z/
        is equivalent to
        $a eq $b

        $a =~ /\Q$b\E/
        is equivalent to
        index($a, $b) >= 0

        Case-insensitive versions:

        $a =~ /^\Q$b\E\z/i
        is equivalent to
        lc($a) eq lc($b)

        $a =~ /\Q$b\E/i
        is equivalent to
        index(lc($a), lc($b)) >= 0
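
A short sketch of the regex-free route, comparing each form against its \Q...\E counterpart. The helper names contains_literal and contains_literal_ci are made up for illustration, and the sample strings are assumptions; the comparison is only exercised here on plain ASCII text.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the index() forms listed above.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) >= 0 ? 1 : 0;
}

sub contains_literal_ci {
    my ($haystack, $needle) = @_;
    return index(lc($haystack), lc($needle)) >= 0 ? 1 : 0;
}

# Assumed sample data, not from the thread.
my $haystack = "see http://www.myurl.com/e.php?a for details";

for my $needle ("http://www.myurl.com/e.php?a", "E.PHP?A", "e.php?b") {
    my $re_cs = $haystack =~ /\Q$needle\E/  ? 1 : 0;   # case-sensitive regex
    my $re_ci = $haystack =~ /\Q$needle\E/i ? 1 : 0;   # case-insensitive regex
    printf "%-30s regex=%d index=%d regex/i=%d index+lc=%d\n",
        $needle, $re_cs, contains_literal($haystack, $needle),
        $re_ci, contains_literal_ci($haystack, $needle);
}
```

For a fixed literal substring, index also sidesteps the quoting question entirely, since nothing in the needle is ever treated as a pattern.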

        quotemeta or \Q as I added to my original reply.

        After Compline,
        Zaxo

Re: Regular expression help: why does this not match?
by chargrill (Parson) on Jan 18, 2007 at 23:00 UTC

    Correct, ? is a regex metacharacter which means "match 1 or 0 times". In your first example, your regex is looking for: http://www(any char)myurl(any char)com/e(any char)ph(0 or 1 p)a.
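
To make that reading concrete, here is a small sketch that tries the unquoted pattern against a few strings. The helper name matches_unquoted and the candidate strings other than the original URL are assumptions made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the question, deliberately left unquoted.
my $pattern = "http://www.myurl.com/e.php?a";

# Hypothetical helper: returns 1 if the unquoted pattern matches, else 0.
sub matches_unquoted {
    my ($string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

my @candidates = (
    "http://www.myurl.com/e.php?a",   # the literal URL: 'a' cannot follow 'ph' or 'php'
    "http://www.myurl.com/e.pha",     # 'ph', zero 'p', then 'a'
    "http://www.myurl.com/e.phpa",    # 'ph', one 'p', then 'a'
    "http://wwwXmyurlYcom/eZpha",     # each dot matches an arbitrary character
);

printf "%-32s %s\n", $_, matches_unquoted($_) ? "FOUND" : "NOT FOUND"
    for @candidates;
```

Only the first string, the one the original poster actually wanted, fails to match.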

    Please note that the dot . also has special meaning - it means to match any character (also note that a dot qualifies as "any character" :-). You could specify \. to match a dot, but given that you want to match the question mark too, you might just be better off with \Q (quote the following regex metacharacters) (generally followed by \E (stop quoting regex metacharacters), too), i.e.:

    $a = "http://www.myurl.com/e.php?a";
    $b = "http://www.myurl.com/e.php?a";
    if ($a =~ /\Q$b\E/) {
        print "FOUND\n";
    } else {
        print "NOT FOUND\n";
    }

    Incidentally, $a and $b are bad names for variables, as they have special meaning to sort. And you might also have to fix up your shebang (!/usr/bin/perl should be #!/usr/bin/perl) in case you ever want to run your program via ./ instead of perl program.pl.
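
As a brief illustration of why those names are reserved in practice: inside a sort block, Perl itself sets the package variables $a and $b to the two elements being compared. The sketch below uses made-up data and hypothetical names (@sizes, @ascending, @descending) and keeps $a and $b for sort alone.

```perl
use strict;
use warnings;

# Assumed sample data.
my @sizes = (10, 9, 100, 1);

# sort assigns $a and $b for each comparison; they are exempt from
# "use strict" precisely because sort relies on them.
my @ascending  = sort { $a <=> $b } @sizes;
my @descending = sort { $b <=> $a } @sizes;

print "@ascending\n";     # 1 9 10 100
print "@descending\n";    # 100 10 9 1
```

Declaring your own lexical $a or $b in the same scope would hide the variables sort expects, so descriptive names such as $url and $pattern are the safer habit.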



    --chargrill
    s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
Re: Regular expression help: why does this not match?
by gaal (Parson) on Jan 18, 2007 at 23:00 UTC
    You're not stupid; after all, you answered your own question correctly. The '?' in the first $b is indeed being interpreted as a metacharacter. To prevent this, use the \Q...\E codes in your regular expression (or the quotemeta builtin outside it):

    $a = "http://www.myurl.com/e.php?a";
    $b = "http://www.myurl.com/e.php?a";
    if ($a =~ /\Q$b\E/) {    # <---------
        print "FOUND\n";
    } else {
        print "NOT FOUND\n";
    }
      Ah ha! \Q..\E is what I was looking for... however, it's funny and a bit counterintuitive how this code still knows to interpolate $b rather than simply turning it into \$b.
        It may be surprising at first, but the idea is that it lets you construct regexps on the fly. One common thing this is used for is when you have a list of valid values you got from somewhere, say @valid, and you want to check a value against it:

        my $valid = join "|", @valid;
        # (?:...) groups the alternation so ^ and $ apply to every alternative
        print "okay" if /^(?:$valid)$/;

        There are actually two improvements to make in the above code. First, the members of the valid list themselves might contain metacharacters in need of quoting; second, Perl has the qr// operator to make this more efficient:

        # don't run this code on every match: the idea is the qr// needs
        # to be computed only once.
        my $valid = join "|", map { quotemeta } @valid;
        # (?:...) groups the alternation so ^ and $ apply to every alternative
        my $valid_re = qr/^(?:$valid)$/;

        # now match as many times as you like.
        print "$_: " . (/$valid_re/ ? "okay" : "not okay") . "\n" for @a_bunch_of_inputs;
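
A self-contained sketch of the same idea, runnable as is. The helper name build_valid_re and the sample values are assumptions for illustration. It wraps the alternation in a non-capturing group so the anchors cover every alternative, and it uses \z instead of $ so a trailing newline in the input is not silently accepted; both are choices of this sketch rather than part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a list of literal strings into one
# anchored regex. Call it once and reuse the result.
sub build_valid_re {
    my @valid = @_;
    # Quote each member so its metacharacters are matched literally.
    my $alternation = join "|", map { quotemeta } @valid;
    # Without (?:...), ^ would bind only to the first alternative
    # and the end anchor only to the last.
    return qr/^(?:$alternation)\z/;
}

# Assumed sample values containing regex metacharacters.
my $valid_re = build_valid_re("e.php?a", "index.html", "a+b");

for my $input ("e.php?a", "e.pha", "index.html", "indexXhtml", "a+b", "aab") {
    print "$input: ", ($input =~ $valid_re ? "okay" : "not okay"), "\n";
}
```

Only the three exact strings passed to build_valid_re report "okay"; the look-alikes that would slip past an unquoted pattern are rejected.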
Re: Regular expression help: why does this not match?
by vaticide (Scribe) on Jan 18, 2007 at 23:02 UTC
    The ? is a regex metacharacter. If you want to match it, match on "\?".
    Otherwise, you should be checking for string equality using $a eq $b.
    #!/usr/local/perl
    $a = "http://www.myurl.com/e.php?a";
    $b = qr|http://www.myurl.com/e.php\?a|;
    if ($a =~ /$b/) {
        print "FOUND\n";
    } else {
        print "NOT FOUND\n";
    }

    $a = "http://www.myurl.com/e.php";
    $b = "http://www.myurl.com/e.php";
    if ($a =~ /$b/) {
        print "FOUND\n";
    } else {
        print "NOT FOUND\n";
    }

    # plain string equality: no regex, so no metacharacters to worry about
    $a = "http://www.myurl.com/e.php";
    $b = "http://www.myurl.com/e.php";
    if ($a eq $b) {
        print "FOUND\n";
    } else {
        print "NOT FOUND\n";
    }

Node Type: perlquestion [id://595362]
Approved by chargrill