PerlMonks  

What it mathches`

by abubacker (Pilgrim)
on Aug 17, 2009 at 04:36 UTC ( [id://789073]=perlquestion )

abubacker has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,
Can anyone tell me what the following code can match?

$_=<>;
if (/"^abc[A\b]def$"/ ){
    print "true" ;
} else {
    print "false";
}

Thanks in advance!

Replies are listed 'Best First'.
Re: What it mathches`
by ikegami (Patriarch) on Aug 17, 2009 at 04:42 UTC
    Nothing. It's impossible to find a double quote before the start of the string.
    • Matches a double quote
    • followed by the start of the string
    • followed by "abc"
    • followed by "A" or a backspace
    • followed by "def"
    • followed by whatever the content of the variable $" matches (a single space by default).
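
    The breakdown above can be checked with a small sketch. The candidate strings are hypothetical inputs chosen only because they look as if they might match; none of them can, since the pattern demands a double quote before the start of the string.

```perl
use strict;
use warnings;

# The pattern from the question. Inside a regex, $" is interpolated as the
# list-separator variable, which holds a single space unless it was changed.
my $re = qr/"^abc[A\b]def$"/;

# Hypothetical inputs that look like plausible matches.
my @candidates = (
    qq{"abcAdef"},
    qq{"abcAdef },
    qq{abcAdef },
    qq{"\nabcAdef },
);

# grep in scalar context returns the number of matching candidates.
my $matched = grep { $_ =~ $re } @candidates;

print "list separator is a single space\n" if $" eq ' ';
print "candidates matched: $matched\n";
```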
Re: What it mathches`
by Tanktalus (Canon) on Aug 17, 2009 at 05:14 UTC

    I had to fight with this a lot to get a "true" output. And even then, I cheated. The most basic of the problems is that ^ is a zero-width assertion. Think about another zero-width assertion, \b: the boundary between a word character and a non-word character. If you have /a\bc/, it can never match anything because, by definition, there is no change between word and non-word between an a and a c. Can't happen. Similarly, ^ is a zero-width assertion that asserts this is the beginning of a line. Some pedants may point out that it actually is the beginning of the string, but that's not quite true. The m modifier allows ^ to match at the start of any line in the string - in fact, according to perlre, the m modifier is merely removing the optimisation that perl has that assumes there is only one line in the string you're testing. That means that it's assuming it's a single line, thus ^ is the beginning of the string because of the assumption there is only one line.

    Anyway, ^, being zero-width, must be right after either the physical beginning of the string, or right after a \n. It can't be right after a quote.
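
    A minimal sketch of the zero-width behaviour described above. The sample strings and variable names are made up for illustration; the patterns are the ones discussed in the text.

```perl
use strict;
use warnings;

# Hypothetical two-line string: "second" starts right after a newline.
my $text = "first\nsecond";

# Without /m, ^ can only match at the very start of the string.
my $without_m = ($text =~ /^second/)  ? 1 : 0;

# With /m, ^ may also match right after any newline.
my $with_m    = ($text =~ /^second/m) ? 1 : 0;

# \b needs a word/non-word transition, which cannot exist between 'a' and 'c'.
my $a_b_c     = ("abc ac a c" =~ /a\bc/) ? 1 : 0;

print "without /m: $without_m, with /m: $with_m, a\\bc: $a_b_c\n";
```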

    However, if we insert a \n into your regex right before the ^, we still don't quite get it to work because you're missing the m modifier. I'm also assuming you haven't set the deprecated $* variable (see perlvar, but don't use it - it's deprecated). Let's say we use the m modifier. It still doesn't work because $_=<> will only read a single line. Typing in "\nabc... won't match because $_ will only hold the ", with the input terminated at the newline. There is more cheating to be had: adding local $/; before the input line. Now I have:

    (echo '"'; echo "abcd") | perl -le 'local $/;$_=<>; $*=1; print "[$_]"; if (/"\n^abc/) { print "true" }'
    And, lo and behold, it works. But notice: I added the $/ and $* (which you shouldn't do) variables, and the \n inside your regex.
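
    On a perl where $* is no longer available, the same cheat can be sketched with the m modifier instead. This is only an illustration: the in-memory filehandle stands in for the piped STDIN of the one-liner, and the variable names are made up.

```perl
use strict;
use warnings;

# Stand-in for the piped input: a double quote on line 1, "abcd" on line 2.
my $input = qq{"\nabcd\n};
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";

# Undefining $/ locally makes the read slurp everything, not just one line.
my $data = do { local $/; <$fh> };
close $fh;

# /m replaces the removed $* variable: ^ may now match right after the \n.
my $result = ($data =~ /"\n^abc/m) ? "true" : "false";
print "$result\n";
```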

    I do have to wonder, though, why you're asking this question. It has a slight odor of XY Problem ... or maybe homework. But only slight.

      thus ^ is the beginning of the string because of the assumption there is only one line.

      No, ^ is the beginning of the string because /m wasn't used. No assumption was made.

      By the way, $* doesn't exist anymore.

        Reading perlre, I see this:

        1. \ Quote the next metacharacter
        2. ^ Match the beginning of the line
        (Emphasis mine.) It then goes on to say:
        By default, the "^" character is guaranteed to match only the beginning of the string, the "$" character only the end (or before the newline at the end), and Perl does certain optimizations with the assumption that the string contains only one line.
        (Emphasis still mine.) That's where I got the "optimisation" part. As to $*, I'm using perl 5.8. The OP didn't mention version of perl, and I've not been paying attention to his posting history to note what version he's been using, so felt free to use whichever perl I had handy.

Re: What it mathches`
by CountZero (Bishop) on Aug 17, 2009 at 06:09 UTC
    Obviously, this was written by someone who thinks all strings must be double-quoted in Perl! And consequently he put double quotes around the regex. Not only is this unnecessary, it actually breaks the regex.

    Most probably, it was meant to be:

    if (/^abc[A\b]def$/ ){ print "true" ; } else { print "false"; }
    Which means: match a string starting with 'abc', followed by either a capital 'A' or a backspace, followed by 'def' which ends the string.

    Please note that normally \b in a regex means "match on a word boundary", but inside a character class (the square brackets) it means 'backspace'.
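
    A short sketch of what the unquoted regex accepts, assuming the quotes were indeed a mistake. The subject strings are invented test inputs.

```perl
use strict;
use warnings;

# The regex with the stray double quotes removed.
my $re = qr/^abc[A\b]def$/;

# Hypothetical subjects: capital A, a literal backspace (\x08),
# a trailing newline, a space, and a capital B in the middle position.
my @subjects = ("abcAdef", "abc\x08def", "abcAdef\n", "abc def", "abcBdef");

# 1 for a match, 0 otherwise, in the same order as @subjects.
my @results = map { ($_ =~ $re) ? 1 : 0 } @subjects;

print join(",", @results), "\n";
```

    The third subject is included because $ also matches just before a newline at the end of the string.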

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: What it mathches`
by Anonymous Monk on Aug 17, 2009 at 05:03 UTC
    YAPE::Regex::Explain
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( qr/"^abc[A\b]def$"/ )->explain;
    __END__
    The regular expression:

    (?-imsx:"^abc[A\b]def )

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      "                        '"'
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      abc                      'abc'
    ----------------------------------------------------------------------
      [A\b]                    any character of: 'A', '\b' (backspace)
    ----------------------------------------------------------------------
      def                      'def '
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
Re: What it mathches`
by bobf (Monsignor) on Aug 17, 2009 at 05:56 UTC

    As other monks have suggested, this regex is a bit odd. Are the quotes supposed to be part of the pattern? The pattern does not need to be quoted inside the regex delimiters. See perlre and perlreref.

    If the quotes were added in error (i.e., they are not part of the pattern), then the regex becomes

    /^abc[A\b]def$/
    which not only makes more sense, but also the task of predicting matching patterns becomes trivial.

    If you need a hint, see Re: What it mathches` and ignore the parts about the quotes. I also wonder if \b (backspace) is in error, but without additional information speculation on intent is merely that.

Re: What it mathches`
by Anonymous Monk on Aug 17, 2009 at 06:36 UTC
    use re 'debug';
    use re 'debug';
    $_="abc def";
    #$_= qq!"abc def$"!;
    if (/"^abc[A\b]def$"/ ){
        print "\ntrue\n";
    } else {
        print "\nfalse\n";
    }
    __END__
    Compiling REx `"^abc[A\b]def '
    size 19 Got 156 bytes for offset annotations.
    first at 1
       1: EXACT <">(3)
       3: BOL(4)
       4: EXACT <abc>(6)
       6: ANYOF[\10A](17)
      17: EXACT <def >(19)
      19: END(0)
    anchored ""abc" at 0 (checking anchored) minlen 9
    Offsets: [19]
        1[1] 0[0] 2[1] 3[3] 0[0] 6[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 11[4] 0[0] 15[0]

    false
    Freeing REx: `"\"^abc[A\\b]def "'

    use re 'debug';
    #$_="abc def";
    $_= qq!"abc def$"!;
    if (/"^abc[A\b]def$"/ ){
        print "\ntrue\n";
    } else {
        print "\nfalse\n";
    }
    __END__
    Compiling REx `"^abc[A\b]def '
    size 19 Got 156 bytes for offset annotations.
    first at 1
       1: EXACT <">(3)
       3: BOL(4)
       4: EXACT <abc>(6)
       6: ANYOF[\10A](17)
      17: EXACT <def >(19)
      19: END(0)
    anchored ""abc" at 0 (checking anchored) minlen 9
    Offsets: [19]
        1[1] 0[0] 2[1] 3[3] 0[0] 6[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 11[4] 0[0] 15[0]
    Guessing start of match, REx ""^abc[A\b]def " against ""abc def "...
    Found anchored substr ""abc" at offset 0...
    Guessed: match at offset 0
    Matching REx ""^abc[A\b]def " against ""abc def "
      Setting an EVAL scope, savestack=3
       0 <> <"abc def >       |  1:  EXACT <">
       1 <"> <abc def >       |  3:  BOL
                                failed...
    Match failed

    false
    Freeing REx: `"\"^abc[A\\b]def "'
      Did you notice? The regex didn't even compile correctly: it failed to take into account the (nonsensical) final double quote after the '$'. Of course the regex engine has every reason to stop reading the regex after '$', but one would expect it to give some kind of error such as "garbage found after '$'".

      Forget the above: I missed the 'space' after the 'def'.
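
      The missed space can be made visible without the debug output. A minimal sketch, assuming the list separator still has its default value; the variable names are invented, and the exact flag prefix of the stringified pattern depends on the perl version.

```perl
use strict;
use warnings;

# $" is interpolated while the pattern is compiled, so the trailing
# dollar-quote pair becomes the current list separator (a space by default).
my $re = qr/"^abc[A\b]def$"/;

# Stringifying a qr// object shows the pattern after interpolation,
# wrapped in a non-capturing group.
my $source = "$re";
print "[$source]\n";

# The wrapped pattern should end in a space followed by the closing paren.
my $ends_with_space = ($source =~ / \)\z/) ? 1 : 0;
print "interpolated space present: $ends_with_space\n";
```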

      CountZero


Re: What it mathches`
by targetsmart (Curate) on Aug 18, 2009 at 09:09 UTC
    Hi abubacker,

    I think you did enough on regular expressions when you learnt sed in your UNIX course.

    If you have doubts about the basics, just check with the local mentor there, or go back and read up on regular expressions. :)


    Vivek
    -- 'I' am not the body, 'I' am the 'soul', which has no beginning or no end, no attachment or no aversion, nothing to attain or lose.

Node Type: perlquestion [id://789073]
Approved by ikegami