PerlMonks  

Regex conditional match if previous match in same expression is true?

by radiantmatrix (Parson)
on Apr 09, 2007 at 17:14 UTC ( [id://608987]=perlquestion )

radiantmatrix has asked for the wisdom of the Perl Monks concerning the following question:

I apologize for the confusing title; I'm in new regex territory (for me) here. I've been working my way through the excellent Mastering Regular Expressions, but I must be missing something.

I'm attempting to solve a problem wherein I'm finding a particular string in text: this string is optionally surrounded by '{' and '}'. In other words, the closing brace needs to be matched if and only if the opening brace was there.

I simplified the issue to a very minimal case in order to construct the regex, but I am getting unexpected results. Perhaps a more savvy monk than I can enlighten me as to where I've gone wrong.

use strict;
use warnings;

for (
    'oh {hello} there',
    'oh {hello there',
    'oh hello there',
    'oh hello} there',
) {
    print '', (
        $_ =~ /
            ([{]{0,1})   # optional opening brace
            hello        # .. followed by 'hello'
            (?(1)\})     # a closing brace iff the open brace was there
        /x ? 'YEP ' : 'NOPE'
    ), " - $1\n";
}

This program outputs the following lines:

YEP  - {
NOPE - 
NOPE - 
YEP  - 

Expected / desired results are:

YEP  - {
NOPE -
YEP  -
NOPE -

What in the world am I missing?

<radiant.matrix>
Ramblings and references
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet

Replies are listed 'Best First'.
Re: Regex conditional match if previous match in same expression is true?
by ikegami (Patriarch) on Apr 09, 2007 at 18:31 UTC

    What in the world am I missing?

    The error is in your problem definition. Your regexp does exactly what you said you wanted it to do. It's searching for a string optionally surrounded by '{' and '}'. {hello is optionally surrounded by {...} since it is not surrounded by {...}.

    'oh {hello there' =~ /
        ([{]{0,1})   # Matches '' (after some backtracking)
        hello        # Matches 'hello'
        (?(1)\})     # Matches ''
    /x;

    'oh hello} there' =~ /
        ([{]{0,1})   # Matches ''
        hello        # Matches 'hello'
        (?(1)\})     # Matches ''
    /x;

    As you can see, searching for a string optionally surrounded by something is the same thing as searching for the string itself.

    From your expected results, I deduce you actually want a string that is surrounded by {...}, or one that is neither preceded by { nor followed by }.

    /
        {hello}       # '{hello}'
    |
        (?<! { )      # Not preceded by '{'
        hello         # 'hello'
        (?! } )       # Not followed by '}'
    /x

      From your expected results, I deduce you actually want a string that is surrounded by {...}, or one that is neither preceded by { nor followed by }.

      That's correct: I want the string, optionally surrounded by braces. One brace on only one side is not acceptable. I'm glad that my listing of expected results was clearer than my description. ;-)

      Your alternation approach certainly functions. However, I was also hoping to learn to use the conditional ( (?(COND)...) ) notation. So you answered the question I asked (thanks!); but left me with the one I didn't ask.

      For my own education, can you think of a solution that uses the conditional notation, or would I be horribly abusing said to solve this problem?


        can you think of a solution that uses the conditional notation

        The algorithm:

        • Note the preceding character (if any).
        • Match 'hello'.
        • If there was a character to note,
          • If the noted character is {,
            • Match }. ( Oops, I had the brace reversed. )
          • Else,
            • Negatively match }.
        • Else,
          • Negatively match }.

        As it turns out, the "if any" portion of the first step is hard to implement because look-behinds must be fixed-width. So let's make that conditional too:

        • If we are at the start of the string,
          • Match 'hello'.
          • Negatively match }.
        • Else,
          • Note the preceding character.
          • Match 'hello'.
          • If the noted character is {,
            • Match }. ( Oops, I had the brace reversed. )
          • Else,
            • Negatively match }.
        /
            (?(?<=(.))
                # We are not at the start of the string.
                # The preceding character is in $1.
                hello
                (?(?{ $1 eq "\{" })
                    # The char before 'hello' is '{'.
                    }
                |
                    # The char before 'hello' is not '{'.
                    (?! } )
                )
            |
                # We are at the start of the string.
                hello
                (?! } )
            )
        /x

        Yikes!

        Other posters were correct when they changed things from using the {min,max} quantifier notation on the inside of the capture buffer to using the '?' quantifier on the outside. There is a crucial difference between "empty but matching" and "not matching": a group that matches the empty string still counts as having matched, so ([{]{0,1}) doesn't behave the same way as (\{)? even though the two look functionally equivalent. (This may be construed as a bug :-) Alternatively you can do what I do below, which is to put the capture inside of an alternation.

        Anyway, the following uses the conditional pattern with lookahead/lookbehind to match as you requested. You can play around with it to easily forbid {{hello}}, as the current code allows it.

        use strict;
        use warnings;

        for (
            'oh {hello} there',
            'oh {hello there',
            'oh hello there',
            'oh hello} there',
            'of {{hello}} there',
        ) {
            if ( $_ =~ /
                    (?:
                        ( \{ )      # capture a '{'
                    |               # or
                        (?<! \{ )   # not preceded by a '{'
                    )
                    hello           # .. followed by 'hello'
                    (?(1)           # if (defined $1)
                        \}          # match an '}'
                    |               # else
                        (?! \} )    # not followed by a '}'
                    )               #
                /x )
            {
                print "YEP : $_ : ", defined $1 ? "'$1'" : 'undef', " - $&\n";
            }
            else {
                print "NOPE: $_\n";
            }
        }
        __END__
        YEP : oh {hello} there : '{' - {hello}
        NOPE: oh {hello there
        YEP : oh hello there : undef - hello
        NOPE: oh hello} there
        YEP : of {{hello}} there : '{' - {hello}

        Also, I changed your diagnostic code: you were using $1 even when the match failed, which meant you were getting the wrong thing.
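To illustrate the point about reading $1 after a failed match, here is a minimal sketch. A failed match leaves the capture variables as they were after the last successful match in scope, so the value can be stale. The helper name `first_capture` and the sample strings are assumptions made for this illustration, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first capture of $re against $text,
# or undef when the match fails, so the caller never sees a stale $1.
sub first_capture {
    my ($text, $re) = @_;
    return $text =~ $re ? $1 : undef;
}

my $text = 'oh {hello} there';

$text =~ /(hello)/;      # succeeds and sets $1
$text =~ /(goodbye)/;    # fails and leaves $1 untouched
print "stale: $1\n";     # prints "stale: hello"

my $got = first_capture( $text, qr/(goodbye)/ );
print defined $got ? "got: $got\n" : "no match\n";    # prints "no match"
```

Checking the result of the match before touching $1, as the rewritten loop above does, avoids the problem entirely.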

        ---
        $world=~s/war/peace/g

Re: Regex conditional match if previous match in same expression is true?
by merlyn (Sage) on Apr 09, 2007 at 17:23 UTC
    Looks to me like $1 is always going to be defined. It just may be empty. Maybe what you want is (\{)? to match the open brace, very much like the example in perlre.
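The difference between a group that matched the empty string and a group that did not take part in the match can be sketched as follows. The helper names `group_state` and `matches` are hypothetical, chosen only for this illustration; the two patterns are the ones discussed in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: describes what capture group 1 did for a pattern.
sub group_state {
    my ($text, $re) = @_;
    return 'no match' unless $text =~ $re;
    return defined $1 ? "matched '$1'" : 'did not participate';
}

# Hypothetical helper: plain YEP/NOPE for a pattern.
sub matches {
    my ($text, $re) = @_;
    return $text =~ $re ? 'YEP' : 'NOPE';
}

my $inside  = qr/ ([{]{0,1}) hello /x;    # quantifier inside the group
my $outside = qr/ (\{)?      hello /x;    # quantifier on the group itself

print group_state( 'hello',  $inside ),  "\n";    # matched ''
print group_state( 'hello',  $outside ), "\n";    # did not participate
print group_state( '{hello', $outside ), "\n";    # matched '{'

# With the conditional added, the first form always demands a closing brace,
# because group 1 always matched (possibly the empty string).
print matches( 'hello', qr/ ([{]{0,1}) hello (?(1)\}) /x ), "\n";    # NOPE
print matches( 'hello', qr/ (\{)?      hello (?(1)\}) /x ), "\n";    # YEP
```

This is why the original program reported NOPE for 'oh hello there' and YEP for 'oh hello} there': the (?(1)...) test asks whether group 1 matched at all, not whether it matched something non-empty.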

      Well, that's certainly an improvement -- I had forgotten about all of the behaviors of the ? modifier. However, I must still be missing something. My code now reads:

      use strict;
      use warnings;

      for (
          'oh {hello} there',
          'oh {hello there',
          'oh hello there',
          'oh hello} there',
      ) {
          print '', (
              $_ =~ /
                  (\{)?        # optional opening brace
                  hello        # .. followed by 'hello'
                  (?(1)\})     # a closing brace iff the open brace was there
              /x ? 'YEP ' : 'NOPE'
          ), " - $1\n";
      }

      But my output is:

      YEP  - {
      YEP  - 
      YEP  - 
      YEP  - 

      However, warnings are now thrown for using an uninitialized '$1' in the last three cases, as they should be.

      What I'm struggling with is how to say that '{hello}' is OK, and 'hello' is OK, but '{hello' and 'hello}' are NOT OK. Or, put another way, I want to require a closing brace if an opening brace is found; however, if there is no opening brace then there must be no closing brace either.

      Thanks for the help, though, it's an improvement.

Re: Regex conditional match if previous match in same expression is true?
by Rhandom (Curate) on Apr 09, 2007 at 19:02 UTC
    The following doesn't use the (?() pat) construct but it does use a similar setup - and it prints out the correct output:

    for (
        'oh {hello} there',
        'oh {hello there',
        'oh hello there',
        'oh hello} there',
    ) {
        our $paren;
        if (/
                ({ | (?<!{))                     # optional opening brace
                (?{ $paren = $^N })              # store for later
                hello                            # .. followed by 'hello'
                (??{$paren ? "\}" : "(?!\})"})   # a closing brace if the open brace was there
            /x)
        {
            print "Yep - $1\n";
        }
        else {
            print "Nope\n";
        }
    }


    I tried briefly to get the (?() ) form to work but couldn't get it to go.

    my @a=qw(random brilliant braindead); print $a[rand(@a)];

      That doesn't give me the desired results at all. I was using Perl 5.6, but $^N was only introduced in 5.8.

      By the way, you should localize your package variables whenever possible. Replace
      our $paren;
      with
      local our $paren;

      I'd also add a comment along the lines of "Always use package variables with regular expressions." Someone reading or maintaining the code could very well not know that lexical variables can cause problems.

      Finally, to avoid continually compiling regexp fragments, replace
      (??{$paren ? "\}" : "(?!\})"})
      with
      (?(?{ $paren }) } | (?!}) )
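Putting the three suggestions above together gives something like the following sketch. It assumes Perl 5.8 or later (for $^N); the wrapper sub `balanced_hello` is a hypothetical name added for this illustration, and the braces are escaped here as a precaution for newer Perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: true if 'hello' is fully braced or not braced at all.
sub balanced_hello {
    my ($text) = @_;

    # Always use package variables with regex code blocks;
    # lexical variables can cause problems there.
    local our $paren;

    return $text =~ /
        ( \{ | (?<! \{ ) )       # opening brace, or nothing when none precedes
        (?{ $paren = $^N })      # remember what the group matched
        hello
        (?(?{ $paren })          # if an opening brace was matched
            \}                   #   require the closing brace
        |                        # else
            (?! \} )             #   forbid a closing brace
        )
    /x ? 1 : 0;
}

for my $text (
    'oh {hello} there',
    'oh {hello there',
    'oh hello there',
    'oh hello} there',
) {
    printf "%-4s - %s\n", balanced_hello($text) ? 'YEP' : 'NOPE', $text;
}
```

Because the condition is an ordinary code block rather than a (??{ }) block, no regexp fragment is recompiled on each attempt.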

        Note that Rhandom and I basically posted the same pattern, the main difference being mine doesn't need the (?{})/(??{})/$^N stuff, using the conditional pattern instead (as you originally requested).

        Actually, to be honest, I didn't really look deeply at Rhandom's post before I posted mine. It's cool we came up with pretty much the same thing, but using two different advanced feature sets.


Re: Regex conditional match if previous match in same expression is true?
by kyle (Abbot) on Apr 09, 2007 at 17:44 UTC

    Your expression just never needs to match the braces at all. If you add \s at the beginning and end of the expression (to force Perl to look at something beyond 'hello'), you get the expected result.

    Update: In my fooling around, I'd also changed ([{]{0,1}) to ([{])? (because the first version, as merlyn says, will always match, but sometimes an empty string). That's also necessary to make it work as expected.

      I'm not interested in matching whitespace -- my examples just happen to have some. I would want the results listed as "desired" in the node even if all whitespace was removed.


        In that case, how about this:

        use Test::More;

        my %tests = (
            'hello'            => 1,
            '{hello}'          => 1,
            'hello}'           => 0,
            '{hello'           => 0,
            'oh {hello} there' => 1,
            'oh {hello there'  => 0,
            'oh hello there'   => 1,
            'oh hello} there'  => 0,
        );

        plan 'tests' => scalar keys %tests;

        while ( my ( $text, $result ) = each %tests ) {
            my $hello = qr/hello/;
            my $test_result = ( $text =~ /
                    \{$hello\}    # hello with braces
                |                 # or
                    (?<! \{ )     # not a brace
                    $hello        # hello
                    (?! \} )      # also not a brace
                /x ) ? 1 : 0;
            is( $test_result, $result, $text );
        }

        Put your (possibly complicated) match text in a variable so you don't have to change it in two places when it changes. After that, literally match the text with braces and without.

Node Type: perlquestion [id://608987]
Approved by moklevat