PerlMonks  

Regex to match text in broken parens

by Rodster001 (Pilgrim)
on Oct 31, 2014 at 18:23 UTC

Rodster001 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

This does what I want, but do you have any suggestions for writing the regex as a one-liner, and maybe a bit more elegantly (without the ifs)?
use Data::Dumper;

my @test = (
    "1 This (is a test) with good parens",
    "2 This is a (test with broken a paren",
    "3 And this would be one) the other way",
    "4 Lastly, no parens"
);

print Dumper \@test;

foreach (@test) {
    my ($ip) = $_ =~ m#\((.*?)\)#;
    print " Match in parens: $ip\n" if $ip;

    my ($rp) = $_ =~ m#([^(]*?)\)# if $_ !~ /\(/;
    print " Match before right paren: $rp\n" if $rp;

    my ($lp) = $_ =~ m#\((.*)# if $_ !~ /\)/;
    print " Match after left paren: $lp\n" if $lp;
}

Output:
 Match in parens: is a test
 Match after left paren: test with broken a paren
 Match before right paren: 3 And this would be one

Replies are listed 'Best First'.
Re: Regex to match text in broken parens
by choroba (Cardinal) on Oct 31, 2014 at 18:53 UTC
    The following code passes your tests. It uses a single if and a hash ref as a "poor man's switch":
    for (@test) {
        if (/ .* (^|\() (.*?) (\)|$) /x and my $p = "$1$3") {
            print ' Match ',
                  { ')'  => 'before right paren',
                    '('  => 'after left paren',
                    '()' => 'in parens',
                  }->{$p},
                  ": $2\n";
        }
    }

    It's not clear, though, what should happen if the parentheses were nested.
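For what it's worth, here is a rough sketch of how that pattern behaves when the parens are nested. The classify sub and the sample strings are my own illustration and not part of the reply; only the pattern and the hash lookup are taken from it. Because the leading .* is greedy, the match anchors on the last opening paren in the string, so for a nested input it is the innermost trailing group that ends up in $2.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wraps the pattern from the
# reply above and returns (label, captured text), or an empty list.
sub classify {
    my ($str) = @_;
    my %label = (
        ')'  => 'before right paren',
        '('  => 'after left paren',
        '()' => 'in parens',
    );
    if ( $str =~ / .* (^|\() (.*?) (\)|$) /x ) {
        my $key = "$1$3";    # which delimiters actually matched
        return ( $label{$key}, $2 ) if $key ne '';
    }
    return;
}

# Sample inputs are assumptions chosen to show the nested case.
for my $s ( 'a (b (c) d) e', '4 Lastly, no parens' ) {
    my ( $label, $text ) = classify($s);
    print defined $label ? "$s => $label: $text\n" : "$s => no match\n";
}
```

Whether capturing the innermost group is the right answer depends on the data, which is presumably why the question was raised.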

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Exactly what I was looking for, thanks! (btw, nested parens won't be a problem with the data I am working with)

        Famous last words!


        ++$anecdote ne $data


Re: Regex to match text in broken parens
by davido (Cardinal) on Nov 01, 2014 at 07:38 UTC

    I really don't mind using more than one regex for this one. You're dealing with more than one rule, so there's a nice symmetry; each rule has corresponding code. If you are concerned with it being verbose where you want it to be terse, move the work out to a subroutine.

    Anyway, with those ideas, here's my version:

    use Test::More;

    my @test = (
        [ '1 This (is a test) with good parens'    => 'is a test',                'Match in parens' ],
        [ '2 This is a (test with broken a paren'  => 'test with broken a paren', 'Match after left paren' ],
        [ '3 And this would be one) the other way' => '3 And this would be one',  'Match before right paren' ],
        [ '4 Lastly, no parens'                    => '',                         'No match' ],
    );

    foreach my $test (@test) {
        my $got = match( $test->[0] );
        is( $got, $test->[1], "$test->[2]: <<$got>>" );
    }

    done_testing();

    sub match {
        for (shift) {
            m/ \(([^)]*?)\) /x && return $1;    # Both parens.
            m/ \((.*)$ /x      && return $1;    # Left paren.
            m/ ^(.*)\) /x      && return $1;    # Right paren.
            m/ ^[^()]*()$ /x   && return $1;    # No parens (no capture).
            return;                             # Unreachable.
        }
    }

    Update: As often happens, I just have to go to bed to have an idea disturb me. Here's an improvement (I think) on sub match:

    sub match {
        local $_ = shift;
        m/ \(([^)]*?)\) /x          # Both parens.
            || m/ \((.*)$ /x        # Left paren.
            || m/ ^(.*)\) /x        # Right paren.
            || m/ ^[^()]*()$ /x;    # No parens (no capture).
        return $1 // ();
    }

    Here's another version that combines the logic above into a single regex using alternation. I don't necessarily think this is better; I prefer the simplicity of breaking things into smaller regexes.

    sub match {
        shift =~ m/
              (?: [^(]*\((?<C>[^)]*?)\) )   # Both parens.
            | (?: \((?<C>.*)$ )             # Left paren.
            | (?: ^(?<C>.*)\) )             # Right paren.
            | (?: ^[^()]*(?<C>)$ )          # No parens (empty capture).
        /x;
        return $+{C} // ();
    }

    By using named captures we avoid the problem where other single-regex solutions result in either $1, or $2, or $3 being populated. That's too much to keep track of, and could be error-prone. Instead, we give every capture the same name and read it back as $+{C}. (Warning: After checking perlre, I'm of the vague and uncertain impression that this could rely on undefined behavior.)
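On the undefined-behavior worry: as far as I can tell, perlre says that when several groups share a name, $+{NAME} refers to the leftmost defined group in the match, which is what the alternation relies on. A small sketch for checking this by hand uses %-, which perlvar documents as holding an array ref with one entry per group of that name. The inspect sub and the sample strings are hypothetical, and the no-parens branch is left out to keep it short.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns the value of $+{C}
# plus a printable copy of every slot in $-{C}.
sub inspect {
    my ($str) = @_;
    return unless $str =~ m/
          (?: [^(]*\((?<C>[^)]*?)\) )   # Both parens.
        | (?: \((?<C>.*)$ )             # Left paren.
        | (?: ^(?<C>.*)\) )             # Right paren.
    /x;
    my $leftmost = $+{C};    # copy before any other match runs
    # One slot per group named C, in pattern order; groups that did not
    # take part in the match come back as undef.
    my @slots = map { defined $_ ? $_ : 'undef' } @{ $-{C} };
    return ( $leftmost, \@slots );
}

for my $s ( '1 This (is a test) ok', '2 This is a (test', '3 one) two' ) {
    my ( $value, $slots ) = inspect($s);
    print "$s => \$+{C} = '$value', slots = [@$slots]\n";
}
```

Only one slot should be filled per match, so $+{C} picks up whichever branch succeeded.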

    Update: Having a little fun with this. Here are two more options with subtle changes from the previous.

    The next example eliminates named captures. This would present a problem: The numeric match variable that accepts the capture could be $1, $2, or $3. choroba avoids this issue by concatenating all possible numeric match variables, but that means possibly interpolating undef, and feels a little dirty (but it is clever). We can avoid that by using $^N, which will contain the most recent submatch.

    sub match {
        shift =~ m/
              (?: [^(]*\(([^)]*?)\) )   # Both parens.
            | (?: \((.*)$ )             # Left paren.
            | (?: ^(.*)\) )             # Right paren.
            | (?: ^[^()]*()$ )          # No parens (empty capture).
        /x;
        return $^N // ();
    }

    This next one wraps all the alternation branches in the (?|...) branch reset construct. That means that each alternate will use the same $1, which is actually the closest I can come to the multiple-regex solutions I originally presented, but within a single regex.

    sub match {
        shift =~ m/
            (?|
                (?: [^(]*\(([^)]*?)\) )   # Both parens.
              | (?: \((.*)$ )             # Left paren.
              | (?: ^(.*)\) )             # Right paren.
              | (?: ^[^()]*()$ )          # No parens (empty capture).
            )
        /x;
        return $1 // ();
    }

    And finally we can remove the grouping (?:...) parens, because alternation already has very low precedence:

    sub match {
        shift =~ m/
            (?|
                [^(]*\(([^)]*?)\)   # Both parens.
              | \((.*)$             # Left paren.
              | ^(.*)\)             # Right paren.
              | ^[^()]*()$          # No parens (empty capture).
            )
        /x;
        return $1 // ();
    }
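The single-regex variants differ mainly in which capture variable gets filled. A minimal sketch of that difference, with a hypothetical filled_group sub and made-up sample strings: it reports which of $1..$3 is defined after a match, along with $^N, for a plain alternation and for the same alternation wrapped in (?|...).

```perl
use strict;
use warnings;

# Plain alternation: each branch owns its own group number.
my $plain = qr/
      \(([^)]*?)\)   # Both parens: group 1
    | \((.*)$        # Left paren:  group 2
    | ^(.*)\)        # Right paren: group 3
/x;

# Branch reset: numbering restarts in every branch.
my $reset = qr/
    (?| \(([^)]*?)\)   # Both parens: group 1
      | \((.*)$        # Left paren:  group 1 again
      | ^(.*)\)        # Right paren: group 1 again
    )
/x;

# Hypothetical helper: returns (number of the defined group, value of $^N).
sub filled_group {
    my ( $str, $re ) = @_;
    return unless $str =~ $re;
    my $last = $^N;                 # most recently closed group
    my @caps = ( $1, $2, $3 );
    my ($n) = grep { defined $caps[ $_ - 1 ] } 1 .. 3;
    return ( $n, $last );
}

for my $s ( 'a (b) c', 'a (b', 'a) b' ) {
    my ( $n1, $t1 ) = filled_group( $s, $plain );
    my ( $n2, $t2 ) = filled_group( $s, $reset );
    print "$s: plain fills \$$n1 ('$t1'), branch reset fills \$$n2 ('$t2')\n";
}
```

With the plain alternation the caller has to know which branch matched (or fall back on $^N); with branch reset the text is always in $1.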

    I think that this, being Perl, grants us license to explore in the spirit of There is more than one way to do it. :)


    Dave

Node Type: perlquestion [id://1105744]
Approved by Corion
Front-paged by toolic