I really don't mind using more than one regex for this one. You're dealing with more than one rule, so there's a nice symmetry; each rule has corresponding code. If you are concerned with it being verbose where you want it to be terse, move the work out to a subroutine.
Anyway, with those ideas, here's my version:
use Test::More;

my @test = (
    [
        '1 This (is a test) with good parens' => 'is a test',
        'Match in parens'
    ],
    [
        '2 This is a (test with broken a paren' => 'test with broken a paren',
        'Match after left paren'
    ],
    [
        '3 And this would be one) the other way' => '3 And this would be one',
        'Match before right paren'
    ],
    [
        '4 Lastly, no parens' => '',
        'No match'
    ],
);

foreach my $test (@test) {
    my $got = match( $test->[0] );
    is( $got, $test->[1], "$test->[2]: <<$got>>" );
}

done_testing();
sub match {
    for (shift) {
        m/ \(([^)]*?)\) /x && return $1;    # Both parens.
        m/ \((.*)$ /x      && return $1;    # Left paren.
        m/ ^(.*)\) /x      && return $1;    # Right paren.
        m/ ^[^()]*()$ /x   && return $1;    # No parens (empty capture).
        return;    # Only reached if no rule matches (e.g. embedded newlines).
    }
}
Update: As often happens, I just have to go to bed to have an idea disturb me. Here's an improvement (I think) on sub match:
sub match {
    local $_ = shift;
    m/ \(([^)]*?)\) /x          # Both parens.
        || m/ \((.*)$ /x        # Left paren.
        || m/ ^(.*)\) /x        # Right paren.
        || m/ ^[^()]*()$ /x;    # No parens (empty capture).
    return $1 // ();
}
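A note on why that last line uses // rather than ||: the "no parens" rule captures an empty string, which is defined but false. Here is a minimal sketch of the difference; the helper names pick_defined_or and pick_logical_or are made up for this illustration and are not part of the solution.

```perl
use strict;
use warnings;
use 5.010;    # The // operator needs Perl 5.10 or newer.

# Hypothetical helpers: each passes its argument through a different "or".
sub pick_defined_or { my ($capture) = @_; return $capture // (); }
sub pick_logical_or { my ($capture) = @_; return $capture || (); }

# An empty capture is defined, so // hands it back unchanged.
my $kept = pick_defined_or('');

# || tests truth, so the empty string is discarded; in scalar context
# the empty list on the right-hand side becomes undef.
my $lost = pick_logical_or('');

print defined $kept ? "// kept the empty string\n" : "// returned undef\n";
print defined $lost ? "|| kept the empty string\n" : "|| returned undef\n";
# Prints:
#   // kept the empty string
#   || returned undef
```

With ||, the fourth test case would get undef instead of the expected empty string.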
Here's another version that combines the logic above into a single regex using alternation. I don't necessarily think this is better; I prefer the simplicity of breaking things into smaller regexes.
sub match {
    shift =~ m/
        (?: [^(]*\((?<C>[^)]*?)\) )    # Both parens.
      | (?: \((?<C>.*)$ )              # Left paren.
      | (?: ^(?<C>.*)\) )              # Right paren.
      | (?: ^[^()]*(?<C>)$ )           # No parens (empty capture).
    /x;
    return $+{C} // ();
}
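One detail that differs from the multi-regex version: the first branch now starts with [^(]*. As I understand it, that is needed because the engine takes the first alternative that succeeds at the leftmost starting position. Without the prefix, "both parens" cannot match at position 0, so the anchored "right paren" branch wins there. A small sketch with only those two branches (with_prefix and without_prefix are hypothetical names for this demonstration):

```perl
use strict;
use warnings;
use 5.010;    # Named captures need Perl 5.10 or newer.

# Hypothetical helpers: identical except for the [^(]* prefix.
sub with_prefix {
    my ($text) = @_;
    $text =~ m/ [^(]* \( (?<C> [^)]*? ) \) | ^ (?<C> .* ) \) /x or return;
    return $+{C};
}

sub without_prefix {
    my ($text) = @_;
    $text =~ m/ \( (?<C> [^)]*? ) \) | ^ (?<C> .* ) \) /x or return;
    return $+{C};
}

my $input = '1 This (is a test) with good parens';
print with_prefix($input),    "\n";    # is a test
print without_prefix($input), "\n";    # 1 This (is a test
```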
By using named captures we avoid the problem where other single-regex solutions result in either $1, or $2, or $3 being populated. That's too much to keep track of, and could be error prone. Instead, we name every capture the same: $+{C}. (Warning: After checking perlre, I'm of the vague and uncertain impression that this could rely on undefined behavior.)
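On that worry about undefined behavior: as far as I can tell, perlre does cover this case, saying that when several groups share a name, $+{NAME} refers to the leftmost defined group in the match. A toy sketch of that behavior, using a deliberately simpler regex than the one above (token_named is a hypothetical name):

```perl
use strict;
use warnings;
use 5.010;    # Named captures and %+ need Perl 5.10 or newer.

# Hypothetical helper: two alternatives share the capture name "C".
# Only the branch that actually matched leaves its group defined.
sub token_named {
    my ($text) = @_;
    $text =~ m/ ^ (?: (?<C> \d+ ) | (?<C> [a-z]+ ) ) /x or return;
    return $+{C};    # Leftmost defined group named C.
}

print token_named('42 apples'), "\n";    # 42
print token_named('apples 42'), "\n";    # apples
```

In the second call the first group named C is undefined, so $+{C} falls through to the second one.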
Update: Having a little fun with this. Here are two more options with subtle changes from the previous.
The next example eliminates named captures. That presents a problem: the numbered match variable that receives the capture could be any of $1 through $4, depending on which branch matched. choroba avoids this issue by concatenating all possible numbered match variables, but that means possibly interpolating undef, and feels a little dirty (but it is clever). We can avoid that by using $^N, which contains the text matched by the most recently closed capture group.
sub match {
    shift =~ m/
        (?: [^(]*\(([^)]*?)\) )    # Both parens.
      | (?: \((.*)$ )              # Left paren.
      | (?: ^(.*)\) )              # Right paren.
      | (?: ^[^()]*()$ )           # No parens (empty capture).
    /x;
    return $^N // ();
}
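To see what $^N buys us, here is a toy sketch (which_group is a hypothetical helper, and the regex is simplified to two branches) that reports which numbered group captured alongside the value $^N holds:

```perl
use strict;
use warnings;

# Hypothetical helper: return the number of the group that captured,
# plus the contents of $^N right after the match.
sub which_group {
    my ($text) = @_;
    $text =~ m/ ^ (?: (\d+) | ([a-z]+) ) /x or return;
    my $group = defined $1 ? 1 : 2;
    return ( $group, $^N );
}

my ( $group, $capture ) = which_group('apples 42');
print "group $group captured '$capture'\n";    # group 2 captured 'apples'
```

The caller never has to know whether the text landed in $1 or $2.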
This next one wraps all the alternation branches in the (?|...) branch reset construct. That means each alternative captures into the same $1, which is the closest I can come, within a single regex, to the multiple-regex solutions I originally presented.
sub match {
    shift =~ m/
        (?|
            (?: [^(]*\(([^)]*?)\) )    # Both parens.
          | (?: \((.*)$ )              # Left paren.
          | (?: ^(.*)\) )              # Right paren.
          | (?: ^[^()]*()$ )           # No parens (empty capture).
        )
    /x;
    return $1 // ();
}
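A toy sketch of branch reset on its own, again with a simplified two-branch regex (token_reset is a hypothetical name): inside (?|...) each alternative restarts its group numbering, so both branches fill $1.

```perl
use strict;
use warnings;
use 5.010;    # (?|...) needs Perl 5.10 or newer.

# Hypothetical helper: both alternatives capture into group 1.
sub token_reset {
    my ($text) = @_;
    $text =~ m/ ^ (?| (\d+) | ([a-z]+) ) /x or return;
    return $1;
}

print token_reset('42 apples'), "\n";    # 42
print token_reset('apples 42'), "\n";    # apples
```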
And finally we can remove the (?:...) grouping parens, because alternation already has very low precedence:
sub match {
    shift =~ m/
        (?|
            [^(]*\(([^)]*?)\)    # Both parens.
          | \((.*)$              # Left paren.
          | ^(.*)\)              # Right paren.
          | ^[^()]*()$           # No parens (empty capture).
        )
    /x;
    return $1 // ();
}
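Dropping the groups is safe because | binds more loosely than concatenation, so each anchor belongs only to its own branch. A quick sketch of what that means, with two hypothetical helpers (per_branch and whole_string) on an unrelated toy pattern:

```perl
use strict;
use warnings;

# Hypothetical helper: parsed as (?: ^foo ) | (?: bar$ ), so each
# anchor applies to one alternative only.
sub per_branch {
    my ($text) = @_;
    return $text =~ m/ ^foo | bar $ /x ? 1 : 0;
}

# Hypothetical helper: explicit grouping is needed when the anchors
# are meant to wrap the whole alternation.
sub whole_string {
    my ($text) = @_;
    return $text =~ m/ ^ (?: foo | bar ) $ /x ? 1 : 0;
}

print per_branch('foobaz'),   "\n";    # 1
print whole_string('foobaz'), "\n";    # 0
```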
I think that this, being Perl, grants us license to explore in the spirit of There is more than one way to do it. :)