PerlMonks  

Eternal question of parsing parentheses

by Anonymous Monk
on Oct 24, 2009 at 12:59 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This must be a common question, but I can't find any answer.

In a string containing nested parentheses, I want to find the first top-level parenthesized group, so that I can evaluate its contents before the rest. So for $_='a(b(c(d)(e))f)g(h)((i)j)'; the result should be b(c(d)(e))f.

This can be done the sad and boring way by going through the string character by character, counting opening and closing parentheses as you go. But surely there must be some clever regex to do it?
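For comparison, here is a minimal sketch of the "boring" counting approach: a single pass over the string with a depth counter. It assumes the input contains only plain parentheses (no quoting or escaping), and first_group is a hypothetical name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of the first top-level
# (...) group in $str, or an empty list (undef in scalar context)
# if there is no complete group.
sub first_group {
    my ($str) = @_;
    my $depth = 0;
    my $start;    # index just after the group's opening paren
    for my $i (0 .. length($str) - 1) {
        my $c = substr($str, $i, 1);
        if ($c eq '(') {
            $start = $i + 1 if $depth == 0;
            $depth++;
        }
        elsif ($c eq ')' && $depth > 0) {    # a stray ')' at depth 0 is ignored
            $depth--;
            # Back at depth 0: the first group has just been closed
            return substr($str, $start, $i - $start) if $depth == 0;
        }
    }
    return;    # no complete group found
}

print first_group('a(b(c(d)(e))f)g(h)((i)j)'), "\n";    # prints b(c(d)(e))f
```

Ignoring stray closing parentheses and giving up on an unclosed first group are choices of this sketch; a stricter version could die on either.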

I came up with one solution, which looks wonderfully cryptic:

    while (/^[^(]*\([^)]*\(/) { s/\(([^()]*)\)/[$1]/ or last }   # hide inner groups as [...]
    s/.*?\((.*?)\).*/$1/;   # keep the contents of the first group
    tr/[]/()/;              # turn [ ] back into ( )

Explanation:
While the first '(' is followed by another '(' before any ')', i.e. the first group still contains inner parentheses, find the first '(' that is NOT followed by another '(' before its ')', i.e. the first innermost group, and replace it with [...]. Then extract the contents of the first remaining group and turn all [ ] back into ( ).
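To see how the loop converges, this sketch runs the same substitutions as the one-liner and records every intermediate string; trace_first_group is a hypothetical helper. Two properties follow directly from the code: because the final tr/// turns every bracket into a parenthesis, the trick assumes the input has no [ or ] of its own; and on unbalanced input such as 'a((b' the while condition stays true while the s/// can no longer match, so without the `or last` guard the loop would never terminate.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the question's bracket trick to $str and
# returns every intermediate string, ending with the final result.
sub trace_first_group {
    my ($str) = @_;
    local $_ = $str;
    my @steps = ($_);
    # While the first '(' still has another '(' before any ')' ...
    while (/^[^(]*\([^)]*\(/) {
        # ... hide the first innermost (...) as [...]; stop if none is left
        s/\(([^()]*)\)/[$1]/ or last;
        push @steps, $_;
    }
    s/.*?\((.*?)\).*/$1/;    # keep only the contents of the first group
    push @steps, $_;
    tr/[]/()/;               # restore the hidden inner parentheses
    push @steps, $_;
    return @steps;
}

print "$_\n" for trace_first_group('a(b(c(d)(e))f)g(h)((i)j)');
```

On the example string this prints:

    a(b(c(d)(e))f)g(h)((i)j)
    a(b(c[d](e))f)g(h)((i)j)
    a(b(c[d][e])f)g(h)((i)j)
    a(b[c[d][e]]f)g(h)((i)j)
    b[c[d][e]]f
    b(c(d)(e))f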

Is this a good idea? Can you improve it? Is there a better way?

Re: Eternal question of parsing parentheses
by JavaFan (Canon) on Oct 24, 2009 at 13:23 UTC
    use Regexp::Common qw /balanced/;
    /($RE{balanced}{-parens=>'()'})/ and print $1;
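Note that this prints the first group together with its outer parentheses, (b(c(d)(e))f), whereas the question asked for b(c(d)(e))f. As a swapped-in alternative (not what JavaFan posted), the core module Text::Balanced can do a similar extraction without a CPAN install. A minimal sketch; the prefix pattern '[^(]*', used to skip everything before the first '(', is an assumption of this sketch.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $str = 'a(b(c(d)(e))f)g(h)((i)j)';

# The third argument is a prefix pattern to skip before extraction;
# '[^(]*' skips up to the first '(' (assumed choice for this example).
my ($group, $rest, $skipped) = extract_bracketed($str, '()', '[^(]*');

if (defined $group && length $group) {
    my $inner = substr($group, 1, -1);    # drop the outer parentheses
    print "$inner\n";                     # b(c(d)(e))f
}
```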
Re: Eternal question of parsing parentheses
by ikegami (Patriarch) on Oct 24, 2009 at 16:39 UTC
Re: Eternal question of parsing parentheses
by LanX (Saint) on Oct 24, 2009 at 17:44 UTC
    Eternal answer RTFM¹ ! :)

    Searching perldoc perlre (Perl 5.10 or later) for "recursive" turned up the following code; I only had to replace "foo" with \w*:

    $_='a(b(c(d)(e))f)g(h)((i)j)';
    $re = qr{
      (                       # paren group 1 (full function)
        \w*
        (                     # paren group 2 (parens)
          \(
            (                 # paren group 3 (contents of parens)
              (?:
                (?> [^()]+ )  # Non-parens without backtracking
              |
                (?2)          # Recurse to start of paren group 2
              )*
            )
          \)
        )
      )
    }x;
    @matches=/$re/;
    print "@matches";

    perl /tmp/tst.pl
    a(b(c(d)(e))f) (b(c(d)(e))f) b(c(d)(e))f
    Compilation finished at Sat Oct 24 19:42:58

    Cheers Rolf

    (¹) SCNR 8)

    UPDATE: untabified code.

      Hi

      I simplified the code and swapped the parentheses for angle brackets (as if by tr/()/<>/) to make it more readable:

      $_='a<b<c<d><e>>f>g<h><<i>j>';
      $re = qr{
          <                     # anchor at first paren as wanted
          (                     # paren group 1
            (?:
              (?> [^<>]+)       # Non-parens without backtracking
            |
              <
                (?1)            # Recurse to start of paren group 1
              >
            )*
          )
      }x;
      /$re/;
      print $1;
      perl /tmp/tst2.pl
      b<c<d><e>>f

      Cheers Rolf
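One property of the simplified pattern: the outer group's closing > is never required, so on an unclosed string such as 'a<b' it still matches and captures b. The sketch below is a variant, not Rolf's code: back to round parentheses, with the outer closing paren made explicit, and with //g collecting the contents of every top-level group instead of only the first. (?1) needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Variant of the simplified pattern with round parens and an explicit
# closing paren for the outer group (Perl 5.10+ for (?1)).
my $re = qr{
    \(                       # opening paren of a top-level group
    (                        # group 1: the contents
      (?:
        (?> [^()]+ )         # non-parens without backtracking
      |
        \( (?1) \)           # a nested group: recurse into group 1
      )*
    )
    \)                       # matching closing paren
}x;

my $str = 'a(b(c(d)(e))f)g(h)((i)j)';

# In list context, //g returns $1 from each successive match,
# i.e. the contents of each top-level group from left to right.
my @groups = $str =~ /$re/g;
print "$_\n" for @groups;    # b(c(d)(e))f, then h, then (i)j
```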

      For those still limited to Perl 5.8 or earlier, see also the discussion of the (??{ code }) construct under "Extended Patterns" (just before the discussion of (?PARNO)) in perlre.
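For reference, a minimal sketch of that pre-5.10 style, modelled on the kind of self-referential (??{ ... }) pattern perlre describes: the regex refers to itself, and the outer parentheses are stripped afterwards with substr to get the contents the question asked for. The variable name $balanced is an assumption of this sketch.

```perl
use strict;
use warnings;

# Declared before the qr// so the code block inside the pattern can
# refer to it; (??{ ... }) runs when the pattern is being matched.
our $balanced;
$balanced = qr{
    \(
    (?:
        (?> [^()]+ )         # non-parens without backtracking
      |
        (??{ $balanced })    # a nested balanced group, matched recursively
    )*
    \)
}x;

my $str = 'a(b(c(d)(e))f)g(h)((i)j)';

if ( my ($group) = $str =~ /($balanced)/ ) {
    my $inner = substr($group, 1, -1);    # strip the outer parentheses
    print "$inner\n";                     # b(c(d)(e))f
}
```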

Node Type: perlquestion [id://803034]
Approved by Corion
Front-paged by toolic