PerlMonks  

Regex to pull out string within parenthesis that could contain parenthesis

by dpelican (Initiate)
on Jul 09, 2018 at 13:45 UTC ( [id://1218162]=perlquestion )

dpelican has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a way to automate comment generation for some code, and I'm trying to extract parameters from a function declaration. I came up with the following expression:

 ^(private|public)?\s?(function|report)\s([^()]+)\(([^()]+)?\)(\s(returns)\s\(?([^()]+)\)?)?

The expression worked on almost all functions until the parameters contained parentheses themselves, such as:

function convert_wa_date_strings(iv_beg string, iv_end string, iv_read_date date, iv_step char(6)) returns (date, date, char(1))

Since the parentheses are important for the variable type they can't be ignored. The same issue occurs with the returns, but it'll be the same fix. What is it that I'm missing to capture those pesky parameters with parentheses?
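A minimal sketch of why the expression above stops matching, assuming two made-up declarations (`$flat` and `$nested` are illustrative names, not part of the original code). The class `[^()]+` can never consume a parenthesis, so once the argument list contains `char(6)` the pattern cannot reach the closing `\)` of the list:

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $re = qr/^(private|public)?\s?(function|report)\s([^()]+)\(([^()]+)?\)(\s(returns)\s\(?([^()]+)\)?)?/;

# Hypothetical inputs: one without nested parentheses, one with.
my $flat   = 'function f(a string, b date) returns (date)';
my $nested = 'function f(a string, b char(6)) returns (date)';

my $flat_ok   = $flat   =~ $re ? 1 : 0;
my $nested_ok = $nested =~ $re ? 1 : 0;

print "flat: $flat_ok, nested: $nested_ok\n";   # prints "flat: 1, nested: 0"
```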

Thanks!

Replies are listed 'Best First'.
Re: Regex to pull out string within parentheses that could contain parentheses
by hippo (Bishop) on Jul 09, 2018 at 14:23 UTC

    Is this the sort of thing you are after? It's a PoC as it stands so feel free to tweak until it delivers what you actually want.

    use strict;
    use warnings;
    use Test::More tests => 6;

    my $text = 'function convert_wa_date_strings(iv_beg string, iv_end string, iv_read_date date, iv_step char(6)) returns (date, date, char(1))';
    my $re = qr#^(private|public)?\s?(function|report)\s(\w+)\((.+?)\)((?:\s+)(returns)\s\((.+)*?\))?$#;

    ok ($text =~ $re, 'Matched');
    is ($1, undef, '$1 is correct');
    is ($2, 'function', '$2 is correct');
    is ($3, 'convert_wa_date_strings', '$3 is correct');
    is ($4, 'iv_beg string, iv_end string, iv_read_date date, iv_step char(6)', '$4 is correct');
    is ($5, ' returns (date, date, char(1))', '$5 is correct');

Re: Regex to pull out string within parenthesis that could contain parenthesis
by roboticus (Chancellor) on Jul 09, 2018 at 14:28 UTC

    dpelican

    Read perldoc perlre and search for "Recursive subpattern" and you'll find how to handle nested parentheses.
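As a rough illustration of what that section of perlre describes, here is a minimal sketch of a recursive subpattern, assuming Perl 5.10 or later. The names `$balanced`, `$decl` and `@chunks` are made up for this example; `(?1)` re-enters capture group 1 whenever a nested opening parenthesis is met:

```perl
use strict;
use warnings;

# Group 1 matches one balanced, parenthesised chunk.
my $balanced = qr{
    (                   # group 1: opening paren, contents, closing paren
        \(
        (?:
            [^()]++     # a run of non-paren characters, no backtracking
          |
            (?1)        # or a nested chunk: recurse into group 1
        )*
        \)
    )
}x;

# Hypothetical declaration, shortened from the one in the question.
my $decl = 'function f(a string, b char(6)) returns (date, char(1))';

# In list context with /g, each match contributes its group 1.
my @chunks = $decl =~ /$balanced/g;

print "$_\n" for @chunks;
```

This should print the argument list `(a string, b char(6))` and the return list `(date, char(1))` on separate lines, each with its inner parentheses intact.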

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Regex to pull out string within parenthesis that could contain parenthesis
by golux (Chaplain) on Jul 09, 2018 at 15:45 UTC
    Hi dpelican,

    I think this will do what you need. It uses the recursive subpatterns described in the perlre documentation. By the time I finished getting my example working I saw that roboticus had already mentioned them. (I had never used the recursive regex method, so it was a good learning experience for me).

    Edit:   Fixed some comments (specifically capture group numbering), and captured a little bit more.

    Edit 2: Added output.

    Edit 3: Allow keyword 'report' (somehow missed it the first time).

    #!/usr/bin/perl
    #
    # References:
    #   http://perldoc.perl.org/perlre.html (See section on 'PARNO')
    ##

    use strict;
    use warnings;
    use feature qw( say );
    use Method::Signatures;

    ##################
    ## Main Program ##
    ##################
    my $str = 'private function convert_wa_date_strings(iv_beg string, iv_end string, iv_read_date date, iv_step char(6)) returns (date, date, char(1))';
    recursive_function_parsing_regex($str);

    #################
    ## Subroutines ##
    #################
    func recursive_function_parsing_regex($str) {
        my $re = qr{
            (                           # Paren group 1 -- full function
                (?:
                    (private|public)    # Paren group 2 -- optional 'private' or 'public'
                    \s+
                )?
                (function)              # Paren group 3 -- required 'function' keyword
                \s*                     # Optional space after 'function'
                (\w+)                   # Paren group 4 -- function name
                (                       # Paren group 5 -- args in parens
                    \(
                    (                   # Paren group 6 -- contents of parens
                        (?:
                            (?> [^()]+ )    # Non-parens without backtracking
                            |
                            (?5)            # Recurse to start of paren group 5
                        )*
                    )
                    \)
                )
                (?:                     # Optional return value
                    \s+ returns\s*
                    (                   # Paren group 7 -- return args in parens
                        \(
                        (               # Paren group 8 -- return args
                            (?:
                                (?> [^()]+ )    # Non-parens without backtracking
                                |
                                (?7)            # Recurse to start of paren group 7
                            )*
                        )
                        \)
                    )
                )?
            )
        }x;

        if ($str !~ /$re/) {
            say "No match for '$str'";
            return;
        }

        my ($full, $pp, $func, $name, $par, $args, $ret, $rargs) =
            ($1, $2 || "", $3, $4, $5, $6, $7 || "", $8 || "");

        say "Match!";
        say " \$full => '$full'";    # Full expression
        say " \$pp => '$pp'";        # Optional 'private' or 'public' keyword
        say " \$func => '$func'";    # 'function' keyword
        say " \$name => '$name'";    # Function name
        say " \$par => '$par'";      # Func args (in parens)
        say " \$args => '$args'";    # Func args (no parens)
        say " \$ret => '$ret'";      # Optional return args (in parens)
        say " \$rargs => '$rargs'";  # Optional return args (no parens)
    }

    Result:

    Match!
     $full => 'private function convert_wa_date_strings(iv_beg string, iv_end string, iv_read_date date, iv_step char(6)) returns (date, date, char(1))'
     $pp => 'private'
     $func => 'function'
     $name => 'convert_wa_date_strings'
     $par => '(iv_beg string, iv_end string, iv_read_date date, iv_step char(6))'
     $args => 'iv_beg string, iv_end string, iv_read_date date, iv_step char(6)'
     $ret => '(date, date, char(1))'
     $rargs => 'date, date, char(1)'

    say  substr+lc crypt(qw $i3 SI$),4,5
Re: Regex to pull out string within parenthesis that could contain parenthesis (updated)
by AnomalousMonk (Archbishop) on Jul 09, 2018 at 17:58 UTC

    Here's another, more factored example of the use of recursive subpatterns (introduced with Perl version 5.10):

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "use 5.010;
     ;;
     my $s = 'function convert(beg string, end string, read_date date, step char(6)) returns (date, date, char(1))';
     ;;
     my $rx_paren      = qr{ ( [(] (?: [^()]*+ | (?-1))* [)] ) }xms;
     my $rx_identifier = qr{ \w+ }xms;
     ;;
     my $parsed_ok = my @ra = $s =~ m{
         \A \s*
         (private|public)? \s* (function|report) \s* ($rx_identifier) \s* $rx_paren
         \s* ((returns) \s* $rx_paren)?
         \s* \z
         }xms;
     ;;
     if ($parsed_ok) {
       dd @ra;
       }
     else {
       print 'parse failed';
       }
     "
    (
      undef,
      "function",
      "convert",
      "(beg string, end string, read_date date, step char(6))",
      "returns (date, date, char(1))",
      "returns",
      "(date, date, char(1))",
    )

    Update: The (private|public)? \s* sub-expression in the above m// should probably be something like (untested)
        ((?: private | public) \s)? \s*
    because, e.g., public would otherwise be allowed to run straight into the function or report keyword that always follows it (as in publicfunction), so some delimitation is required.
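The update is marked untested, so here is a small sketch that compares the two prefixes on their own. It is an illustration under stated assumptions rather than the poster's code: `$original`, `$updated`, `%result` and the three sample declarations are made-up names and inputs, and a trailing `\b` stands in for the rest of the pattern.

```perl
use strict;
use warnings;

# Prefix as first posted: nothing forces a separator after the access keyword.
my $original = qr{ \A \s* (private|public)? \s* (function|report) \b }xms;

# Prefix from the update: the access keyword must be followed by whitespace.
my $updated  = qr{ \A \s* ((?: private | public) \s)? \s* (function|report) \b }xms;

# Hypothetical declarations; the second one has no separator.
my @decls = ('public function f(a)', 'publicfunction f(a)', 'function f(a)');

my %result;
for my $decl (@decls) {
    $result{$decl} = {
        original => ($decl =~ $original ? 1 : 0),
        updated  => ($decl =~ $updated  ? 1 : 0),
    };
    printf "%-22s original=%d updated=%d\n",
        $decl, $result{$decl}{original}, $result{$decl}{updated};
}
```

Only `publicfunction f(a)` should differ: the original prefix accepts it and the updated one rejects it. One side effect to keep in mind is that group 1 of the updated prefix captures the keyword together with its trailing whitespace character (for example `public` followed by a space), so the caller may want to trim it.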


    Give a man a fish:  <%-{-{-{-<

      Here's a variation on the above solution, using named recursive subpatterns and named captures.
      Nowadays I write all my non-trivial regexes this way.
      use 5.010;

      my $source = 'function convert(beg string, end string, read_date date, step char(6)) returns (date, date, char(1))';

      my $matched = $source =~ m{
          \A \s*+
          (?<access>  private | public )?+ \s*+
          (?<keyword> function | report )  \s*+
          (?<name>    (?&identifier) )     \s*+
          (?<params>  (?&list) )           \s*+
          (returns \s*+ (?<returns> (?&list) ) )?+ \s*+
          \z

          (?(DEFINE)
              (?<identifier> [^\W\d]\w*+ )
              (?<list>       [(] [^()]*+ (?: (?&list) [^()]*+ )*+ [)] )
          )
      }xms;

      if ($matched) {
          my %components = %+;
          use Data::Dumper 'Dumper';
          say Dumper \%components;
      }
      else {
          say 'parse failed';
      }
      which outputs:
      $VAR1 = {
          keyword => 'function',
          name => 'convert',
          params => '(beg string, end string, read_date date, step char(6))',
          returns => '(date, date, char(1))',
      };

        I had entirely forgotten about named captures and (?(DEFINE)...): a much better (regex) approach.


        Give a man a fish:  <%-{-{-{-<

Re: Regex to pull out string within parenthesis that could contain parenthesis
by tobyink (Canon) on Jul 09, 2018 at 20:06 UTC
Re: Regex to pull out string within parenthesis that could contain parenthesis
by fishy (Friar) on Jul 09, 2018 at 16:03 UTC