PerlMonks  

Re: A demanding parser

by Juerd (Abbot)
on Jan 25, 2002 at 16:43 UTC ( [id://141484]=note )


in reply to A demanding parser

Is there any way of avoiding the external module and catching an arbitrary number of nested parentheses with a "normal" Regex? I know that the Owl book says it can't be done, but I would like to put my mind at rest on this issue.

It depends on what you call a "normal" regex. If normal means without any Perl-specific features, it cannot be done. But if you don't mind a Perl-specific regex, perlre has the solution:

The following pattern matches a parenthesized group:
$re = qr{
          \(
          (?:
             (?> [^()]+ )    # Non-parens without backtracking
           |
             (??{ $re })     # Group with matching parens
          )*
          \)
        }x;
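A minimal usage sketch of the pattern above, assuming `$re` is declared before it is assigned so that the embedded `(??{ $re })` refers to the same variable. The sample string and the `$text` and `$group` names are illustrative and do not come from perlre.

```perl
use strict;
use warnings;

# Declare first, assign second: the (??{ $re }) block must see this variable.
my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $re })     # Group with matching parens
    )*
    \)
}x;

# Hypothetical sample input with three levels of nesting.
my $text = "f(a, (b, (c)), d) tail";

# Capture the first balanced group found in the string.
my ($group) = $text =~ /($re)/;
print "$group\n";
```

The capture holds the outermost balanced group, including its enclosing parentheses.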

2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$

Re: Re: A demanding parser
by TheDamian (Vicar) on Jan 25, 2002 at 17:12 UTC
    Is there any way of avoiding the external module and catching an arbitrary number of nested parentheses with a "normal" Regex?
    use Regexp::Common;
    $str =~ /$RE{balanced}{-parens=>'()'}/
      That's what he does already. He asked whether it is possible without the module. (Not a very good idea, but I don't know the motivation; education, perhaps.)


        Ah, sorry. Missed that bit.

        How about:

        use Regexp::Common;
        print $RE{balanced}{-parens=>'()'}, "\n";
        print $RE{balanced}{-parens=>'{}'}, "\n";
        print $RE{balanced}{-parens=>'{}()'}, "\n";
        # etc.
        followed by reading the Regexp::Common source?

        Damian

Re: Re: A demanding parser
by gmax (Abbot) on Jan 26, 2002 at 18:56 UTC
    Thanks for the tip. I am not sure I understand how to use it, though.
    My purpose, as you have pointed out, is to replace Regexp::Common with some normal Perl RegEx. By normal I mean an expression that does not depend on a module.
    As for the motivation, you guessed right that it's related to education. Personally, I wouldn't bother. I need to distribute this module as part of more extensive educational material aimed at building up a huge database. I would like to avoid pointing to a CPAN module, since many people in the audience are not experienced Perl users. They should just copy this module to their computers and execute the import/export script.
    Of course I could provide them with a copy of the module, or instruct them to connect to CPAN, download the module and install it, or use "perl -MCPAN -e shell", but that would steal valuable time from my lectures.

    That aside, here is a test script for your RegEx, which does not seem to give me what I want.
    Was it my misunderstanding, or were you trying to show me how to catch the inner parenthesized text only?
    #!/usr/bin/perl -w
    use strict;
    use Regexp::Common;

    my $re = qr{ \( (?: (?> [^()]+ ) | (??{ $re }))* \) }x;
    my $input = "aa bb cc (dd ee (ff gg (hh) jj) kk)";

    print "With module\n";
    while ($input =~ m/(\w+|$RE{balanced}{-parens=>'()'})\s*/g) {
        print "$1\n";
    }

    print "With recursive RegExp\n";
    while ($input =~ m/(\w+|$re)\s*/g) {
        print "$1\n";
    }
    __END__
    # output:
    With module
    aa
    bb
    cc
    (dd ee (ff gg (hh) jj) kk)
    With recursive RegExp
    aa
    bb
    cc
    dd
    ee
    ff
    gg
    (hh)
    jj
    kk
    update
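    Found the problem. The recursive RegEx does not work when $re is declared with my in the same statement that refers to it, because a my variable only becomes visible after the statement that declares it. The $re inside (??{ $re }) was therefore not the pattern being defined.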
    Changing
    my $re = qr{ \( (?: (?> [^()]+ ) | (??{ $re }))* \) }x;
    into
    no strict 'vars';
    $rec_re = qr{ \( (?: (?> [^()]+ ) | (??{ $rec_re }))* \) }x;
    my $re = $rec_re;
    use strict;
    makes both regexes produce the same output.
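A possible alternative that keeps strict enabled throughout, sketched on the assumption that the failure comes from the scoping of `my`: declare the lexical in one statement and assign the pattern in the next. The `@tokens` array is a hypothetical addition for collecting the matches. The Regexp::Common half of the test is omitted so that the script needs no module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declaring $re on its own line puts the lexical in scope before the
# qr{} is compiled, so (??{ $re }) refers to the pattern itself.
my $re;
$re = qr{ \( (?: (?> [^()]+ ) | (??{ $re }) )* \) }x;

my $input = "aa bb cc (dd ee (ff gg (hh) jj) kk)";

# Collect each word or balanced parenthesized group, as in the test script.
my @tokens;
while ($input =~ m/(\w+|$re)\s*/g) {
    push @tokens, $1;
}
print "$_\n" for @tokens;
```

With this declaration order the last token is the whole parenthesized group, matching what the module-based loop prints.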
     _  _ _  _  
    (_|| | |(_|><
     _|   
    
Re^2: A demanding parser
by Aristotle (Chancellor) on Jan 25, 2002 at 17:07 UTC
    Wait a second - is that really a recursive regex there? :-o

    Makeshifts last the longest.

      Affirmative

