 
PerlMonks  

Re^3: Tokenizing and qr// <=> /g interplay

by MarkusLaker (Beadle)
on Apr 23, 2005 at 16:34 UTC ([id://450732])


in reply to Re^2: Tokenizing and qr// <=> /g interplay
in thread Tokenizing and qr// <=> /g interplay

Another use for qr// is to break up unmanageably complex regular expressions into simpler, named, self-contained pieces. (There's a direct parallel here with subs, which do the same for 'ordinary' Perl code. In fact, you can consider a named regex to be just a function written with a funny-looking syntax: its input is a string and its output is either a Boolean value or one or more strings, depending on whether it captures anything.)
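To make the function analogy concrete, here is a minimal sketch that builds one pattern out of small, named qr// pieces. It is a hypothetical IPv4 example, not taken from the module below, and the names $octet, $ipv4 and @parts are illustrative only. Because an interpolated qr// object carries its own enclosing group, the alternation inside $octet stays self-contained and cannot leak into the surrounding pattern.

```perl
use strict;
use warnings;

# Hypothetical example: build an IPv4 matcher out of small, named qr// pieces.
# Each piece is interpolated as a self-contained group, so the alternation
# inside $octet does not leak into the surrounding pattern.
my $octet = qr/25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]/;
my $ipv4  = qr/\A($octet)\.($octet)\.($octet)\.($octet)\z/;

# Used like a function returning a Boolean ...
for my $candidate ('192.168.0.1', '256.1.1.1') {
    printf "%-12s %s\n", $candidate, ($candidate =~ $ipv4 ? 'valid' : 'invalid');
}

# ... or like a function returning strings, via its captures.
my @parts = '10.0.0.255' =~ $ipv4;
print "@parts\n";    # prints "10 0 0 255"
```

Each named piece can be tested on its own, just as a small sub can.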

Here's an example from a code-filtering assertions module (yes, another one) that's not yet tested thoroughly enough to submit to CPAN:

    # A set of regexen to match balanced text in round, square or
    # curly brackets:
    sub makerx();
    my $rxround    = qr/ \( (?: (?> [^()]   + ) | (??{ makerx }) ) * \) /ox;
    my $rxsquare   = qr/ \[ (?: (?> [^\[\]] + ) | (??{ makerx }) ) * \] /ox;
    my $rxcurly    = qr/ \{ (?: (?> [^{}]   + ) | (??{ makerx }) ) * \} /ox;
    my $rxbalanced = qr/ $rxround | $rxsquare | $rxcurly /ox;
    sub makerx() { $rxbalanced; }

    # A regex to match a term in an 'assert' statement:
    # balanced text in some kind of bracket, or any text other than a comma or semicolon:
    my $rxterm = qr/
        (?:
            $rxbalanced
          |
            (?> [^,;\(\{\x5B] +?   # \x5B is a synonym for '[', which confuses Kate's syntax-colouring :-(
            )
          |
            0                      # Special case for 0 -- why is this needed?
        ) +?
    /ox;

    # A regex to match one of the tokens that mark the end of an 'assert' statement:
    my $rxend = qr/ ; | } | \b (?: if | unless | while | until | for ) \b /x;

    # A regex to match an entire 'assert' statement and its arguments
    # and to collect the arguments at the same time.
    # Unfortunately, constructs like /($foo)+/ match all instances of $foo but only
    # capture the last one, and so we have to do devious things with embedded Perl
    # in order to both match and capture all arguments to the assertion in a single
    # regex.
    my ($group, @args, $end);
    my $rxassert = qr/
        (?{ $group = '', @args = () })  # Wipe our state so that, if the regex gives up
                                        # half-way through, the next attempt doesn't
                                        # inherit a lot of spurious tosh.
        (?>
            \b assert \b \s*            # Match the 'assert' keyword.
        )
        (?:
            (?>
                : \s* (\w+) \b \s*      # Look for ':SOMEGROUP'
                (?{ $group = $^N })     # and save it if found.
            )
        ) ?
        (?:
            ( $rxterm )                 # Look for an argument to the assertion,
            (?= \s* , )                 # ensure that it's followed by a comma before we save it,
            (?{ push @args, $^N })      # now save it,
            \s* , \s*                   # and then skip the comma that we already know to be there.
        ) *                             # There can be zero or more terms that are followed by commas.
        ( $rxterm )                     # Look for the final argument,
        (?= \s* $rxend )                # ensure that it's followed by a terminator before we save it,
        (?{ push @args, $^N })          # and save it.
        \s* ( $rxend )                  # Finally, save the terminator.
        (?{ $end = $^N })
    /sox;
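The two less familiar tricks in the code above are easier to see in isolation. The following is a minimal, hypothetical sketch: the sample string and the names $text, $last, @args and $round are made up for illustration, and it assumes a reasonably recent Perl (5.18 or later; older versions may need `use re 'eval'` when a qr// containing code blocks is interpolated). It shows that a repeated capture group keeps only its last match, how pushing $^N from (?{ ... }) collects every repetition, and how (??{ ... }) lets a pattern recurse into itself, which is what makerx() provides for $rxround and friends.

```perl
use strict;
use warnings;

my $text = 'assert a, b, c;';

# A capture group inside a repeated group keeps only its last match.
my ($last) = $text =~ /assert \s+ (?: (\w+) \s* ,? \s* )+ ;/x;
print "$last\n";    # prints "c"

# Pushing $^N (the most recently closed group) from an embedded code
# block records every repetition instead. A push is NOT undone if the
# engine later backtracks, so real patterns must guard against spurious
# entries, as the original does with its state reset and lookaheads.
my @args;
$text =~ /
    (?{ @args = () })                    # reset at the start of each attempt
    assert \s+
    (?: (\w+) (?{ push @args, $^N }) \s* ,? \s* )+
    ;
/x;
print "@args\n";    # prints "a b c"

# (??{ ... }) evaluates code at match time and uses the result as a
# sub-pattern, so a pattern can refer to itself to match nested brackets.
my $round;
$round = qr/ \( (?: (?> [^()]+ ) | (??{ $round }) )* \) /x;
print "balanced\n" if '(a (b) (c (d)))' =~ /\A$round\z/;
```

Using a variable such as $round in (??{ ... }) works here because the variable is declared before the pattern. The original instead calls a pre-declared sub, makerx(), so that the three bracket regexes can refer to $rxbalanced, which is built from them.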
