PerlMonks  
Another use for qr// is to break up unmanageably complex regular expressions into simpler, named, self-contained pieces. (There's a direct parallel here with subs, which do the same for 'ordinary' Perl code. In fact, you can consider a named regex to be just a function written with a funny-looking syntax: its input is a string and its output is either a Boolean value or one or more strings, depending on whether it captures anything.)
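As a minimal sketch of that idea (the names `$rx_year`, `$rx_month`, `$rx_day`, `$rx_date` and `parse_date` are made up for illustration and are not part of the module discussed in this post), small qr// pieces can be named individually and then interpolated into a larger pattern. Each qr// object interpolates as its own non-capturing group, so an alternation inside one piece cannot leak into its neighbours:

```perl
use strict;
use warnings;

# Hypothetical building blocks for an ISO-style date (illustrative names only).
my $rx_year  = qr/ \d{4} /x;
my $rx_month = qr/ 0[1-9] | 1[0-2] /x;
my $rx_day   = qr/ 0[1-9] | [12]\d | 3[01] /x;

# Compose the named pieces. Each qr// object is interpolated as a
# self-contained group, so the '|' inside $rx_month stays inside it.
my $rx_date = qr/ \b ($rx_year) - ($rx_month) - ($rx_day) \b /x;

# Return (year, month, day) for the first date found, or an empty list.
sub parse_date {
    my ($text) = @_;
    return $text =~ $rx_date ? ($1, $2, $3) : ();
}

my @parts = parse_date('released on 2024-04-25');
print join('/', @parts), "\n";
```

Seen this way, `parse_date` is the 'ordinary' sub and `$rx_date` is the funny-looking one: both take a string and hand back either nothing or a list of strings.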

Here's an example from a code-filtering assertions module (yes, another one) that's not yet tested thoroughly enough to submit to CPAN:

    # A set of regexen to match balanced text in round, square or
    # curly brackets:

    sub makerx();

    my $rxround = qr/
        \(
        (?:
            (?> [^()] + )
        |
            (??{ makerx })
        ) *
        \)
    /ox;

    my $rxsquare = qr/
        \[
        (?:
            (?> [^\[\]] + )
        |
            (??{ makerx })
        ) *
        \]
    /ox;

    my $rxcurly = qr/
        \{
        (?:
            (?> [^{}] + )
        |
            (??{ makerx })
        ) *
        \}
    /ox;

    my $rxbalanced = qr/ $rxround | $rxsquare | $rxcurly /ox;

    sub makerx() { $rxbalanced; }

    # A regex to match a term in an 'assert' statement:
    # balanced text in some kind of bracket, or any text other than a
    # comma or semicolon:

    my $rxterm = qr/
        (?:
            $rxbalanced
        |
            (?> [^,;\(\{\x5B] +?    # \x5B is a synonym for '[', which confuses Kate's syntax-colouring :-(
            )
        |
            0                       # Special case for 0 -- why is this needed?
        ) +?
    /ox;

    # A regex to match one of the tokens that mark the end of an 'assert'
    # statement:

    my $rxend = qr/
        ;
    |
        }
    |
        \b (?: if | unless | while | until | for ) \b
    /x;

    # A regex to match an entire 'assert' statement and its arguments
    # and to collect the arguments at the same time.
    # Unfortunately, constructs like /($foo)+/ match all instances of $foo
    # but only capture the last one, and so we have to do devious things
    # with embedded Perl in order to both match and capture all arguments
    # to the assertion in a single regex.

    my ($group, @args, $end);

    my $rxassert = qr/
        (?{ $group = '', @args = () })  # Wipe our state so that, if the regex gives up
                                        # half-way through, the next attempt doesn't
                                        # inherit a lot of spurious tosh.
        (?>
            \b assert \b \s*            # Match the 'assert' keyword.
        )
        (?:
            (?>
                : \s* (\w+) \b \s*      # Look for ':SOMEGROUP'
                (?{ $group = $^N })     # and save it if found.
            )
        ) ?
        (?:
            ( $rxterm )                 # Look for an argument to the assertion,
            (?= \s* , )                 # ensure that it's followed by a comma before we save it,
            (?{ push @args, $^N })      # now save it,
            \s* , \s*                   # and then skip the comma that we already know to be there.
        ) *                             # There can be zero or more terms that are followed by commas.

        ( $rxterm )                     # Look for the final argument,
        (?= \s* $rxend )                # ensure that it's followed by a terminator before we save it,
        (?{ push @args, $^N })          # and save it.
        \s*
        ( $rxend )                      # Finally, save the terminator.
        (?{ $end = $^N })
    /sox;
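The capture-every-repetition trick used in the assert regex can be tried in isolation. The following is only a cut-down sketch under stated assumptions, not the module's code: `@items`, `$rx_list` and `split_list` are hypothetical names, and the items are plain `\w+` words rather than balanced terms. It shows the same pattern of confirming the delimiter with a lookahead before pushing `$^N`, so that a half-matched item is never saved:

```perl
use strict;
use warnings;

# Hypothetical accumulator. A package variable is used here on the
# assumption that it sidesteps closure quirks that lexicals inside
# (?{ }) blocks had on older perls.
our @items;

my $rx_list = qr/
    (?{ @items = () })            # reset state at the start of each attempt
    \A \s*
    (?:
        ( \w+ )                   # one item ...
        (?= \s* , )               # ... confirmed to be followed by a comma
        (?{ push @items, $^N })   # $^N is the most recently closed group
        \s* , \s*                 # now consume the comma itself
    ) *
    ( \w+ )                       # the final item
    (?= \s* \z )                  # must be followed only by trailing space
    (?{ push @items, $^N })
    \s* \z
/x;

# Return every item of a comma-separated list of words, or an empty
# list if the text is not such a list. @items is trusted only after a
# successful match, since a failed attempt may leave partial contents.
sub split_list {
    my ($text) = @_;
    return $text =~ $rx_list ? @items : ();
}

print join('|', split_list('alpha, beta ,gamma')), "\n";
```

A plain `/(?: (\w+) \s* , \s* )* (\w+)/x` would match the same text but leave only the last comma-terminated word in `$1`, which is the limitation the comment in the code above describes.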

In reply to Re^3: Tokenizing and qr// <=> /g interplay by MarkusLaker
in thread Tokenizing and qr// <=> /g interplay by skyknight
