PerlMonks  

Re^4: problem with optional capture group

by Special_K (Monk)
on Dec 23, 2020 at 16:31 UTC ( [id://11125686] )


in reply to Re^3: problem with optional capture group
in thread problem with optional capture group

m{ <div (?: (?! </div) .)+ (</div)? }xms

Can you please give a brief explanation regarding how the above regex works? It seems to use a few constructs I've never seen before and searching Google for regex symbols doesn't work very well. In particular, is enclosing a regex in 'm()', as you have done above, equivalent to enclosing it in '//'? What is the trailing xms doing?

Re^5: problem with optional capture group
by AnomalousMonk (Archbishop) on Dec 23, 2020 at 20:54 UTC
    ... enclosing a regex in 'm()' ...

    The
       m open-delimiter pattern close-delimiter
    form is what I think of as the "canonical" form of the m// operator: the delimiters can be a wide variety of characters, including the matching bracket pairs {} () <> []. The // form is a special case in which the leading m may be omitted. The qr// and s/// operators follow the same convention. Choosing a delimiter other than / avoids a lot of the escape-ology connected with the / character in regexes. See perlop. (Note that q// qq// qx// qw// tr/// y/// and maybe some others also use this delimiter convention.)
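
    As a minimal sketch of the delimiter point, the same match can be written three ways. The path string and variable names here are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: a string that contains slashes.
my $path = '/usr/local/bin';

# The same pattern with three different delimiter pairs.
my $with_slashes = $path =~ /\/local\// ? 1 : 0;   # every / has to be escaped
my $with_braces  = $path =~ m{/local/}  ? 1 : 0;   # no escaping needed
my $with_parens  = $path =~ m(/local/)  ? 1 : 0;   # likewise

print "$with_slashes $with_braces $with_parens\n";   # prints "1 1 1"
```

    All three forms match the same text; only the amount of backslashing differs.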

    What is the trailing xms doing?

    I use the /ms modifiers as part of a standard "tail" on all my qr// m// s/// expressions to give the . ^ $ operators fixed, predictable behavior: with /s, . matches any character including a newline, and with /m, ^ and $ also match at the start and end of each line within the string. This eliminates some degrees of freedom in regex behavior and makes regexes slightly easier to understand. The /x modifier in the standard tail allows whitespace in the pattern so it can be laid out more readably. See Modifiers in perlre.
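
    A rough illustration of what each modifier changes, using a made-up two-line string (the variable names are only for this sketch):

```perl
use strict;
use warnings;

# Hypothetical sample data: two lines in one string.
my $text = "first line\nsecond line\n";

# /s: without it . stops at a newline; with it .+ can run across lines.
my $dot_without_s = $text =~ m{ first .+ second }x  ? 1 : 0;
my $dot_with_s    = $text =~ m{ first .+ second }xs ? 1 : 0;

# /m: without it ^ matches only at the very start of the string;
# with it ^ also matches just after each embedded newline.
my $anchor_without_m = $text =~ m{ ^ second }x  ? 1 : 0;
my $anchor_with_m    = $text =~ m{ ^ second }xm ? 1 : 0;

# /x: pattern whitespace is ignored, so a literal space has to be
# written explicitly, here as the character class [ ].
my $spaced = $text =~ m{ first [ ] line }xms ? 1 : 0;

# prints "0 1 0 1 1"
print "$dot_without_s $dot_with_s $anchor_without_m $anchor_with_m $spaced\n";
```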

    (?: (?! </div) .)+

    This has already been covered by GrandFather here. This expression just steps forward grabbing one character after another as long as that character is not a part of whatever matches the (?!...) negative lookahead expression, a closing div tag fragment in this case. A bit slow perhaps, but effective and flexible (update: flexible in that the lookahead expression can be of any complexity). See Lookaround Assertions in perlre; see also perlretut, perlreref and perlrequick.
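
    To see the stepping behavior in isolation, here is a small sketch that compares it with a plain greedy .+ on a made-up string containing two div elements. The extra outer capture group is added only so the matched text can be inspected.

```perl
use strict;
use warnings;

# Hypothetical sample data: two complete div elements back to back.
my $html = '<div>one</div><div>two</div>';

# Step forward one character at a time, but only while the text at the
# current position does not start with "</div".
my ($stepped) = $html =~ m{ ( <div (?: (?! </div) . )+ ) }xms;

# For comparison: a plain greedy .+ runs to the end of the string.
my ($greedy) = $html =~ m{ ( <div .+ ) }xms;

print "stepped: $stepped\n";   # prints "stepped: <div>one"
print "greedy:  $greedy\n";    # prints "greedy:  <div>one</div><div>two</div>"
```

    The lookahead is tested before every single character, which is where the "a bit slow" comes from.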

    (</div)?

    Optionally capture a literal character sequence if it is present. The capture variable $1 (in this case) will hold the captured sequence if it was present, otherwise $1 will be undefined. See perlre, etc., as above.
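
    A short sketch of how $1 behaves in both cases, using the regex from the thread on two made-up inputs (the @results array is only there to collect the outcomes):

```perl
use strict;
use warnings;

my @results;

# Hypothetical sample data: one div that is closed, one that is not.
for my $html ('<div>closed</div>', '<div>never closed') {
    if ($html =~ m{ <div (?: (?! </div) . )+ (</div)? }xms) {
        # The overall match succeeds either way; $1 is defined only
        # when the optional group actually took part in the match.
        push @results, defined $1 ? "closing tag found: $1" : 'no closing tag';
    }
}

print "$_\n" for @results;
# prints:
#   closing tag found: </div
#   no closing tag
```

    Testing with defined avoids an "uninitialized value" warning when the group did not participate.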


    Give a man a fish:  <%-{-{-{-<
