Jeffrey Friedl talks a bit about this in Mastering Regular Expressions, in the section called "A Warning About Embedded Code and my Variables" (pages 338-339). His conclusion on the matter is that an embedded code construct is in fact a closure.
This means that a lexical variable used inside an embedded code construct is bound to the regex as the instance that exists at the moment the regex is compiled. As far as I understand, this means that:
my $regex = qr /( # Start capture
\( # Start with '(',
(?: # Followed by
(?>[^()]+) # Non-parenthesis
|(??{ $regex }) # Or a balanced () block
)* # zero or more times
\) # Ending with ')'
)/x; # Close capture
is, in light of liz's remark about lexicals at compile time, "interpreted" (pardon the hand-waving) as:
my $regex = qr /( # Start capture
\( # Start with '(',
(?: # Followed by
(?>[^()]+) # Non-parenthesis
|(??{ undef }) # <-- Note! 'or undef'
)* # zero or more times
\) # Ending with ')'
)/x; # Close capture
which will match only the innermost parentheses. We don't have any undefs in the target string, so how can this part of the construct match anything?
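For what it's worth, here is a minimal sketch of one common way around this, assuming a reasonably recent Perl: declare the lexical in its own statement first, so that the $regex seen inside the (??{ ... }) block is the same variable that is assigned to. The sample string and the @blocks array are only there for illustration, and I have dropped the outer capturing parentheses so that a list-context match returns the whole matches.

```perl
use strict;
use warnings;

# Declare first, assign second: by the time the qr// is compiled,
# the lexical $regex already exists, so the embedded code block
# closes over this variable rather than some other $regex.
my $regex;
$regex = qr/
    \(                    # Start with '('
    (?:                   # Followed by
        (?> [^()]+ )      #   non-parenthesis characters
      |
        (??{ $regex })    #   or a balanced () block
    )*                    # zero or more times
    \)                    # Ending with ')'
/x;

# Illustrative input only (not from the original question).
my $text   = "f(a(b)c) and (d)";
my @blocks = $text =~ /$regex/g;

print "$_\n" for @blocks;    # prints "(a(b)c)" and then "(d)"
```

Friedl's section discusses the alternatives in more detail; a package variable declared with our is another option that I believe sidesteps the closure question entirely.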
Trying to write this down in a sensible manner proved to be quite a challenge, so I apologize if the preceding section is hard to understand. But to quote Friedl from the aforementioned section: "Warning: this section is not light reading." :o)
Hope this helps.
pernod
--
Mischief. Mayhem. Soap.