It seems to test OK and I think the documentation is probably minimally acceptable.
The implementation is a bit messy because I'm storing the distinction between (?{...}?) and (?{...}) in the same way as that between (??{...}) and (?{...}) (i.e. an extra LOGICAL op in the complied regex) and that's probably not the right way (in either of those cases). This distinction should in the EVAL op but I didn't feel my understanding of the code was deep enough to attempt that.
Update: Use TRUE and FALSE with bool variable.
diff -u --recursive orig\perl-5.9.5/pod/perlre.pod perl-5.9.5/pod/perl
+re.pod
--- orig\perl-5.9.5\pod\perlre.pod Sat Jul 07 15:40:28 2007
+++ perl-5.9.5\pod\perlre.pod Mon Sep 03 09:59:35 2007
@@ -880,6 +880,8 @@
may be used instead of C<< \k<NAME> >> in Perl 5.10 or later.
=item C<(?{ code })>
+
+=item C<(?{ code }?)>
X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, c
+ode in>
B<WARNING>: This extended regular expression feature is considered
@@ -887,18 +889,24 @@
has side effects may not perform identically from version to version
due to the effect of future optimisations in the regex engine.
-This zero-width assertion evaluates any embedded Perl code. It
-always succeeds, and its C<code> is not interpolated. Currently,
-the rules to determine where the C<code> ends are somewhat convoluted
+.
+These zero-width assertions evaluate any embedded Perl code. In the
+form without the terminal C<?> the assertion always succeeds. The
+C<(?{ code }?)> form the assertion only succeeds if C<code> returns
+true. In either case its C<code> is not interpolated. Currently, the
+rules to determine where the C<code> ends are somewhat convoluted.
This feature can be used together with the special variable C<$^N> to
-capture the results of submatches in variables without having to keep
+restrict or save the results of submatches without having to keep
track of the number of nested parentheses. For example:
$_ = "The brown fox jumps over the lazy dog";
/the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
print "color = $color, animal = $animal\n";
+ @known_animal{ qw( cat dog fox horse rabbit rat ) } = ();
+ @animals = /\b(\w++)(?{ exists $known_animal{$^N} }?)/g;
+ print "@animals\n";
+
Inside the C<(?{...})> block, C<$_> refers to the string the regular
expression is matching against. You can also use C<pos()> to know wha
+t is
the current position of matching within this string.
@@ -925,11 +933,12 @@
introduced value, because the scopes that restrict C<local> operators
are unwound.
-This assertion may be used as a C<(?(condition)yes-pattern|no-pattern
+)>
-switch. If I<not> used in this way, the result of evaluation of
-C<code> is put into the special variable C<$^R>. This happens
-immediately, so C<$^R> can be used from other C<(?{ code })> assertio
+ns
-inside the same regular expression.
+These assertions may be used as a
+C<(?(condition)yes-pattern|no-pattern)> switch and the two forms are
+equivalent when used in this way. If I<not> used in this way, the
+result of evaluation of C<code> is put into the special variable
+C<$^R>. This happens immediately, so C<$^R> can be used from other
+C<(?{ code })> assertions inside the same regular expression.
The assignment to C<$^R> above is properly localized, so the old
value of C<$^R> is restored if the assertion is backtracked; compare
diff -u --recursive orig\perl-5.9.5/regcomp.c perl-5.9.5/regcomp.c
--- orig\perl-5.9.5/regcomp.c Sat Jul 07 15:40:26 2007
+++ perl-5.9.5/regcomp.c Sun Sep 02 22:48:48 2007
@@ -5690,6 +5690,7 @@
U32 n = 0;
char c;
char *s = RExC_parse;
+ bool is_failable = FALSE;
RExC_seen_zerolen++;
RExC_seen |= REG_SEEN_EVAL;
@@ -5704,6 +5705,12 @@
count--;
RExC_parse++;
}
+
+ if (!is_logical && *RExC_parse == '?') {
+ /* (?{...}?) but not (??{...}?) */
+ is_failable = TRUE;
+ RExC_parse++;
+ }
if (*RExC_parse != ')') {
RExC_parse = s;
vFAIL("Sequence (?{...}) not terminated or not {}-balance
+d");
@@ -5711,7 +5718,7 @@
if (!SIZE_ONLY) {
PAD *pad;
OP_4tree *sop, *rop;
- SV * const sv = newSVpvn(s, RExC_parse - 1 - s);
+ SV * const sv = newSVpvn(s, RExC_parse - 1 - is_failable
+- s);
ENTER;
Perl_save_re_context(aTHX);
@@ -5742,10 +5749,14 @@
}
nextchar(pRExC_state);
- if (is_logical) {
+ if (is_logical || is_failable ) {
+ /* (??{...}} or (?{...}?) but, if we could manage
+,
+ not (?(?{ ... }?)...). That's rather difficult
+ so instead work round it in regmatch.
+ -- nobull */
ret = reg_node(pRExC_state, LOGICAL);
if (!SIZE_ONLY)
- ret->flags = 2;
+ ret->flags = is_logical ? 2 : 3;
REGTAIL(pRExC_state, ret, reganode(pRExC_state, E
+VAL, n));
/* deal with the length of this later - MJD */
return ret;
@@ -5767,6 +5778,9 @@
ret = reg_node(pRExC_state, LOGICAL);
if (!SIZE_ONLY)
ret->flags = 1;
+ /* Need somehow to tell /(?{...)?)/ that it
+ should behave like /(?{...})/ as part of
+ if construct -- nobull *
+/
REGTAIL(pRExC_state, ret, reg(pRExC_state, 1,
+ &flag,depth+1));
goto insert_if;
}
diff -u --recursive orig\perl-5.9.5/regexec.c perl-5.9.5/regexec.c
--- orig\perl-5.9.5/regexec.c Sat Jul 07 15:40:26 2007
+++ perl-5.9.5/regexec.c Sun Sep 02 21:19:53 2007
@@ -2713,6 +2713,7 @@
0: (?{...})
1: (?(?{...})X|Y)
2: (??{...})
+ 3: (?{...}?)
or the following IFMATCH/UNLESSM is:
false: plain (?=foo)
true: used as a condition: (?(?=foo))
@@ -3669,9 +3670,13 @@
PL_op = oop;
PAD_RESTORE_LOCAL(old_comppad);
PL_curcop = ocurcop;
- if (!logical) {
- /* /(?{...})/ */
+ if (!logical || logical == 3 ) {
+ /* /(?{...})/ or /(?{...}?) */
sv_setsv(save_scalar(PL_replgv), ret);
+ if ( logical && !SvTRUE(ret) ) {
+ /* (?{...}?/ assertion failed */
+ sayNO;
+ }
break;
}
}
@@ -3875,7 +3880,12 @@
}
break;
case LOGICAL:
- logical = scan->flags;
+ /* (?(?{...}?)...) will insert a second redundant LOGICAL
+ op which we here ignore. This is messy, sorry
+ --nobull */
+ if ( !logical ) {
+ logical = scan->flags;
+ }
break;
/*******************************************************************
diff -u --recursive orig\perl-5.9.5/t/op/re_tests perl-5.9.5/t/op/re_t
+ests
--- orig\perl-5.9.5/t/op/re_tests Sat Jul 07 15:40:24 2007
+++ perl-5.9.5/t/op/re_tests Mon Sep 03 08:57:01 2007
@@ -1338,3 +1338,10 @@
.*\z foo\n y - -
^(?:(\d)x)?\d$ 1 y ${\(defined($1)?1:0)} 0
.*?(?:(\w)|(\w))x abx y $1-$2 b-
+x(??{1}?) x c - Sequence (?{...}) not terminated
+x(?{1}?) x y $& x
+x(?{0}?) x n - -
+(?(?{1}?)x|y) x y $& x
+(?(?{0}?)x|y) x n - -
+(?(?{1}?)y|x) x n - -
+(?(?{0}?)y|x) x y $& x
\ No newline at end of file