This is code I've trimmed down from Data::CSel to demonstrate the problem I'm having:
package CSelTest;
use 5.020000;
use strict;
use warnings;
our $RE =
    qr{
        (?&ATTR_SELECTOR) (?{ $_ = $^R->[1] })
        (?(DEFINE)
            (?<ATTR_SELECTOR>
                \[\s*
                (?{ [$^R, []] })
                (?&ATTR_SUBJECTS)
                (?{
                    $^R->[0][1][0] = $^R->[1];
                    $^R->[0];
                })
                (?:
                    (
                        \s*=\s*|
                        #\s*!=\s*| # and so on
                        \s+eq\s+
                        #\s+ne\s+ # and so on
                    )
                    (?{
                        my $op = $^N;
                        $op =~ s/^\s+//; $op =~ s/\s+$//;
                        $^R->[1][1] = $op;
                        $^R;
                    })
                    (?:
                        (?&LITERAL_NUMBER)
                        (?{
                            $^R->[0][1][2] = $^R->[1];
                            $^R->[0];
                        })
                    )
                )?
                \s*\]
            )
            (?<ATTR_NAME>
                [A-Za-z_][A-Za-z0-9_]*
            )
            (?<ATTR_SUBJECT>
                (?{ [$^R, []] })
                ((?&ATTR_NAME))
                (?{
                    push @{ $^R->[1] }, $^N;
                    $^R;
                })
                (?:
                    # attribute arguments
                    \s*\(\s*
                    (?{
                        $^R->[1][1] = [];
                        $^R;
                    })
                    (?:
                        (?&LITERAL_NUMBER)
                        (?{
                            push @{ $^R->[0][1][1] }, $^R->[1];
                            $^R->[0];
                        })
                        (?:
                            \s*,\s*
                            (?&LITERAL_NUMBER)
                            (?{
                                push @{ $^R->[0][1][1] }, $^R->[1];
                                $^R->[0];
                            })
                        )*
                    )?
                    \s*\)\s*
                )?
            )
            (?<ATTR_SUBJECTS>
                (?{ [$^R, []] })
                (?&ATTR_SUBJECT)
                (?{
                    push @{ $^R->[0][1] }, {
                        name => $^R->[1][0],
                        (args => $^R->[1][1]) x !!defined($^R->[1][1]),
                    };
                    $^R->[0];
                })
            )
            (?<LITERAL_NUMBER>
                (
                    -?
                    (?: 0 | [1-9]\d* )
                    (?: \. \d+ )?
                    (?: [eE] [-+]? \d+ )?
                )
                (?{ [$^R, 0+$^N] })
            )
        ) # DEFINE
    }x;
sub parse_csel {
    state $re = qr{\A\s*$RE\s*\z};
    local $_ = shift;
    local $^R;
    eval { $_ =~ $re } and return $_;
    die $@ if $@;
    return undef;
}
1;
This code tries to parse expressions like [attr], [attr=1], or [attr eq 1], which are similar to CSS attribute selectors.
% perl -I. -Ilib -MCSelTest -MData::Dump -E'dd( CSelTest::parse_csel(q{ [attr] }) )'
[[{ name => "attr" }]]
% perl -I. -Ilib -MCSelTest -MData::Dump -E'dd( CSelTest::parse_csel(q{ [attr=1] }) )'
[[{ name => "attr" }], "=", 1]
% perl -I. -Ilib -MCSelTest -MData::Dump -E'dd( CSelTest::parse_csel(q{ [attr eq 1] }) )'
[[{ name => "attr" }], "eq", 1]
No problem so far. The code also recognizes forms like [meth()], [meth(1,2,3)], and [meth(1,2,3) = 1], i.e. an argument list after the attribute/method name. This is where the problem appears:
% perl -I. -Ilib -MCSelTest -MData::Dump -E'dd( CSelTest::parse_csel(q{ [attr()] }) )'
[[{ args => [], name => "attr" }]]
% perl -I. -Ilib -MCSelTest -MData::Dump -E'dd( CSelTest::parse_csel(q{ [attr()=1] }) )'
[[{ args => [], name => "attr" }], "=", 1]
% perl -I. -Ilib -MCSelTest -MData::Dump -E'dd( CSelTest::parse_csel(q{ [attr() eq 1] }) )'
do {
  my $a = [
    [
      { args => [], name => "attr" }, # .[0]
      { args => 'fix', name => "attr" }, # .[1]
    ], # [0]
    "eq", # [1]
    1, # [2]
  ];
  $a[0][1]{args} = $a[0][0]{args};
  $a;
}
As you can see, if I use the eq operator (recognized by the \s+eq\s+ part of the regex; note the \s+ rather than \s*) instead of the = operator (recognized by the \s*=\s* part; note the \s* rather than \s+), I get a duplicated element in the result, marked by the # .[1] comment.
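The duplicate appears to be caused by backtracking. In [attr() eq 1], the trailing \s* after \) in ATTR_SUBJECT seems to swallow the space before eq. That leaves \s+eq\s+ unable to match, so the engine backtracks into ATTR_SUBJECT to give the space back. The = operator accepts zero spaces, so that case never needs to backtrack.

perlre documents that $^R (and variables localized with local) is restored when a (?{ }) assertion is backtracked over. Nothing undoes changes made through a reference, such as the push in ATTR_SUBJECTS. A block that runs a second time therefore pushes a second entry onto the same array.

The minimal sketch below uses a made-up pattern and string ("x eq"), not the CSelTest grammar. The names @log, $final and $matched are illustrative only. It records each time a code block runs after a greedy \s*:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical miniature of the [attr() eq 1] situation: \s* first takes the
# space, the "\s+ eq" branch then fails, and the engine backtracks into \s*.
my @log;      # outer state modified from inside the regex
my $final;    # value of $^R at the end of the successful path

my $matched = "x eq" =~ /
    \A x
    \s*                               # greedy: takes the space first
    (?{ push @log, pos(); pos() })    # side effect, plus a return value
    (?: \s+ eq | = )                  # eq needs at least one leading space
    \z
    (?{ $final = $^R })
/x;

print "matched: ", ($matched ? "yes" : "no"), "\n";
print "code block ran at positions: @log\n";    # one entry per attempt
print "\$^R on the successful path: $final\n";
```

The push survives the abandoned first attempt, while $^R only reflects the path that finally matched. This mirrors the shared array in ATTR_SUBJECTS receiving two hashes.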
I'm using perl 5.22.1 but have tried 5.24.0 as well as 5.25.4, with the same results.
Any hints?
UPDATE 2016-09-10: I worked around this problem by setting and incrementing a counter variable at specific places to detect backtracking, and by using a conditional to keep my code from being executed multiple times when backtracking happens. Thanks to everyone who provided responses.
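A different technique from the counter-based workaround described in the update keeps all parser state in $^R. Every (?{ }) block then returns a new structure built from the previous $^R, instead of modifying an existing one in place. Because perlre guarantees that $^R is restored on backtracking, anything built along an abandoned path is simply dropped.

The sketch below is a hypothetical, cut-down grammar, not Data::CSel's. It handles empty argument lists only, integers only, and only the = and eq operators. The variables $ok and $result are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Every code block returns a NEW hash or array built from the previous $^R.
# Nothing is pushed onto or assigned into an existing structure, so when the
# engine backtracks, the old $^R comes back and the discarded copy is gone.
my $result;

my $ok = "[attr() eq 1]" =~ /
    \A \[ \s*
    ([A-Za-z_]\w*)                        # attribute name
    (?{ +{ name => $^N } })               # fresh hash, nothing shared
    (?:
        \s* \( \s* \) \s*                 # empty argument list only
        (?{ +{ %{ $^R }, args => [] } })  # copy the hash and add args
    )?
    (?{ [ $^R ] })                        # wrap the subject in a new array
    (?:
        (?: \s* (=) \s* | \s+ (eq) \s+ )  # operator, captured without spaces
        (?{ [ @{ $^R }, $^N ] })          # copy and append the operator
        (-?\d+)                           # simplified integer literal
        (?{ [ @{ $^R }, 0 + $^N ] })      # copy and append the value
    )?
    \s* \] \z
    (?{ $result = $^R })
/x;

if ($ok) {
    my ($subject, $op, $value) = @$result;
    printf "name=%s args=%d op=%s value=%s subjects=1\n",
        $subject->{name}, scalar @{ $subject->{args} }, $op, $value;
}
```

The same greedy \s* backtracking still happens here. The copy made on the failed attempt is discarded along with the old $^R, so the result holds a single subject. The trade-off is some copying at every step, which grows with the size of the structure being carried along.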