
### Re^2: Explain a regexp matched group result

by jdd (Acolyte) on Oct 28, 2013 at 18:51 UTC


Whatever the values, I expected the inner groups \$2, \$3 and \$4, concatenated, to equal \$1. Why is that not the case?

### Re^3: Explain a regexp matched group result

by ig (Vicar) on Oct 28, 2013 at 19:10 UTC

From perlvar:

\$-[0] is the offset of the start of the last successful match.

You have `my $re1 = qr/((a+)?(b+)?(c))*/;`
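To see what `@-` and `@+` actually hold after this pattern matches, here is a minimal sketch that dumps the start and end offsets of every group. The array name `@spans` is just for illustration; everything else is the pattern and string from this thread.

```perl
use strict;
use warnings;

my $string = "aacbbbcac";

# $spans[$n] holds [start, end) for group n, copied from $-[$n] and $+[$n].
my @spans;
if ($string =~ /((a+)?(b+)?(c))*/) {
    for my $n (0 .. $#-) {
        my ($from, $to) = ($-[$n], $+[$n]);
        next unless defined $from;    # group n never matched at all
        $spans[$n] = [$from, $to];
        printf "group %d: [%d, %d) <%s>\n",
            $n, $from, $to, substr($string, $from, $to - $from);
    }
}
```

Group 3 reports offsets 3 to 6 while group 1 starts at 7: its value is a leftover from an earlier repetition of the outer group.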

Your outer capture group may be repeated zero or more times. In the case of your test string, "aacbbbcac", it matches three times:

At the first repeat, it matches "aac" with \$1 being "aac", \$2 being "aa", \$3 being undef (not matched) and \$4 being "c", but because the `*` lets the group repeat, matching doesn't stop there, so you never see these values.

At the second repeat \$1 is "bbbc", \$2 remains "aa" (group (a+) didn't match in this repeat but \$2 is the 'last successful match'), \$3 is "bbb" and \$4 is "c", but it doesn't stop there, so you don't see these values either.

The third and final repeat sets \$1 to "ac" and \$2 to "a", leaves \$3 as it was at its last successful match (i.e. "bbb"), and sets \$4 to "c".

So the issue is that each capture group reports its last successful match, not whether it matched (or failed) in the final repetition.
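One way to watch the three repeats happen is Perl's embedded-code construct `(?{ ... })` (documented in perlre, which notes caveats about it, e.g. around backtracking). The sketch below wraps the original pattern so that a snapshot of `$1` to `$4` is taken each time the outer group completes a repetition. `@log` is a hypothetical name; for this string no backtracking happens after a snapshot, so each snapshot corresponds to one repetition.

```perl
use strict;
use warnings;

my $string = "aacbbbcac";

# Snapshot $1..$4 every time group 1 closes; the code block runs at that
# point in the match, with the captures as they stand right then.
our @log;
$string =~ /(?:((a+)?(b+)?(c))(?{ push @log, [$1, $2, $3, $4] }))*/;

for my $i (0 .. $#log) {
    my @shown = map { defined $_ ? qq("$_") : 'undef' } @{ $log[$i] };
    printf "repeat %d: \$1=%s \$2=%s \$3=%s \$4=%s\n", $i + 1, @shown;
}
```

Each printed line should line up with one of the three repeats described above, including `$2` still holding "aa" during the second repeat.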

Thank you very much. So I cannot get the individual groups without removing the repetition and looping with `while`, i.e.:
```
use strict;
use warnings FATAL => 'all';

my $string = "aacbbbcac";

# No outer '*': each //g match is one "repetition", so its captures are fresh.
my $re1 = qr/((a+)?(b+)?(c))/;
while ($string =~ /$re1/g) {
    foreach (0..$#-) {
        # Print group n via its start/end offsets; '' if it did not take part.
        printf "Group %d: <%s>\n", $_,
            defined($-[$_]) ? substr($string, $-[$_], $+[$_] - $-[$_]) : '';
    }
    print "\n";
}
```

```
Group 0: <aac>
Group 1: <aac>
Group 2: <aa>
Group 3: <>
Group 4: <c>

Group 0: <bbbc>
Group 1: <bbbc>
Group 2: <>
Group 3: <bbb>
Group 4: <c>

Group 0: <ac>
Group 1: <ac>
Group 2: <a>
Group 3: <>
Group 4: <c>
```
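The same per-repetition split can also be collected without a `while` loop: in list context, `m//g` returns the captures of every match flattened into one list, with `undef` for groups that did not take part. A minimal sketch, assuming the group count of 4 is hard-coded for this particular pattern (`@flat` and `@matches` are illustrative names):

```perl
use strict;
use warnings;

my $string = "aacbbbcac";

# ($1, $2, $3, $4, $1, $2, $3, $4, ...) for all matches, undef where a
# group did not participate in that match.
my @flat = $string =~ /((a+)?(b+)?(c))/g;

# Regroup into one array ref per match; 4 = number of groups in this pattern.
my @matches;
while (my @groups = splice(@flat, 0, 4)) {
    push @matches, [ map { defined $_ ? $_ : '' } @groups ];
}

for my $i (0 .. $#matches) {
    print "match $i: ", join(', ', map { "<$_>" } @{ $matches[$i] }), "\n";
}
```

Within each match the inner groups now concatenate to the outer group, which is what the original question expected.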

I think your conclusion is correct in general, but if you know the structure of the RE, there are workarounds for the way the capture groups work. Consider:

```
#!C:/strawberry/perl/bin/perl.exe
#
use strict;
use warnings;

my $string = "aacbbbcac";

my $re1 = qr/((a+)?(b+)?(c))*/;
#my $re1 = qr/((a*)(b*)(c))*/;
#my $re1 = qr/((a+)?(b*)(c))*/;
if ($string =~ $re1) {
    # $start is the furthest start offset seen so far. For this pattern the
    # groups appear left to right, so a group starting before $start must
    # have been set during an earlier repetition of the outer group.
    my $start = 0;
    my @something;
    foreach (0..$#-) {
        if (defined($-[$_])) {
            $start = $-[$_] if ($-[$_] > $start);
            if ($-[$_] >= $start) {
                printf "Group %d: <%s>\n", $_,
                    substr($string, $-[$_], $+[$_] - $-[$_]);
                $something[$_] = substr($string, $-[$_], $+[$_] - $-[$_]);
            } else {
                printf "Group %d: <%s> - but ignore it because it is from "
                     . "a previous iteration of the outer capture group\n",
                    $_, substr($string, $-[$_], $+[$_] - $-[$_]);
                $something[$_] = '';
            }
        } else {
            printf "Group %d: hasn't matched yet\n", $_;
            $something[$_] = '';
        }
    }

    # "$1" interpolates here, so this prints "<$1> = <groups 2..4 joined>".
    print "$1 = " . join('', @something[2..4]) . "\n";
}
```

Which produces:

```
Group 0: <aacbbbcac>
Group 1: <ac>
Group 2: <a>
Group 3: <bbb> - but ignore it because it is from a previous iteration of the outer capture group
Group 4: <c>
ac = ac
```

If you are trying to write something that handles arbitrary REs, this approach is unlikely to work.
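For the pattern in this thread, the running `$start` check can be condensed into a single comparison against the start of group 1. This is a sketch under stated assumptions: groups 2 to 4 sit directly inside the repeated group 1, so a value left over from an earlier repetition must start before the final repetition of group 1 does. The names `@inner`, `$outer` and `$joined` are illustrative.

```perl
use strict;
use warnings;

my $string = "aacbbbcac";

my ($outer, $joined);
my @inner;    # $inner[$n]: text of group n in the final repetition, '' if none
if ($string =~ /((a+)?(b+)?(c))*/ && defined $-[1]) {
    $outer = $1;
    for my $n (2 .. 4) {
        # Assumption: groups 2..4 are nested directly in group 1, so a stale
        # value from an earlier repetition starts before $-[1].
        my $fresh = defined $-[$n] && $-[$n] >= $-[1];
        $inner[$n] = $fresh ? substr($string, $-[$n], $+[$n] - $-[$n]) : '';
    }
    $joined = join '', @inner[2 .. 4];
    print "\$1 = $outer, inner groups joined = $joined\n";
}
```

As with the longer version, this depends on knowing the layout of the groups, so it does not carry over to arbitrary REs.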
