PerlMonks  

Re^2: Explain a regexp matched group result

by jdd (Acolyte)
on Oct 28, 2013 at 18:51 UTC


in reply to Re: Explain a regexp matched group result
in thread Explain a regexp matched group result

Whatever the values, I was expecting the inner groups $2, $3 and $4, once concatenated, to be equal to $1. Why is that not the case?

Re^3: Explain a regexp matched group result
by ig (Vicar) on Oct 28, 2013 at 19:10 UTC

    From perlvar:

    $-[0] is the offset of the start of the last successful match.

    You have my $re1 = qr/((a+)?(b+)?(c))*/;

    Your outer capture group may be repeated zero or more times. In the case of your test string, "aacbbbcac", it matches three times:

    At the first repeat, it matches "aac" with $1 being "aac", $2 being "aa", $3 being undef (not matched) and $4 being "c", but because of the repeat count it doesn't stop there, so you never see these values.

    At the second repeat $1 is "bbbc", $2 remains "aa" (group (a+) didn't match in this repeat but $2 is the 'last successful match'), $3 is "bbb" and $4 is "c", but it doesn't stop there, so you don't see these values either.

    The third and final repeat sets $1 to "ac" and $2 to "a", leaves $3 as it was at its last successful match (i.e. "bbb"), and sets $4 to "c".

    So, the issue is that the capture groups return the last successful match rather than the last match or failure as the case may be.
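A minimal sketch of the behaviour described above, using the same pattern and test string from the thread. The array name `@groups` is only illustrative; the point is that `$3` still carries "bbb" from the second repeat, so the inner groups no longer concatenate to `$1`.

```perl
use strict;
use warnings;

my $string = "aacbbbcac";
my $re1    = qr/((a+)?(b+)?(c))*/;

my @groups;
if ($string =~ $re1) {
    # Copy the capture variables straight after the match.
    @groups = ($1, $2, $3, $4);
    foreach my $i (0 .. $#groups) {
        printf "\$%d = <%s>\n", $i + 1,
            defined $groups[$i] ? $groups[$i] : 'undef';
    }
}
# Prints:
# $1 = <ac>
# $2 = <a>
# $3 = <bbb>    (left over from the second repeat)
# $4 = <c>
```

Here `$2 . $3 . $4` is "abbbc", while `$1` is "ac".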

      Thank you very much. So I cannot get the individual groups without removing the repetition and looping with while (), i.e.
      use strict;
      use warnings FATAL => 'all';

      my $string = "aacbbbcac";
      my $re1 = qr/((a+)?(b+)?(c))/;
      while ($string =~ /$re1/g) {
          foreach (0..$#-) {
              printf "Group %d: <%s>\n", $_,
                  defined($-[$_]) ? substr($string, $-[$_], $+[$_] - $-[$_]) : '';
          }
          print "\n";
      }
      Bad news for me, but again mega thanks for your very clear answer. PS, the output:
      Group 0: <aac>
      Group 1: <aac>
      Group 2: <aa>
      Group 3: <>
      Group 4: <c>

      Group 0: <bbbc>
      Group 1: <bbbc>
      Group 2: <>
      Group 3: <bbb>
      Group 4: <c>

      Group 0: <ac>
      Group 1: <ac>
      Group 2: <a>
      Group 3: <>
      Group 4: <c>
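The while loop above can also collect each repeat into a data structure instead of printing it. This is a sketch under two assumptions that are not in the original post: the `\G` anchor with the `/gc` modifiers is added so that successive matches must be contiguous (as they are under the starred group), and `@iterations` is an illustrative name.

```perl
use strict;
use warnings;

my $string = "aacbbbcac";
my @iterations;

# Each pass is a separate match, so groups that did not take part
# are undef for that pass rather than left over from an earlier one.
while ($string =~ /\G((a+)?(b+)?(c))/gc) {
    push @iterations, [ map { defined $_ ? $_ : '' } ($1, $2, $3, $4) ];
}

foreach my $it (@iterations) {
    my ($whole, @parts) = @$it;
    printf "<%s> = <%s>\n", $whole, join('', @parts);
}
# Prints:
# <aac> = <aac>
# <bbbc> = <bbbc>
# <ac> = <ac>
```

With one match per repeat, the inner groups concatenate back to the outer group every time.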

        I think your conclusion is correct in general, but if you know the structure of the RE then there are workarounds to the way the capture groups work. Consider:

        #!C:/strawberry/perl/bin/perl.exe
        #
        use strict;
        use warnings;

        my $string = "aacbbbcac";
        my $re1 = qr/((a+)?(b+)?(c))*/;
        #my $re1 = qr/((a*)(b*)(c))*/;
        #my $re1 = qr/((a+)?(b*)(c))*/;

        if ($string =~ $re1) {
            my $start = 0;
            my @something;
            foreach (0..$#-) {
                if(defined($-[$_])) {
                    $start = $-[$_] if($-[$_] > $start);
                    if($-[$_] >= $start) {
                        printf "Group %d: <%s>\n", $_, substr($string, $-[$_], $+[$_] - $-[$_]);
                        $something[$_] = substr($string, $-[$_], $+[$_] - $-[$_]);
                    } else {
                        printf "Group %d: <%s> - but ignore it because it is from a previous iteration of the outer capture group\n", $_, substr($string, $-[$_], $+[$_] - $-[$_]);
                        $something[$_] = '';
                    }
                } else {
                    printf "Group %d: hasn't matched yet\n", $_;
                    $something[$_] = '';
                }
            }
            print "$1 = " . join('', @something[2..4]) . "\n";
        }

        Which produces

        Group 0: <aacbbbcac>
        Group 1: <ac>
        Group 2: <a>
        Group 3: <bbb> - but ignore it because it is from a previous iteration of the outer capture group
        Group 4: <c>
        ac = ac

        If you are trying to write something that handles arbitrary REs, this approach is unlikely to work.
