Explain a regexp matched group result

by jdd (Acolyte)
on Oct 28, 2013 at 18:10 UTC (#1060027)

jdd has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I would like to know why the following regexp:

use strict;
use warnings FATAL => 'all';

my $string = "aacbbbcac";
my $re1 = qr/((a+)?(b+)?(c))*/;

if ($string =~ $re1) {
    foreach (0..$#-) {
        printf "Group %d: <%s>\n", $_, substr($string, $-[$_], $+[$_] - $-[$_]);
    }
}

produces this result:

Group 0: <aacbbbcac>
Group 1: <ac>
Group 2: <a>
Group 3: <bbb>
Group 4: <c>

i.e. $3 is not empty.

Assuming this is normal, may I also ask whether it is possible, with the same regexp, to get a coherent result, i.e. $1 equal to $something2.$something3.$something4?

PS: some readers may notice that I also posted this question to the Google Perl community. I want to keep the original regexp as is. Note that I am considering resorting to perlreapi if it is impossible to have $3 empty with standard Perl, assuming perlreapi permits this kind of manipulation (!?)

Replies are listed 'Best First'.
Re: Explain a regexp matched group result
by ww (Archbishop) on Oct 28, 2013 at 19:22 UTC

    Edumacation (I hope, anyway) by example :-)

    #!/usr/bin/perl
    use 5.016;
    use warnings;

    # 1060027

    my $string = "aacbbbcac";
    my $re1 = qr/((a+)?(b+)?(c))*/;
    my $re2 = qr/(a.*?)(b.+?)(c)(.*)/;   # ADDED ON SUSPICION THAT OP'S REGEX FU
                                         # MIGHT BE INCREMENTED BY THIS EXAMPLE

    if ($string =~ $re1) {
        say "Here are the captures: $1, $2, $3)\n";   # ADDED line
        foreach (0..$#-) {
            printf "Group %d: <%s>\n", $_, substr($string, $-[$_], $+[$_] - $-[$_]);
        }
    }
    say "\n";

    if ( $string =~ $re2 ) {
        say "USING \$re2, here are the captures \$1..\$4: $1, $2, $3, $4)\n";
        foreach (0..$#-) {
            printf "Group %d: <%s>\n", $_, substr($string, $-[$_], $+[$_] - $-[$_]);
        }
    }

    =head #ONLY the "ADDED" lines are changed from OP's original

    Here are the captures: ac, a, bbb)

    Group 0: <aacbbbcac>
    Group 1: <ac>
    Group 2: <a>
    Group 3: <bbb>
    Group 4: <c>

    USING $re2, here are the captures $1..$4: aac, bbb, c, ac)

    Group 0: <aacbbbcac>
    Group 1: <aac>
    Group 2: <bbb>
    Group 3: <c>
    Group 4: <ac>

    =cut

      Thank you very much to you too, now I understand -;
Re: Explain a regexp matched group result
by Laurent_R (Canon) on Oct 28, 2013 at 18:46 UTC

    Could you please explain what exactly you expect to get as a result from your example string?

      Whatever the values, I was expecting the inner groups $2, $3 and $4, once concatenated, to be equal to $1. Why is that not the case?

        From perlvar:

        $-[0] is the offset of the start of the last successful match.
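
        As a small illustration of the @- and @+ arrays that the perlvar quote refers to, here is a minimal sketch. The pattern is not the OP's: the (x)? group is a hypothetical optional group that cannot match in this string, added only to show what the arrays hold when a group does not take part in the match.

```perl
use strict;
use warnings;

my $string = "aacbbbcac";
my @lines;

# (x)? is a hypothetical optional group that can never match here;
# it is only present to show an entry for a non-participating group.
if ($string =~ /(a+)(x)?(c)/) {
    for my $n (0 .. $#+) {
        if (defined $-[$n]) {
            # substr(string, start, length) rebuilds the captured text
            # from the start offset in @- and the end offset in @+.
            push @lines, sprintf "Group %d: <%s> (offsets %d..%d)",
                $n, substr($string, $-[$n], $+[$n] - $-[$n]), $-[$n], $+[$n];
        }
        else {
            # For a group that did not match, $-[$n] and $+[$n] are undef.
            push @lines, sprintf "Group %d: did not participate", $n;
        }
    }
}
print "$_\n" for @lines;
```

        Checking defined $-[$n] before calling substr matters under use warnings FATAL => 'all', where using an undef offset would be fatal.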

        You have my $re1 = qr/((a+)?(b+)?(c))*/;

        Your outer capture group may be repeated zero or more times. In the case of your test string, "aacbbbcac", it matches three times:

        At the first repeat, it matches "aac" with $1 being "aac", $2 being "aa", $3 being undef (not matched) and $4 being "c", but because of the repeat count it doesn't stop there, so you never see these values.

        At the second repeat $1 is "bbbc", $2 remains "aa" (group (a+) didn't match in this repeat but $2 is the 'last successful match'), $3 is "bbb" and $4 is "c", but it doesn't stop there, so you don't see these values either.

        The third and final repeat sets $1 to "ac" and $2 to "a", leaves $3 as it was at the last successful match (i.e. "bbb"), and sets $4 to "c".

        So, the issue is that each capture group returns its last successful match; a group that fails to match in a later repeat is not reset, it keeps the value from the earlier repeat.
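
        The three repeats described above can be observed one at a time with a sketch like the following. Note that it does not keep the original regexp as is: it assumes it is acceptable to drop the trailing * and drive the repetition from a while loop with \G and /g instead, so that each repeat is a separate match. The names $step and @trace are made up for this illustration.

```perl
use strict;
use warnings;

my $string = "aacbbbcac";

# Same inner pattern as the original, but without the trailing '*'.
# \G anchors each attempt where the previous match ended, so each pass
# of the loop corresponds to one repeat of the original quantified group.
my $step = qr/\G((a+)?(b+)?(c))/;

my @trace;
while ($string =~ /$step/g) {
    # Each pass is a fresh match, so a group that did not take part in
    # this pass is undef rather than left over from an earlier pass.
    my @groups = map { defined $_ ? $_ : '' } ($1, $2, $3, $4);
    push @trace, join '|', @groups;
    printf "repeat %d: \$1=<%s> \$2=<%s> \$3=<%s> \$4=<%s>\n",
        scalar(@trace), @groups;
}
```

        With this approach, within each pass $1 is the concatenation of $2, $3 and $4 (treating undef as the empty string), which is the coherence the OP asked for, at the cost of changing how the pattern is applied.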
