PerlMonks  

Re: How do I reference repeated capture groups?

by kcott (Archbishop)
on Aug 13, 2022 at 02:40 UTC


in reply to How do I reference repeated capture groups?

G'day TIOOWTDI,

Welcome to the Monastery.

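You actually want to capture zero or more instances of '\s*\d+\s*' as a single group, i.e. '((?:\s*\d+\s*)*)'. When a capture group is itself quantified, as in '(\s*\d+\s*)*', it only retains the text from its last iteration.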

Your OP code:

$ perl -E '
    my $re = qr{(\w+)(\s*\d+\s*)*};
    my $str = "a 1 2 3 b 4 5 6";
    while ($str =~ /$re/g) {
        say "$&: $1 $2";
    }
'
a 1 2 3 : a 3
b 4 5 6: b 6

With fixed regex:

$ perl -E '
    my $re = qr{(\w+)((?:\s*\d+\s*)*)};
    my $str = "a 1 2 3 b 4 5 6";
    while ($str =~ /$re/g) {
        say "$&: $1 $2";
    }
'
a 1 2 3 : a  1 2 3
b 4 5 6: b  4 5 6

Named captures don't change the regex logic. The start of capture groups changes from '(' to '(?<name>'; and, accessing values changes from '$N' to '$+{name}'.

$ perl -E '
    my $re = qr{(?<letter>\w+)(?<digit>(?:\s*\d+\s*)*)};
    my $str = "a 1 2 3 b 4 5 6";
    while ($str =~ /$re/g) {
        say "$&: $+{letter} $+{digit}";
    }
'
a 1 2 3 : a  1 2 3
b 4 5 6: b  4 5 6

— Ken

Replies are listed 'Best First'.
Re^2: How do I reference repeated capture groups?
by kcott (Archbishop) on Aug 13, 2022 at 03:03 UTC

    It's not specifically part of your question; however, it occurs to me that you might not want to capture all that excess leading and trailing whitespace. Compare these:

    $ perl -E '
        my $re = qr{(\w+)((?:\s*\d+\s*)*)};
        my $str = "a 1 2 3 b 4 5 6";
        while ($str =~ /$re/g) {
            say "|$&|: |$1| |$2|";
        }
    '
    |a 1 2 3 |: |a| | 1 2 3 |
    |b 4 5 6|: |b| | 4 5 6|

    $ perl -E '
        my $re = qr{(\w+)\s+((?:\s*\d+)*)};
        my $str = "a 1 2 3 b 4 5 6";
        while ($str =~ /$re/g) {
            say "|$&|: |$1| |$2|";
        }
    '
    |a 1 2 3|: |a| |1 2 3|
    |b 4 5 6|: |b| |4 5 6|

    — Ken

Re^2: How do I reference repeated capture groups?
by Anonymous Monk on Aug 13, 2022 at 20:08 UTC
    from the original discussion on Reddit:

    Okay but I also just want a list of each of the matches so I can parse them separately. Think attributes in html, a tag can have multiple, and being able to handle each individually is useful

    ...

    that still doesn't give me

    (a => [1, 2, 3], b => [4, 5, 6]) where in my analogy the letters are the html tags and the numbers are the attributes

    your answer with "1 2 3" in a string has already been given multiple times.

    BTW: The OP's moniker is Timegazer not Onion
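
For the structure asked for in the quoted Reddit comment, (a => [1, 2, 3], b => [4, 5, 6]), one possible approach is a two-step match: an outer regex that captures each letter together with its whole run of numbers, and an inner /\d+/g in list context that pulls the individual numbers out of that run. The following is only a minimal sketch under that assumption; the helper name parse_groups is hypothetical and does not come from the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not from the thread): returns a hash ref that maps
# each leading word to an array ref of the numbers that follow it.
sub parse_groups {
    my ($str) = @_;
    my %groups;

    # Outer match: one word, then the whole run of numbers after it,
    # captured as a single string in $2.
    while ($str =~ /(\w+)((?:\s*\d+)*)/g) {
        my ($key, $run) = ($1, $2);

        # Inner match: /g in list context returns every \d+ found in the run.
        push @{ $groups{$key} }, $run =~ /\d+/g;
    }

    return \%groups;
}

my $groups = parse_groups("a 1 2 3 b 4 5 6");
for my $key (sort keys %$groups) {
    say "$key => [", join(', ', @{ $groups->{$key} }), "]";
}
```

Because the inner match only looks for digits, any whitespace captured in the run is simply skipped, so the outer regex does not need to trim it first.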
