PerlMonks  

positions of all regexp matches

by glwtta (Hermit)
on Oct 14, 2003 at 17:35 UTC

glwtta has asked for the wisdom of the Perl Monks concerning the following question:

OK, this should be trivial, but I haven't been able to find an answer.

As I understand it, $-[0] and $+[0] give me the start and end offsets of the last successful match of a regexp. How do I get a list of the positions of all the matches?
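A small illustration of what the question describes: `@-` and `@+` hold start and end offsets for the whole match (index 0) and for each capture group (index N), and any later successful match overwrites them. The strings and patterns here are made up for illustration only.

```perl
use strict;
use warnings;

my $text = "hello world";
my (@whole, @group1, @latest);

# One successful match fills @- (start offsets) and @+ (end offsets).
# Index 0 describes the whole match; index N describes capture group N.
if ($text =~ /(wor)ld/) {
    @whole  = ($-[0], $+[0]);
    @group1 = ($-[1], $+[1]);
}

# A later successful match overwrites both arrays, so they only ever
# describe the most recent match.
if ($text =~ /l+/) {
    @latest = ($-[0], $+[0]);
}

print "whole match: @whole\n";    # 6 11
print "group 1:     @group1\n";   # 6 9
print "latest:      @latest\n";   # 2 4
```

This is why a single pair of offsets cannot describe every match: the positions have to be copied out after each match, as the replies below do.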

Re: positions of all regexp matches
by merlyn (Sage) on Oct 14, 2003 at 17:41 UTC
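    use strict;
    use warnings;

    my $string = "hello world ballloon";
    my $regex  = qr/ll/;
    my @matches;

    # The capture is inside a zero-width lookahead, so no characters are
    # consumed and overlapping occurrences (both "ll" in "lll") are found.
    push @matches, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;
    print "@$_\n" for @matches;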
    Note the care taken to find overlapping matches.
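To see what that care buys, here is a sketch that runs the same `ll` pattern on the same string two ways: once as an ordinary `/g` match and once wrapped in a lookahead as above. The array names are just for illustration.

```perl
use strict;
use warnings;

my $string = "hello world ballloon";
my $regex  = qr/ll/;

# Plain /g: each match consumes its characters, so the search resumes
# after the end of the previous match and overlapping hits are skipped.
my @plain;
push @plain, [$-[0], $+[0]] while $string =~ /$regex/g;

# Lookahead /g: the match itself is zero-width. Perl will not accept a
# second zero-width match at the same position, so it moves on by one
# character, and overlapping hits are kept.
my @overlap;
push @overlap, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;

print "plain:   ", join(", ", map { "@$_" } @plain),   "\n";  # 2 4, 14 16
print "overlap: ", join(", ", map { "@$_" } @overlap), "\n";  # 2 4, 14 16, 15 17
```

The ordinary form never reports the `ll` starting at offset 15, because offsets 14 and 15 were already consumed by the match at 14.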

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Look, ma, no loops!
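      use re 'eval';    # allow the (?{ }) code block in a pattern that interpolates $regex

      my $string = "hello world ballloon";
      my $regex  = qr/ll/;
      my @matches;
      # (?{ ... }) records pos at each lookahead hit; the trailing . consumes one
      # character so /g moves forward, and () = forces list context so every
      # match is attempted.
      () = $string =~ /(?=$regex)(?{push @matches, pos})./g;
      print join "\n", @matches;
      __END__
      2
      14
      15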

      --

      flounder

Re: positions of all regexp matches
by hardburn (Abbot) on Oct 14, 2003 at 17:40 UTC

    You would have to maintain your own list as you go:

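    my $check = 'abcdefghijk';
    my (@matches_start, @matches_end);

    # /g makes each iteration resume where the previous match ended;
    # the loop stops when no further match is found.
    while ($check =~ /efg/g) {
        push @matches_start, $-[0];
        push @matches_end,   $+[0];
    }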
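The record-as-you-go approach can also be wrapped in a subroutine. This is a minimal sketch; `match_positions` is a hypothetical name, not a function from any module. The `/g` modifier in the `while` condition is what makes each iteration resume after the previous match (without it, the same first match is found on every pass), and like any plain `/g` loop it reports only non-overlapping matches.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect [start, end] pairs for
# every non-overlapping match of $re in $str, copying @- and @+ after each match.
sub match_positions {
    my ($str, $re) = @_;
    my @pairs;
    while ($str =~ /$re/g) {
        push @pairs, [ $-[0], $+[0] ];
    }
    return @pairs;
}

my @pos = match_positions('abcdefgxyzefg', qr/efg/);
print "@$_\n" for @pos;   # prints "4 7" then "10 13"
```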

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

Re: positions of all regexp matches
by Not_a_Number (Prior) on Oct 14, 2003 at 19:48 UTC

    I'm tired. My brain hurts. There must be something I've overlooked in perldoc perlvar. Can anyone explain the difference, if there is one, between merlyn's solution

    push @matches, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;

    and this, which I came up with myself (heavily adapted, but... :):

    push @matches, [$-[0], $+[1]] while $string =~ /(?=($regex))/g;

    thx

    dave

      Since $1 starts at the beginning of the matched substring, $-[0] (start of the whole match) and $-[1] (start of $1) have the same value. However, a lookahead assertion doesn't count toward the width of the match, so the whole matched string has length zero ($+[0] == $-[0]), while $1, captured inside the lookahead, extends further to the right ($+[1] > $-[1]). As a result, your [$-[0], $+[1]] gives exactly the same pairs as merlyn's [$-[1], $+[1]].
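A quick way to check this is to run both recording styles side by side on merlyn's example string and assert the two invariants above inside the loop. The array names are just for illustration.

```perl
use strict;
use warnings;

my $string = "hello world ballloon";
my $regex  = qr/ll/;
my (@merlyn_style, @variant);

while ($string =~ /(?=($regex))/g) {
    # The lookahead consumes nothing, so the whole match is zero-width.
    die "expected zero-width match" unless $-[0] == $+[0];
    # $1 starts where the whole match starts.
    die "expected equal starts" unless $-[0] == $-[1];
    push @merlyn_style, [ $-[1], $+[1] ];
    push @variant,      [ $-[0], $+[1] ];
}

# Both ways of recording the positions therefore give the same pairs.
print join(", ", map { "@$_" } @merlyn_style), "\n";  # 2 4, 14 16, 15 17
print join(", ", map { "@$_" } @variant),      "\n";  # 2 4, 14 16, 15 17
```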

      I can imagine this explanation is a bit abstract, so I'll give a different example. Here I'm trying to match an uppercase letter that is the first of a sequence of at least 3 letters, including itself.

      $_ = 'Ab1Cd2eFg3Hijk4LMN';
      /((?=([a-zA-Z]{3,}))[A-Z])/ or die "No match";
      print <<"END";
      Whole match: \$&: '$&' $-[0] upto $+[0]
      Parens around everything matched: \$1: '$1' $-[1] upto $+[1]
      Lookahead matched: \$2: '$2' $-[2] upto $+[2]
      END
      Resulting in:
      Whole match: $&: 'H' 10 upto 11
      Parens around everything matched: $1: 'H' 10 upto 11
      Lookahead matched: $2: 'Hijk' 10 upto 14
      

      As you can see, the parens around everything matched the same text as the whole match itself. The lookahead extends beyond that, and even though you can capture what it matched, it doesn't count toward the length of the match. The first character matched after the lookahead is still the first character of the whole match.

Node Type: perlquestion [id://299194]
Approved by gjb