positions of all regexp matches

glwtta has asked for the wisdom of the Perl Monks concerning the following question:

•Re: positions of all regexp matches by merlyn (Sage) on Oct 14, 2003 at 17:41 UTC
```perl
my $string = "hello world ballloon";
my $regex = qr/ll/;
my @matches;
# The capture sits inside a zero-width lookahead, so overlapping matches are found.
push @matches, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;
print "@$_\n" for @matches;
```

Note the care taken to find overlapping matches.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
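merlyn's one-liner wraps easily into a subroutine. The sketch below is only an illustration: `all_match_positions` is a hypothetical name, not part of Perl or any module. Because the capture sits inside a zero-width lookahead, the returned [start, end] pairs include overlapping matches.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns [start, end] offset
# pairs for every match of $regex in $string, overlapping ones included.
sub all_match_positions {
    my ($string, $regex) = @_;
    my @positions;
    # The whole match is zero-width. Perl will not accept a second
    # zero-length match at the same offset, so each new attempt
    # effectively begins one character later; group 1 still records
    # the real extent of the match.
    while ($string =~ /(?=($regex))/g) {
        push @positions, [ $-[1], $+[1] ];
    }
    return @positions;
}

my @found = all_match_positions("hello world ballloon", qr/ll/);
print "$_->[0] $_->[1]\n" for @found;   # prints 2 4, 14 16, 15 17 on separate lines
```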
Re: •Re: positions of all regexp matches by flounder99 (Friar) on Oct 14, 2003 at 18:58 UTC
Look, ma, no loops!

```perl
use re 'eval';
my $string = "hello world ballloon";
my $regex = qr/ll/;
my @matches;
() = $string =~ /(?=$regex)(?{push @matches, pos})./g;
print join "\n", @matches;
__END__
2
14
15
```

-- flounder
•Re: Re: •Re: positions of all regexp matches by merlyn (Sage) on Oct 14, 2003 at 19:06 UTC
```perl
() = $string =~ /(?=$regex)(?{push @matches, pos})./g;
```

Oddly enough, that would find no matches if `$regex` began with `\n`: without `/s`, the `.` never matches a newline. Sloppy coding. No need for that "." in there anyway.
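Since the trailing "." is what trips over newlines, one way to avoid the problem is to stay with the capture-inside-lookahead loop from merlyn's first reply, which never has to consume a character. A minimal sketch, using a made-up string and a pattern that deliberately begins with `\n`:

```perl
use strict;
use warnings;

my $string = "one\ntwo\nthree";
my $regex  = qr/\n\w/;    # example pattern that starts with a newline
my @matches;

# Same shape as merlyn's loop: the capture lives inside a zero-width
# lookahead, so nothing after it has to consume a character, and a
# leading "\n" in $regex causes no trouble.
push @matches, [ $-[1], $+[1] ] while $string =~ /(?=($regex))/g;

print "$_->[0] $_->[1]\n" for @matches;   # prints 3 5, then 7 9
```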
Re:^4 positions of all regexp matches by flounder99 (Friar) on Oct 14, 2003 at 19:57 UTC
•Re: Re:^4 positions of all regexp matches by merlyn (Sage) on Oct 14, 2003 at 20:04 UTC
Re: positions of all regexp matches by hardburn (Abbot) on Oct 14, 2003 at 17:40 UTC
You would have to maintain your own list as you go:

```perl
my $check = 'abcdefghijk';
my (@matches_start, @matches_end);
# Each successful //g match advances pos($check), so every
# non-overlapping match is visited exactly once.
while ($check =~ /efg/g) {
    push @matches_start, $-[0];    # offset where the match starts
    push @matches_end,   $+[0];    # offset just past where it ends
}
```

----
I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer

Note: All code is untested, unless otherwise stated
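A `while ($str =~ /.../g)` loop resumes each search where the previous match ended, so it reports only non-overlapping matches; merlyn's lookahead form does not have that limitation. A small comparison on an illustrative string of repeated characters:

```perl
use strict;
use warnings;

my $text = 'aaaa';
my (@plain, @overlap);

# Ordinary //g: the next search starts at the end of the previous
# match, so consumed characters are never revisited.
while ($text =~ /aa/g) {
    push @plain, $-[0];
}

# Capture inside a lookahead: the match itself is zero-width,
# so every starting offset gets a chance.
while ($text =~ /(?=(aa))/g) {
    push @overlap, $-[1];
}

print "plain:   @plain\n";     # plain:   0 2
print "overlap: @overlap\n";   # overlap: 0 1 2
```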
Re: positions of all regexp matches by Not_a_Number (Prior) on Oct 14, 2003 at 19:48 UTC
I'm tired. My brain hurts. There must be something I've overlooked in `perldoc perlvar`. Can anyone explain the difference, if there is one, between merlyn's solution

```perl
push @matches, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;
```

and this, which I came up with myself (heavily adapted, but... :):

```perl
push @matches, [$-[0], $+[1]] while $string =~ /(?=($regex))/g;
```

thx
dave
Re: Re: positions of all regexp matches by bart (Canon) on Oct 14, 2003 at 20:00 UTC
Since `$1` starts at the beginning of the whole match, `$-[0]` (start of the whole match) and `$-[1]` (start of `$1`) have the same value. However, the lookahead assertion doesn't count toward the width of the match, so the whole match has length zero (`$+[0] == $-[0]`), while `$1`, captured inside the lookahead, extends further to the right (`$+[1] > $-[1]`).

I can imagine this explanation is a bit abstract, so I'll give a different example. Here I'm trying to match an uppercase letter that is the first of a run of at least 3 letters, counting itself.

```perl
$_ = 'Ab1Cd2eFg3Hijk4LMN';
/((?=([a-zA-Z]{3,}))[A-Z])/ or die "No match";
print <<"END";
Whole match: \$&: '$&' $-[0] upto $+[0]
Parens around everything matched: \$1: '$1' $-[1] upto $+[1]
Lookahead matched: \$2: '$2' $-[2] upto $+[2]
END
```

Resulting in:

```
Whole match: $&: 'H' 10 upto 11
Parens around everything matched: $1: 'H' 10 upto 11
Lookahead matched: $2: 'Hijk' 10 upto 14
```

As you can see, the parens around everything matched the same thing as the whole match itself. The lookahead extends beyond that, and even though you can capture what it matched, it doesn't count toward the match length. The first thing matched after the lookahead is still the first thing of the whole match.
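bart's explanation can be checked directly against merlyn's pattern: inside the loop, `$-[0]` equals `$+[0]` (the whole match is zero-width) and also equals `$-[1]`, so both forms in Not_a_Number's question collect identical pairs. A small sketch that asserts this on the thread's sample string:

```perl
use strict;
use warnings;

my $string = "hello world ballloon";
my $regex  = qr/ll/;
my (@merlyn, @variant);

while ($string =~ /(?=($regex))/g) {
    # The whole match is zero-width: it starts and ends at the same offset.
    die "unexpected width" unless $-[0] == $+[0];
    # Group 1 starts where the whole match starts.
    die "unexpected start" unless $-[0] == $-[1];
    push @merlyn,  [ $-[1], $+[1] ];   # merlyn's form
    push @variant, [ $-[0], $+[1] ];   # Not_a_Number's form
}

print join(' ', map { "$_->[0]-$_->[1]" } @merlyn),  "\n";   # 2-4 14-16 15-17
print join(' ', map { "$_->[0]-$_->[1]" } @variant), "\n";   # 2-4 14-16 15-17
```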

