my $string = "hello world ballloon";
my $regex = qr/ll/;
my @matches;
push @matches, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;
print "@$_\n" for @matches;
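Note the care taken to find overlapping matches: the pattern sits inside a lookahead, so the match itself is zero-width and the next search starts only one character further on.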
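To see what the lookahead buys, here is a minimal sketch that runs the same pattern with and without it. This comparison is an illustration only, not part of the code above, and the names @plain and @overlap are made up for it.

```perl
use strict;
use warnings;

my $string = "hello world ballloon";
my $regex  = qr/ll/;

# Plain /g: each match consumes "ll", so the search resumes after it
# and the "ll" starting at offset 15 is skipped.
my @plain;
push @plain, $-[0] while $string =~ /$regex/g;

# Lookahead: the match itself is zero-width, so the search moves on
# by only one character and overlapping hits are found.
my @overlap;
push @overlap, $-[1] while $string =~ /(?=($regex))/g;

print "plain:   @plain\n";     # plain:   2 14
print "overlap: @overlap\n";   # overlap: 2 14 15
```

The third hit, at offset 15, only shows up in the lookahead version because it shares a character with the hit at offset 14.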
use re 'eval';
my $string = "hello world ballloon";
my $regex = qr/ll/;
my @matches;
() = $string =~ /(?=$regex)(?{push @matches, pos})./g;
print join "\n", @matches;
__END__
2
14
15
my $check = 'abcdefghijk';
my (@matches_start, @matches_end);
for (0 .. 9) {
    $check =~ /efg/;
    push @matches_start, $-[0];
    push @matches_end, $+[0];
}
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
Note: All code is untested, unless otherwise stated
I'm tired. My brain hurts. There must be something I've overlooked in perldoc perlvar. Can anyone explain the difference, if there is one, between merlyn's solution
push @matches, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;
and this, which I came up with myself (heavily adapted, but... :):
push @matches, [$-[0], $+[1]] while $string =~ /(?=($regex))/g;
thx
dave
Since $1 starts at the very beginning of the match, $-[0] (left edge of the whole match) and $-[1] (left edge of $1) have the same value, so the two snippets store identical pairs. The right edges are another matter: a lookahead assertion doesn't count towards the width of the match, so the whole match has length zero and $+[0] == $-[0], while $1 inside the lookahead extends further to the right, so $+[1] > $-[1].
I can imagine this explanation is a bit abstract, so I'll give a different example. Here I'm trying to match an uppercase letter that is the first of a sequence of at least 3 letters, including itself.
$_ = 'Ab1Cd2eFg3Hijk4LMN';
/((?=([a-zA-Z]{3,}))[A-Z])/ or die "No match";
print <<"END";
Whole match: \$&: '$&' $-[0] upto $+[0]
Parens around everything matched: \$1: '$1' $-[1] upto $+[1]
Lookahead matched: \$2: '$2' $-[2] upto $+[2]
END
Resulting in:
Whole match: $&: 'H' 10 upto 11
Parens around everything matched: $1: 'H' 10 upto 11
Lookahead matched: $2: 'Hijk' 10 upto 14
As you can see, the parens around everything matched the same as the whole match itself. The lookahead extends beyond that, and even though you can capture what it matched, it doesn't count for the length. The first thing after the lookahead is still the same thing as the first thing of the whole match.
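Going back to the string from the original question, here is a small sketch that records all four offsets for each hit. It is an illustration only, and the @rows name is made up for it. It shows why [$-[0], $+[1]] and [$-[1], $+[1]] come out the same, while $+[0] would not.

```perl
use strict;
use warnings;

my $string = "hello world ballloon";
my $regex  = qr/ll/;

my @rows;
while ($string =~ /(?=($regex))/g) {
    # $-[0] / $+[0]: edges of the whole (zero-width) match
    # $-[1] / $+[1]: edges of the capture inside the lookahead
    push @rows, [$-[0], $+[0], $-[1], $+[1]];
}

printf "whole: %2d upto %2d   \$1: %2d upto %2d\n", @$_ for @rows;
```

For each of the three hits the first, second and third numbers are equal (2, 14 and 15 respectively) and only the end of $1 lies two characters further on (4, 16 and 17).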