Hi,
I have a problem that sounds very simple: once you have found a hit in a webpage's source, how can you tell whether that hit is part of a link or, more generally, sits between an opening tag and its closing tag?
I tried the following code, which matches the first string correctly, but also matches the second one, which should not match.
#!/usr/bin/perl -w
use strict;
my @strings = (
    '<a href="#1">HIT<b>MUST MATCH</b></a> <a href="#2">2</a>',
    '<a href="#1">HIT<a name="MUST NOT MATCH">-</a> <a href="#2">2</a>'
);
foreach my $string (@strings) {
    "" =~ /()/; # reset $1 to an empty (false) value
    $string =~ /
        (                       # the whole match should be captured as $1
            HIT                 # start at the first "HIT" found
            (?=                 # begin positive lookahead
                [^<]*?          # zero or more characters other than "<"
                (?!             # begin negative lookahead
                    <a[^>]*?>   # an opening a tag is forbidden
                )
                .*?<\/a>        # any characters up to the closing a tag
            )                   # end positive lookahead
            .*?                 # capture everything after HIT,
            <\/a>               # non-greedily, up to the first closing a tag
        )
    /x;
    print "\nstring:\t$string\n";
    if ($1) { print "match:\t$1\n" }
    else { print "NO match!\n" }
}
I think the problem is related to my use of the negative lookahead. Could anybody point me in the right direction?
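One possible direction, as a sketch only: in the pattern above the negative lookahead is tested at a single position, so an opening a tag that appears later (for example after some plain text following HIT) is never checked. A common alternative is to repeat the lookahead before every character that gets consumed. The helper name hit_in_link and the third test string are made up for illustration, and this assumes that simply spotting "<a" followed by a word boundary is good enough to detect an opening a tag; for real-world HTML a proper parser is likely the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the text
# from "HIT" up to the first closing a tag, provided no opening a tag
# occurs in between; returns nothing (undef in scalar context) otherwise.
sub hit_in_link {
    my ($string) = @_;
    if ($string =~ m{
            (                   # capture group 1: the whole match
                HIT             # the literal hit
                (?:             # then, one character at a time:
                    (?! <a\b )  #   refuse to step onto an opening a tag
                    .           #   otherwise consume one character
                )*?             # as few characters as possible
                </a>            # up to the first closing a tag
            )
        }xs) {
        return $1;
    }
    return;
}

my @strings = (
    '<a href="#1">HIT<b>MUST MATCH</b></a> <a href="#2">2</a>',
    '<a href="#1">HIT<a name="MUST NOT MATCH">-</a> <a href="#2">2</a>',
    '<a href="#1">HIT some text <a name="MUST NOT MATCH">-</a>',  # made-up extra case
);

foreach my $string (@strings) {
    my $match = hit_in_link($string);
    print "\nstring:\t$string\n";
    if (defined $match) { print "match:\t$match\n" }
    else                { print "NO match!\n" }
}
```

With these strings, only the first one should report a match. Because the lookahead is attached to each consumed character, the lazy quantifier cannot skip over an opening a tag on its way to the closing one.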
Thanks,
rob.