in reply to Using Look-ahead and Look-behind
The following is just not working. Basically, I want to match a value that has "equity", but NOT "private equity". The result must be items 1, 2, 4, 5. Please check this out:
my %hash = (
1 => 'equity, private equity',
2 => 'equity',
3 => 'private equity',
4 => 'private equity,equity',
5 => 'private equity, equity',
6 => 'equity,private equity',
7 => 'private equity',
8 => 'mutual funds',
9 => 'cds'
);
while (my ($k, $v) = each %hash) {
next unless $v =~ m/(?!private\s+)equity/;
printf("%d -> %s\n", $k, $v);
}
Re^2: Using Look-ahead and Look-behind
by Anonymous Monk on Jun 25, 2011 at 08:41 UTC
Hi. New questions belong in Seekers Of Perl Wisdom, because
Roy Johnson, to whom you addressed your question, hasn't been here in 6 weeks.
You used code tags and put your code between them, which is awesome :)
Welcome; see How do I post a question effectively? and Where should I post X?
The regex that is not working for you contains a zero-width negative look-ahead assertion, and as perlre#(?!pattern) says:
A zero-width negative look-ahead assertion. For example /foo(?!bar)/
matches any occurrence of "foo" that isn't followed by "bar". Note
however that look-ahead and look-behind are NOT the same thing. You cannot
use this for look-behind.
If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/
will not do what you want. That's because the (?!foo) is just saying that
the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
match. Use look-behind instead (see below).
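To see that perlre passage in action, here is a small sketch that reports where each pattern first matches. The helper name match_offset is made up for this illustration; the two patterns are the ones discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: offset of the first match of $re in $str, or -1 if none.
sub match_offset {
    my ($re, $str) = @_;
    return $str =~ $re ? $-[0] : -1;
}

my $lookahead  = qr/(?!private\s+)equity/;   # the pattern from the question
my $lookbehind = qr/(?<!private\s)equity/;   # fixed-width negative look-behind

for my $str ('private equity', 'equity') {
    printf "%-16s look-ahead at %2d, look-behind at %2d\n",
        "'$str'", match_offset($lookahead, $str), match_offset($lookbehind, $str);
}
```

For 'private equity' the look-ahead version matches at offset 8: at that position the upcoming text is "equity", not "private ", so the assertion is satisfied. The look-behind version does not match that string at all.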
So, use a look-behind.
But a direct translation probably won't work either, because you can't have a variable-length look-behind, so you need to use a fixed-width look-behind.
#!/usr/bin/perl --
use strict; use warnings;
use Test::More qw' no_plan ';
Main(@ARGV);
exit(0);
sub Main {
my @yesWant = (
'equity, private equity',
'equity',
'private equity,equity',
'private equity, equity',
'equity,private equity',
);
my @notWant = (
'private equity',
'private equity',
'mutual funds',
'cds',
);
for my $not ( @notWant ){
ok( (not TestEquity($not)), "not '$not'" );
}
for my $yes ( @yesWant ){
ok( TestEquity($yes), "yes '$yes'" );
}
}
sub TestEquity {
return 1 if $_[0] =~ m/(?<!private\s)equity/;
return 0;
}
__END__
$ prove -v pm911357.lookbehind.pl
pm911357.lookbehind.pl ..
ok 1 - not 'private equity'
ok 2 - not 'private equity'
ok 3 - not 'mutual funds'
ok 4 - not 'cds'
ok 5 - yes 'equity, private equity'
ok 6 - yes 'equity'
ok 7 - yes 'private equity,equity'
ok 8 - yes 'private equity, equity'
ok 9 - yes 'equity,private equity'
1..9
ok
All tests successful.
Files=1, Tests=9, 0 wallclock secs ( 0.06 usr + 0.01 sys = 0.08 CPU)
Result: PASS
If a fixed-width look-behind doesn't work for you, simply do TWO tests.
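One possible reading of "two tests", as a rough sketch: first remove every "private equity" (with any amount of whitespace) from a copy of the string, then check whether an "equity" is left. The helper name has_plain_equity is invented here, and this is only one way to split the work into two steps.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if $str mentions "equity" other than as part of
# "private equity", 0 otherwise.
sub has_plain_equity {
    my ($str) = @_;
    (my $rest = $str) =~ s/private\s+equity//g;   # step 1: drop every "private equity"
    return $rest =~ /equity/ ? 1 : 0;             # step 2: is any "equity" left?
}

for my $str ('equity', 'private equity', 'private   equity, equity', 'mutual funds') {
    printf "%-28s %s\n", "'$str'", has_plain_equity($str) ? 'match' : 'no match';
}
```

Because the substitution uses \s+, this copes with variable whitespace, which is exactly what a fixed-width look-behind cannot do.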
Here's a solution that exactly matches the phrases specified in AnonyMonk's Re: Using Look-ahead and Look-behind post (which the code of Re^2: Using Look-ahead and Look-behind does not quite do), and also shows how to use the newfangled backtracking control verbs of 5.10 to emulate variable-width negative look-behind. Variable-width positive look-behind is emulated by 5.10's \K assertion.
Explanation:
- Any 'equity' that is preceded by
  - either a character that is not a comma or whitespace, or
  - by the 'private' phrase
  FAILS and is skipped over (this test has first precedence);
- Otherwise, any 'equity' that is not followed by a comma that is then followed by any non-whitespace SUCCEEDS.
>perl -wMstrict -le
"use Test::More 'no_plan';
;;
for my $ar_vector (
[ YES => 'equity, private equity', ],
[ YES => 'equity', ],
[ no => 'private equity', ],
[ YES => 'private equity,equity', ],
[ YES => 'private equity, equity', ],
[ no => 'equity,private equity', ],
[ no => 'private equity', ],
[ no => 'mutual funds', ],
[ no => 'cds' ],
) {
my ($expected, $string) = @$ar_vector;
is match($string), $expected, qq{'$string'};
}
;;
sub match {
my ($string) = @_;
;;
my $char_not_comma_or_space = qr{ [^,\s] }xms;
my $private = qr{ private \s+ }xms;
return 'YES' if $string =~
m{ (?: $char_not_comma_or_space | $private) equity (*SKIP)(*FAIL)
|
equity (?! , \S)
}xms;
return 'no';
}
"
ok 1 - 'equity, private equity'
ok 2 - 'equity'
ok 3 - 'private equity'
ok 4 - 'private equity,equity'
ok 5 - 'private equity, equity'
ok 6 - 'equity,private equity'
ok 7 - 'private equity'
ok 8 - 'mutual funds'
ok 9 - 'cds'
1..9
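The \K assertion mentioned above is not used in that one-liner, so here is a minimal sketch of it, assuming Perl 5.10 or later. Everything matched to the left of \K is required but kept out of the matched text, so it behaves like a positive look-behind that may be of variable width. The helper name mark_private_equity is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;   # \K needs Perl 5.10 or later

# Hypothetical helper: upper-case every "equity" that follows "private" plus
# any amount of whitespace, leaving the "private" part untouched.
sub mark_private_equity {
    my ($str) = @_;
    (my $out = $str) =~ s/private\s+\Kequity/EQUITY/g;
    return $out;
}

print mark_private_equity('equity, private   equity'), "\n";
```

Only the text after \K is replaced, so "private" and its trailing whitespace survive the substitution unchanged.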
/CCGG # Match starting at DNA sequence CCGG
(
(?:
(?!CCGG) # make sure we're not finding duplicates mid-stream
. # accept any character
)*? # any number of times BUT not greedily <====
)
AATT # and ending at AATT
/x;
versus
/CCGG
(
(?:
(?!CCGG)
.
){50,100}? # <====
)
AATT # and ending at AATT
/x;
This latter one does not have dupes of CCGG but does have dupes of AATT. The previous snippet has no dupes of either CCGG or AATT.
A follow-up: The following code snippet fixes my problem, and I have NO idea why! I tried it out of desperation.
/CCGG
(
(?:
(?!AATT|CCGG) # <=============
. #
){50,100}? # Here the "?" is not required but I'm anal
) #
AATT #
/x;
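A plausible explanation, offered as a sketch rather than a definitive answer: with the lazy *? the engine tries AATT before consuming each character, so it stops at the first AATT and the capture cannot contain one. A minimum count forces the group to consume characters first, and those may run straight over an early AATT. Adding AATT to the look-ahead forbids that. The bounds below are shortened from {50,100} to {4,20} purely so the sample strings stay readable; that change, the sample strings, and the names are assumptions of this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bounds shortened from the post's {50,100} to {4,20} (an assumption) so that
# short sample strings are enough to show the difference.
my %pattern = (
    lazy_star => qr/CCGG((?:(?!CCGG).)*?)AATT/,
    bounded   => qr/CCGG((?:(?!CCGG).){4,20}?)AATT/,
    both      => qr/CCGG((?:(?!AATT|CCGG).){4,20}?)AATT/,
);

# Hypothetical helper: the captured fragment, or undef when there is no match.
sub fragment {
    my ($name, $dna) = @_;
    return $dna =~ $pattern{$name} ? $1 : undef;
}

for my $dna ('CCGGTTTTTAATT', 'CCGGTAATTGGAATT') {
    for my $name (qw(lazy_star bounded both)) {
        my $frag = fragment($name, $dna);
        printf "%-16s %-10s %s\n", $dna, $name, defined $frag ? $frag : '(no match)';
    }
}
```

On the second string the bounded pattern captures TAATTGG, which contains an AATT, while the pattern with both alternatives in the look-ahead refuses to match because it cannot collect four characters before hitting AATT.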
Nice. Very nice! You nailed it.
It's working. Thanks a bunch!
sub TestEquity {
return 1 if $_[0] =~ m/(?<!private).*equity/;
return 0;
}
sub TestEquity {
return
$_[0] =~ m/private.*equity/ ? 0 :
$_[0] =~ m/equity/ ? 1 :
0
;
}
This could be slightly simplified if you can tolerate "" (empty string) as a false flag in addition to or in place of 0.
BTW: "I can't get it to work" is rarely helpful as a problem description. How about some input strings and actual versus expected output?
Update: Changed $_[0] =~ m/private.*equity/ to $_[0] =~ m/private/ because it makes more sense.
Update: ... and then changed it back to $_[0] =~ m/private.*equity/ because it actually makes even more sense that way! (sigh)
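For what it's worth, the simplification hinted at above might look like the following sketch, which returns 1 for true and "" (empty string) for false. The positive match is deliberately on the left of &&: that operand is always evaluated in boolean context, so the sub hands back a single value even when it is called in list context (for example as an argument to ok()).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of a shorter TestEquity: true (1) when the string contains "equity"
# but no "private ... equity"; false ("") otherwise.
sub TestEquity {
    return $_[0] =~ m/equity/ && $_[0] !~ m/private.*equity/;
}

for my $str ('equity', 'private equity', 'private banking, equity', 'mutual funds') {
    printf "%-26s %s\n", "'$str'", TestEquity($str) ? 'yes' : 'no';
}
```

As with the nested-ternary version, any "private" that appears anywhere before an "equity" rejects the whole string, so 'private banking, equity' is reported as 'no'.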
I changed the sub TestEquity to allow for any text between Private and Equity, but I can't get it to work. What have I done wrong?
Impossible to say, although the anomalous one makes a good point.