PerlMonks
Re: How to not match some words but match all the others
by fizbin (Chaplain)
on Oct 19, 2005 at 13:47 UTC ( [id://501304]=note )
This looks like a job for negative lookahead! First, have perlre handy for this. Then:
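A minimal sketch of the negative-lookahead approach, assuming the words to skip are "if", "then", and "else" (the sub name non_stopwords and the sample string are made up for illustration):

```perl
use strict;
use warnings;

# Return every word in $text except the whole words "if", "then", "else".
# Hypothetical helper name; the pattern is the point here.
sub non_stopwords {
    my ($text) = @_;
    # First \b: only start matching at the beginning of a word.
    # (?!(?:if|then|else)\b): refuse to start if the word here is exactly
    # one of the stopwords; the inner \b keeps "iffy" and "thence" legal.
    return $text =~ /\b(?!(?:if|then|else)\b)(\w+)/g;
}

print join(' ', non_stopwords('if x then y else otherwise iffy thence')), "\n";
# prints: x y otherwise iffy thence
```

In list context the /g match hands back every captured word, so there is no need for an explicit loop.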
This code will match only those words you want, and no others. Note the placement of the \b markers. If you remove the first, you'll end up matching "hen" if your string includes "then" (and similarly for "f" and "lse"). If you remove the second, you'll end up also excluding words that begin with "if", "then", and "else".

But wait, there's more! You can also call perl code from inside the middle of your regular expression and use that to determine whether to match or not. Note that though this is more extensible, the negative lookahead solution is likely much faster.
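A sketch of the code-in-the-regex variant, assuming a %stopwords hash keyed by the words to skip and the (??{ ... }) construct from perlre (the sub name words_not_in_stoplist and the sample string are made up for illustration):

```perl
use strict;
use warnings;

# Assumed stopword list; any lookup you like could go here instead.
my %stopwords = map { $_ => 1 } qw(if then else);

# Hypothetical helper: collect every word that is not in %stopwords.
sub words_not_in_stoplist {
    my ($text) = @_;
    my @found;
    # \b(\w+)\b grabs one whole word into $1. The (??{ ... }) block then
    # runs perl code and uses the regex it returns as the next piece of
    # the pattern: qr/\B/ (cannot match at a word end, so the match
    # fails) for a stopword, qr// (matches anywhere) otherwise.
    while ($text =~ m{\b(\w+)\b(??{ $stopwords{$1} ? qr/\B/ : qr// })}g) {
        push @found, $1;
    }
    return @found;
}

print join(' ', words_not_in_stoplist('if it rains then stay else go')), "\n";
# prints: it rains stay go
```

The \b after (\w+) matters: without it, the engine could backtrack to a shorter piece such as "the" out of "then", where \B does match.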
What this means is that after each word is found, perl will look up the word in the %stopwords hash and, if found, will try to match qr/\B/, which can't match at the end of a word. (If the word isn't found in %stopwords, perl tries to match qr//, which matches everywhere.) Since this is arbitrary code in the (??{ ... }) block, you can do whatever you want, and aren't limited to a fixed set of stopwords. Just remember to return a regular expression that matches or doesn't match depending on what you need at that point.