Your code does more than you describe. In addition to removing (case-insensitive) duplicates, it applies several other elimination criteria, and those should not be inside the foreach.
To duplicate your code functionally, we have to:
- Remove non-words (you missed a closing / in that pattern, by the way)
- Remove words shorter than 4 chars
- Remove repeated occurrences of a word, keeping one copy (your technique suggests that the list is sorted)
- Remove words that appear in @exclude
Ok, here we go!
    # given @words and @exclude
    my %seen;
    @seen{@exclude} = (1) x @exclude;
    @words = grep {
        (! /\W/) and (length() >= 4) and ($seen{$_}++ == 0)
    } @words;
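We pre-load %seen with a flag for each word in @exclude, so an excluded word fails the $seen{$_}++ == 0 test exactly as a repeat would. Then, as we're looking through @words itself, we mark each word that passes the first two tests as seen as well.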
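If you want to try it, here is a small self-contained sketch of the same idea. The sample lists are made up for illustration, and the lc() calls are my assumption about how to get the case-insensitive matching you described; the grep above compares words exactly as written. The result goes into a separate @kept array (a name I picked) so the input stays intact.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original post.
my @words   = qw(apple Apple banana cat don't grape grape melon);
my @exclude = qw(banana);

# Pre-load %seen so excluded words already count as "seen".
# Keys are lowercased (an assumption) so matching ignores case.
my %seen;
@seen{ map { lc($_) } @exclude } = (1) x @exclude;

my @kept = grep {
    !/\W/                           # word characters only
      and length() >= 4             # at least 4 characters long
      and $seen{ lc($_) }++ == 0    # first time we meet this word
} @words;

print "@kept\n";
```

Because the and chain short-circuits, a word is only recorded in %seen once it has passed the first two tests, and since %seen does the bookkeeping the list does not need to be sorted.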
The PerlMonk tr/// Advocate