http://qs321.pair.com?node_id=71932

athomason has asked for the wisdom of the Perl Monks concerning the following question:

This question's title is perhaps a bit deceptive and/or confusing, because I'm not entirely sure how to classify this problem. For background, the purpose of this script is to match a user-specified pattern against a wordlist, but with a twist. The caveat is magic behavior for the any-character-matching-symbol '.'. Any periods in the pattern should substitute to a previously entered character class (e.g. [prl]). Now, if this same class were substituted for each period, the same letter could potentially match in more than one spot. The question is how to make this list act like a Scrabble bag: once a character is used, it shouldn't be used for a subsequent match. Clearly, this isn't a typical regexp feature, but I'm wondering if it's possible to combine other features for similar functionality.

I'll give an example for clarification. Say I receive the user pattern ^.e..$ and the letter bag 'prl' once again. The straight substitution would give /^[prl]e[prl][prl]$/. But I'd like the regexp to match 'perl' and 'lerp' without matching 'lerr' or 'pepp'. That is, matching a letter in any wildcard spot eliminates it from consideration in later wildcard matches.

The clean and simple solution would be the ability to perform set operations on character classes, e.g. /^[prl]e[{prl}-{\1}][{prl}-{\1\2}]$/. Sadly, those subtraction bits are made up, since I don't know of any real way to do this.

Right now I'm faking it by putting negative lookahead assertions for previous matches in front of subsequent instances of the character class. The example would then look like /^([prl])a(?!\1)([prl])(?!\1|\2)([prl])$/. However, this has (at least) two problems. First, the previously matched characters aren't actually removed from later classes, which is inefficient. Second, quantifiers don't work correctly, as /^([prl])e(?!\1)([prl])+$/ will incorrectly match on 'pell'. The second problem is obviously more bothersome than the first, but both are issues I'd like to resolve. An auxilliary problem with using character classes is that duplicates in the letter bag are immediately quashed. That's actually an even bigger problem for me, but the solution seems farther away.

So, is there a better way to do this? The only full solution I can think of would start using (??{}) assertions, and I'm not eager to start down the self-evaluating pattern path. I just re-read Camel 3's regexp chapter looking for the set operations or equivalent hack, but I can't find one. And sadly, I don't have a copy of Mastering Regular Expressions around for its ninja magic. As always, any insights would be appreciated.