http://qs321.pair.com?node_id=466016

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

a weekly paper here is printing a word jumble puzzle. was wondering if it's possible with regex to scan a dictionary file for matches. puzzle rules: you get nine letters, each letter can only be used once. example: "RTESAMCNA", the longest word = SACRAMENT. oh wise ones, how to construct a regex so that each letter (from the letter pool) can only be used once? this is not a homework question....i stopped doing homework in 1994.

Replies are listed 'Best First'.
Re: regex for word puzzle
by Zaxo (Archbishop) on Jun 13, 2005 at 01:05 UTC

    This is anagramming, and the nicest solution is well-known here. Split your string into characters, sort them, and rejoin them. Compare against a hash with keys made the same way from your dictionary or wordlist.

    my %dict = do { open my $fh, '<', '/usr/dict/words' or die $!; map {chomp; join( '', sort split //, lc $_), $_} <$fh>; }; my $canon = join '', sort split //, lc 'RTESAMCNA'; print $dict{$canon} if exists $dict{$canon};

    No regex or combinatorial excess needed.

    After Compline,
    Zaxo

Re: regex for word puzzle
by muba (Priest) on Jun 13, 2005 at 00:49 UTC
    What thou needest, fellow monk who shall rename unnamed and anonymous, shall be not a regular expression.

    No, instead you'll need some kind of a loop construct.
    I don't know how good your math is, but with a word of 9 letters, there are 9! = 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 combination. This is because at the first position there are nine possible letters. The second position has only 8 possible letters and so on.

    Your task will be to write a loop than can create all these combinations. And regex won't do that for you, AFAIK.




    "2b"||!"2b";$$_="the question"
    Besides that, my code is untested unless stated otherwise.
    One more: please review the article about regular expressions (do's and don'ts) I'm working on.

    magnum unum bovem audivisti
      I echo muba's comments above, and note that I have done a similar sort of program for yahoo word games and that kind of thing.

      I think it would be unfair to solve the problem for you.. because this is not a Perl programming issue so much (as muba pointed out with regard to using regexps to solve the problem) but more a program logic issue.

      I can assist with suggestions though. First you'll need to compile a dictionary of words into a format you can use for lookup. Secondly you'll need to convert the supplied word into something you can match against your compiled version of the english dictionary.

      If you're getting stuck, examine yourself. What thought processes do you use when attempting to solve such a puzzle? Are you scanning for letters? Are you comparing jumbled random bits to english words that you know? How do you solve it manually.. remember a computer usually can only do what we can do.. but faster and more consistently.

Re: regex for word puzzle
by davidrw (Prior) on Jun 13, 2005 at 02:27 UTC
    If you really wanted to use regex, you can use it to narrow the dictionary list before preformming one of the above (most notably Zaxo's) methods.
    my $jumble = 'RTESAMCNA'; open FILE, "/usr/share/dict/words"; my $n = length $jumble; my @possible_words = grep /^[$jumble]{$n}$/i, map {chomp; $_} <FILE> +; close FILE;
    This narrows (with the dictionary i have) the list for "RTESAMCNA" down to just 35 words. From there, you can use the lc/split/sort/join & hash method to get the final list (for example, it finds 'treatment', but clearly that should be excluded). Obvisouly the longer the jumble, the better off this initial pass is.. (And depends on the letter combinations, too)
      The hash isn't necessary unless you are trying to match multiple words. Just test each word that matches the pattern. This also has the advantage of handling multiple possible anagrams that could exist in the word file. e.g. meat->team->meta->mate. You could also add a simple string length check as the first filter.
Re: regex for word puzzle
by ank (Scribe) on Jun 13, 2005 at 01:40 UTC

    If you really want to go with regexes, this will give you a general idea of how it would be done:

    TPJ regex article by japhy

    -- ank

Re: regex for word puzzle
by planetscape (Chancellor) on Jun 13, 2005 at 06:21 UTC

      Argh. Should have read "almost exactly". Not my day for reading, writing, or typing...

      planetscape
Re: regex for word puzzle
by TedPride (Priest) on Jun 13, 2005 at 02:49 UTC
    Also of interest:

    estramacon = sacrament + o
    crassament = sacrament + s
    canaster = sacrament - m

    I thought it might be fun to write something to generate "near" variations as well. Does anyone know where I can get an official Scrabble word list, instead of the somewhat mediocre dictionary file I'm working with right now? I promise not to use it to cheat in Scrabble :)

      Several dictionary files, including a couple specifically for scrabble, can be found here.

      I have no idea about their quality.
Re: regex for word puzzle
by hv (Prior) on Jun 13, 2005 at 09:28 UTC

    I have a program that I use for this sort of thing, specifically written always to generate a single regular expression.

    The program is 'word', and it includes flags -a for "anagram", -u for "use any subset of the letters", and -D for "just show me the regexp".

    For an exact anagram, word -Da rtesamcna gives:

    /^(?:(?=.*e)(?=.*n)(?=.*c)(?=.*r)(?=.*a.*a)(?=.*m)(?=.*s)(?=.*t).{9,9} +)\z/oi

    That is: require each of the single letters, require a double "a", and require exactly 9 letters.

    For all subsets, word -Dua rtesamcna gives:

    /^(?:(?:([encrmst])(?!.*\1)|([a])(?!.*\2.*\2))+)\z/oi

    That is: require all matching letters to be from the list, and additionally don't allow more than one of any of [encrmst], nor more than two "a"s.

    (Note: normal use of the word program also requires wrap from the same place.)

    Hugo

      Assume your rack is rtesamcna. Assume also you have the list of valid words in TWL.txt. The following grep command will show you words formed using only letters in your rack, but it will use any number of the letters:

      grep -i '^[rtesamcna]*$' TWL.txt

      Not quite what you want. You don't want words with more than 1 R, or more than 2 A's. So we filter it out:

      grep -i '^[rtesamcna]*$' TWL.txt | grep -iEv 'r.*r|t.*t|e.*e|s.*s|a.*a.*a|m.*m|c.*c|n.*n'

      And finally, to show anything with at least 9 letters:

      grep -i '^[rtesamcna]*$' TWL.txt | grep -iEv 'r.*r|t.*t|e.*e|s.*s|a.*a.*a|m.*m|c.*c|n.*n' | grep ...... +...