A word challenge! I love word challenges! :)
First, I'll second grep's suggestion to indent your code properly. The lack of indenting discouraged me from understanding your code, and therefore from providing in-depth comments on your code. :(
The most important optimization is to realize that anagrams only need to be tested once, because it's the set of letters that matters, not their order. For example, starting with 'topsy', if you drop the 'y' the next word could be 'post', or 'stop', or 'tops', etc., but you only need to try one of those words.
I implemented this by putting the words in a hash where the key is the sorted letters in a word. For each set of letters, I keep only the first word found, just to save memory. I also push each set of letters onto an array, grouped by length.
Once that's done, I can start looking for add-a-grams, beginning with the longest words. The first step is to set up a stack of attempted solutions. Then, in a loop, I pop an attempt off the stack and try to branch the last set of letters. Testing each branch is as simple and as fast as looking for the set of letters in the hash. Testing the few possible branches and looking for a matching word is much, much faster than testing all the words and looking for a matching branch. (Okay, so maybe that's the most important optimization. :)
For each match, push an attempt onto the stack.
I stuck in an additional optimization here; if none of the branches work, I remove the original set of letters from the stack. That way I won't waste time trying the same dead end later.
Once an attempt gets down to three letters, I print the solution. I continue finding solutions at the same length. However, I don't try the shorter lengths after that, since the object is to find the longest possible solution.
My script ran in a few minutes and found 6640 add-a-grams with a word of 16 letters. I'm sure the code could be cleaned up, but I should be getting to bed.
#!/usr/local/bin/perl -w
use strict;
$| = 1;
use Getopt::Std;
use vars qw($opt_w);
getopts('w:') or die "usage: words.pl [-w <wordlist>] ";
my $wordlist = $opt_w || 'wordlist';
open(DICT, $wordlist) or # open word list
die "Unable to open $wordlist: $!\n";
my(%words, @letters);
while (<DICT>) {
chomp;
my $letters = join '', sort split //;
if (not exists $words{$letters}) {
$words{$letters} = [$_, 1];
push @{ $letters[length] }, $letters;
} else {
# just for fun, keep track of the word count for each set of l
+etters
$words{$letters}[1]++;
}
}
my(@solutions, @stack, $found);
my $i = $#letters;
while ($i > 3 and !$found) {
print "Trying length $i.\n";
# try each word at this length
for my $letters (@{ $letters[$i] }) {
@stack = [ $letters ];
while (@stack) {
# advance the next attempt
my $test = pop @stack;
if (length $test->[-1] == 3) {
# found a solution!
# note it, print it, go back to look for more
$found = 1;
print_solution($test);
next;
}
# look for branches from the last set of letters
my $continue;
for my $branch (branch($test->[-1])) {
if (exists $words{$branch}) {
# found a branch!
# note it and push it on the stack
$continue = 1;
push @stack, [ @$test, $branch ];
}
}
if (!$continue) {
# no branches found; prune the dead end
delete $words{ $test->[-1] };
}
}
}
# all done with this length; go one shorter
$i--;
}
# given a set of letters, return all possible branches
# (set minus one letter)
sub branch {
my($letters) = @_;
# get all possible branches
my @branches = map { my $x = $letters; substr($x, $_, 1) = ''; $x
+}
0 .. length($letters) - 1;
# remove duplicate branches
my %uniq;
@uniq{@branches} = ();
return sort keys %uniq;
}
sub print_solution {
my($solution) = @_;
for my $letters (@$solution) {
print "$words{$letters}[0] ($words{$letters}[1]) ";
}
print "\n";
}
__END__
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.