Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
A word challenge! I love word challenges! :)

First, I'll second grep's suggestion to indent your code properly. The lack of indenting discouraged me from understanding your code, and therefore from providing in-depth comments on your code. :(

The most important optimization is to realize that anagrams only need to be tested once, because it's the set of letters that matters, not their order. For example, starting with 'topsy', if you drop the 'y' the next word could be 'post', or 'stop', or 'tops', etc., but you only need to try one of those words.

I implemented this by putting the words in a hash where the key is the sorted letters in a word. For each set of letters, I keep only the first word found, just to save memory. I also push each set of letters onto an array, grouped by length.

Once that's done, I can start looking for add-a-grams, beginning with the longest words. The first step is to set up a stack of attempted solutions. Then, in a loop, I pop an attempt off the stack and try to branch the last set of letters. Testing each branch is as simple and as fast as looking for the set of letters in the hash. Testing the few possible branches and looking for a matching word is much, much faster than testing all the words and looking for a matching branch. (Okay, so maybe that's the most important optimization. :)

For each match, push an attempt onto the stack. I stuck in an additional optimization here; if none of the branches work, I remove the original set of letters from the stack. That way I won't waste time trying the same dead end later.

Once an attempt gets down to three letters, I print the solution. I continue finding solutions at the same length. However, I don't try the shorter lengths after that, since the object is to find the longest possible solution.

My script ran in a few minutes and found 6640 add-a-grams with a word of 16 letters. I'm sure the code could be cleaned up, but I should be getting to bed.

#!/usr/local/bin/perl -w use strict; $| = 1; use Getopt::Std; use vars qw($opt_w); getopts('w:') or die "usage: words.pl [-w <wordlist>] "; my $wordlist = $opt_w || 'wordlist'; open(DICT, $wordlist) or # open word list die "Unable to open $wordlist: $!\n"; my(%words, @letters); while (<DICT>) { chomp; my $letters = join '', sort split //; if (not exists $words{$letters}) { $words{$letters} = [$_, 1]; push @{ $letters[length] }, $letters; } else { # just for fun, keep track of the word count for each set of l +etters $words{$letters}[1]++; } } my(@solutions, @stack, $found); my $i = $#letters; while ($i > 3 and !$found) { print "Trying length $i.\n"; # try each word at this length for my $letters (@{ $letters[$i] }) { @stack = [ $letters ]; while (@stack) { # advance the next attempt my $test = pop @stack; if (length $test->[-1] == 3) { # found a solution! # note it, print it, go back to look for more $found = 1; print_solution($test); next; } # look for branches from the last set of letters my $continue; for my $branch (branch($test->[-1])) { if (exists $words{$branch}) { # found a branch! # note it and push it on the stack $continue = 1; push @stack, [ @$test, $branch ]; } } if (!$continue) { # no branches found; prune the dead end delete $words{ $test->[-1] }; } } } # all done with this length; go one shorter $i--; } # given a set of letters, return all possible branches # (set minus one letter) sub branch { my($letters) = @_; # get all possible branches my @branches = map { my $x = $letters; substr($x, $_, 1) = ''; $x +} 0 .. length($letters) - 1; # remove duplicate branches my %uniq; @uniq{@branches} = (); return sort keys %uniq; } sub print_solution { my($solution) = @_; for my $letters (@$solution) { print "$words{$letters}[0] ($words{$letters}[1]) "; } print "\n"; } __END__

In reply to Re: Add-A-Gram Performance by chipmunk
in thread Add-A-Gram Performance by smgfc

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (6)
As of 2021-12-07 00:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    R or B?



    Results (33 votes). Check out past polls.

    Notices?