Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Add-A-Gram Performance

by smgfc (Monk)
on Feb 05, 2002 at 05:05 UTC ( [id://143357]=perlquestion: print w/replies, xml ) Need Help??

smgfc has asked for the wisdom of the Perl Monks concerning the following question:

Yesterday, while going through the daily (read: hourly) routine of slashdot appreciation, I noticed a banner add for ItaSoftware (this isnt a plug) and this add contained information of a challenge. Upon visiting their site I realized it was really just a way to evaluate anyone applying for a job, but decided to do one the challenges for fun. I chose Add-A-Grams, where you start with a three letter word, and add a letter to create another word. The challenge was to find the longest in their 1.6 meg dictionary. After evaluating problem I can to the conclusion that I should start from the largest word and work my way down, anyway here is my sparsely documented code, that I will explain a little. I am really looking for performance tips. The last feature, writing the add-a-grams to a file, i havent tested, because it took an hour to run on my old mac, my pc is having nine heart attacks, and the new mac wont arrive for way too long, so if ther e is a problem there really sorry!
$dict="<untitled:Desktop Folder:WORD.LST"; $adda=">untitled:Desktop Folder:addagram.lst"; ####### # open the dictionary and create an array of the words ####### open (dict, $dict) or die ("Can't open $dict: $!\n"); while(defined($line=readline(*dict))) { chomp $line; push @words, $line; } close (dict); ####### #create a hash of array, where the key is the length of each element i +n the array ####### foreach $word (@words) { push @{ $wordpl{length($word)} }, $word; } ####### #extract the lengths and sort them ####### foreach $k (keys %wordpl ) { push @lengths, $k; } @lengths = sort {$b <=> $a} @lengths; ####### #do something for every length ####### foreach $l (@lengths) { word(); } ####### #do something with every word of every length ####### sub word { my $word; foreach $word ( @{ $wordpl{$l} } ) { $oword=$word; #define original word my $p=$l-1; word1($p, $word); } } ####### #see if any word from one array of lengths down differs by only one ch +aracter ####### sub word1 { my $p=@_[0]; my @a=split(//,@_[1]); my $sword; WO: foreach $sword ( @{ $wordpl{$p} } ) { my @b=split(//,$sword); my %seen = (); my @aonly = (); my $item; @seen{@b}=(); foreach $item (@a) { push (@aonly, $item) unless exists $seen{$item}; } if (length(@aonly)==1) { #if the difference between the two strings is + one character if ($p==3) { push @addw, $oword; #if the length of the string is three, store the o +riginal word as a add-a-gram last WO; #return to word() } $p--; word2($p, $sword); #If the length of the string is >3 but differs by o +ne go to word2 for another pass last WO; #upon exiting word2 start a new "original word" } } } ####### #see if any word from one array of lengths down differs by only one ch +aracter ####### sub word2 { my $p=@_[0]; my @a=split(//,@_[1]); my $sword; WT: foreach $sword ( @{ $wordpl{$p} } ) { @b=split(//,$sword); %seen = (); @aonly = (); @seen{@b}=(); foreach $item (@a) { push (@aonly, $item) unless exists $seen{$item}; } if (length(@aonly)==1) { #if the difference between the two strings is + one character if ($p==3) { push @addw, $oword; #if the length of the string is three, store the o +riginal word as a add-a-gram last WT; #return to word() } $p--; word1($p, $sword); #If the length of the string is >3 but differs by o +ne go to word1 for another pass last WT; #upon exiting word1 start a new "original word" } } } ####### #find the longest add-a-gram ####### $lo=""; foreach $i (@addw) { if ( length($lo)<length($i) ) { $lo=$i; } } ####### #print all anagrams to a file and signify the longest ###### open (add, $adda) or die ("Cant open $adda: $!\n"); foreach $i (@addw) { if ($i ne $lo) { print add "$i\n"; } else { print "longest: $i\n"; } } close (add);
Essentially it reads all the words, sorts them into arrays where each word has the same length, puts the arrays into a hash with the keys being the lengths of the elements in the array. Then it starts with the largest words, and takes a pass looking for a word which is one character smaller, and differs by one character. Another pass starts, doing the same thing, but to the new word. I hope you can understand it because I havent explained it well, documented it well, or named variable well, but any performance tips would be great, or even just ways to make the code a little shorter! Thanks again!

Replies are listed 'Best First'.
Re: Add-A-Gram Performance
by chipmunk (Parson) on Feb 05, 2002 at 06:19 UTC
    A word challenge! I love word challenges! :)

    First, I'll second grep's suggestion to indent your code properly. The lack of indenting discouraged me from understanding your code, and therefore from providing in-depth comments on your code. :(

    The most important optimization is to realize that anagrams only need to be tested once, because it's the set of letters that matters, not their order. For example, starting with 'topsy', if you drop the 'y' the next word could be 'post', or 'stop', or 'tops', etc., but you only need to try one of those words.

    I implemented this by putting the words in a hash where the key is the sorted letters in a word. For each set of letters, I keep only the first word found, just to save memory. I also push each set of letters onto an array, grouped by length.

    Once that's done, I can start looking for add-a-grams, beginning with the longest words. The first step is to set up a stack of attempted solutions. Then, in a loop, I pop an attempt off the stack and try to branch the last set of letters. Testing each branch is as simple and as fast as looking for the set of letters in the hash. Testing the few possible branches and looking for a matching word is much, much faster than testing all the words and looking for a matching branch. (Okay, so maybe that's the most important optimization. :)

    For each match, push an attempt onto the stack. I stuck in an additional optimization here; if none of the branches work, I remove the original set of letters from the stack. That way I won't waste time trying the same dead end later.

    Once an attempt gets down to three letters, I print the solution. I continue finding solutions at the same length. However, I don't try the shorter lengths after that, since the object is to find the longest possible solution.

    My script ran in a few minutes and found 6640 add-a-grams with a word of 16 letters. I'm sure the code could be cleaned up, but I should be getting to bed.

    #!/usr/local/bin/perl -w use strict; $| = 1; use Getopt::Std; use vars qw($opt_w); getopts('w:') or die "usage: words.pl [-w <wordlist>] "; my $wordlist = $opt_w || 'wordlist'; open(DICT, $wordlist) or # open word list die "Unable to open $wordlist: $!\n"; my(%words, @letters); while (<DICT>) { chomp; my $letters = join '', sort split //; if (not exists $words{$letters}) { $words{$letters} = [$_, 1]; push @{ $letters[length] }, $letters; } else { # just for fun, keep track of the word count for each set of l +etters $words{$letters}[1]++; } } my(@solutions, @stack, $found); my $i = $#letters; while ($i > 3 and !$found) { print "Trying length $i.\n"; # try each word at this length for my $letters (@{ $letters[$i] }) { @stack = [ $letters ]; while (@stack) { # advance the next attempt my $test = pop @stack; if (length $test->[-1] == 3) { # found a solution! # note it, print it, go back to look for more $found = 1; print_solution($test); next; } # look for branches from the last set of letters my $continue; for my $branch (branch($test->[-1])) { if (exists $words{$branch}) { # found a branch! # note it and push it on the stack $continue = 1; push @stack, [ @$test, $branch ]; } } if (!$continue) { # no branches found; prune the dead end delete $words{ $test->[-1] }; } } } # all done with this length; go one shorter $i--; } # given a set of letters, return all possible branches # (set minus one letter) sub branch { my($letters) = @_; # get all possible branches my @branches = map { my $x = $letters; substr($x, $_, 1) = ''; $x +} 0 .. length($letters) - 1; # remove duplicate branches my %uniq; @uniq{@branches} = (); return sort keys %uniq; } sub print_solution { my($solution) = @_; for my $letters (@$solution) { print "$words{$letters}[0] ($words{$letters}[1]) "; } print "\n"; } __END__
      Thanks alot. I took most of grep's issues to heart, but extreme incompitance prevented me from realizing that when you pass values to a subroutine they are passed by reference so passing a reference to $l and %wordspl would not really slow anything down. It will take a little time for me to really parse your code, but thanks alot. I did post a better formatted and documented version of my program as a response to grep, if you want to have a look at it. Thanks again!
Re: Add-A-Gram Performance
by grep (Monsignor) on Feb 05, 2002 at 05:35 UTC
    Well I think you are getting into premature optimization. I would focus on cleaning up your code and making it more maintable before you look at speeding it up.

    Right off the bat your (lack of) indenting makes it almost impossible to follow your code (Thanks emacs and perl-mode).

    If you had warnings turned on you would see that you are using single element array slices (bad ju-ju). You are not using strict.

    You break some (IMO) important idioms. Particularly using upper case for filehandles (DICT). This would also turn off some bareword warnings. Use $_. It makes your code more readable, sort of like pronouns.

    Please do not mix you main program logic with your subroutines. This makes it very hard to follow.

    You should also pass you vars into subroutines and avoid using globals in your subs.
    sub word { my $word; foreach $word ( @{ $wordpl{$l} } ) { my $oword=$word; #define original word my $p=$l-1; word1($p, $word); } }
    Here $l and %wordpl are in the main package not passed into your sub.

    I would also recommend reading perlstyle. It goes into more detail than I willing to go into in this post.

    As a final note - these suggestions not only help you debug your code, but it helps other PM's read your code so they can offer better advice.

    grep
    grep> chown linux:users /world
      Sorry about all the problems, I really am a newbie. I reformatted the code and the comments, I hope this helps. I didn't think I was using any globals besides $l, and I hadn't planed to, but didnt want to pass it every time i jumped between subroutines. I dont really understand what you mean by single element arrays slices, if you would go into a litttle more detail that would be great. Here is the updated code (mostly superfical updates):
      use Strict; $dict="<untitled:Desktop Folder:WORD.LST"; $adda=">untitled:Desktop Folder:addagram.lst"; #### #Open the dictionary file and read each word into @words #### open (DICT, $dict) or die ("Can't open $dict: $!\n"); while(defined($line=readline(*DICT))) { chomp $line; push @words, $line; } close (DICT); #### #Create a hash (%wordpl) of arrays, where each element in each array i +s #the same length, and the keys represent those lengths #### foreach $word (@words) { push @{ $wordpl{length($word)} }, $word; } #### #Read the keys of %wordpl, i.e. the lengths of the elements in the arr +ay #@{ $wordpl{length($word)} } in @lengths and sort @length numerically #in descending order #### foreach $k (keys %wordpl ) { push @lengths, $k; } @lengths = sort {$b <=> $a} @lengths; #### #Start a loop where $l is the length, or the key, for %wordpl. #### foreach $l (@lengths) { word(); } #### #Start a loop where you pick the a word in @{ $wordpl{$l} } #and pass that word and $l-1 (index of array of words with one #less character then aforementioned word) to word1() #### sub word { my $word; foreach $word ( @{ $wordpl{$l} } ) { $oword=$word; #define original word my $p=$l-1; word1($p, $word); } } #### #Split $word from sub word into an @a, do the same for the word #picked by WO. If the two words differ by one character, and the #newest word ($sword) is greater then 3 characters, pass $sword #and $p-1 (index of array of words with one less character then #aforementioned word) to word2(). When word2() exits, exit WO #and return to word(). If the newest word is equal to three #characters in length, push that word into the array of #add-a-grams and exit WO to word() to restart process #### sub word1 { my $p=@_[0]; my @a=split(//,@_[1]); my $sword; WO: foreach $sword ( @{ $wordpl{$p} } ) { my @b=split(//,$sword); my %seen = (); my @aonly = (); my $item; @seen{@b}=(); foreach $item (@a) { push (@aonly, $item) unless exists $seen{$item}; } if (length(@aonly)==1) { if ($p==3) { push @addw, $oword; last WO; } $p--; word2($p, $sword); last WO; } } } #### #Split $sword from sub word1 into an @a, do the same for the word #picked by WT. If the two words differ by one character, and the #newest word ($sword) is greater then 3 characters, pass $sword #and $p-1 (index of array of words with one less character then #aforementioned word) to word1(). When word1() exits, exit WT #and return to word(). If the newest word is equal to three #characters in length, push that word into the array of #add-a-grams and exit WT to word() to restart process #### sub word2 { my $p=@_[0]; my @a=split(//,@_[1]); my $sword; WT: foreach $sword ( @{ $wordpl{$p} } ) { @b=split(//,$sword); %seen = (); @aonly = (); @seen{@b}=(); foreach $item (@a) { push (@aonly, $item) unless exists $seen{$item}; } if (length(@aonly)==1) { if ($p==3) { push @addw, $oword; last WT; } $p--; word1($p, $sword); last WT; } } } #### #determine the longest add-a-gram and print it and the #others to a file #### $lo=""; foreach $i (@addw) { if ( length($lo)<length($i) ) { $lo=$i; } } open (ADD, $adda) or die ("Cant open $adda: $!\n"); foreach $i (@addw) { if ($i ne $lo) { print ADD "$i\n"; } else { print ADD "longest: $i\n"; } } close (ADD);
        a single element array slice is my @a=split(//,@_[1]);. @_[1] should be written $_[1]. This is spelled out when you use warnings or (-w).

        Just putting use strict; (not Strict, leading upper case is generally used for modules, 'strict' is a pragma) at the top does not make your script run under 'use strict;'. The script you just posted will not even compile.

        Please turn on warnings ('-w' on the shebang or 'use warnings;') and 'use strict;'. Then go through your code make it run without giving a warnings. If you have a problem, grab the section you are having a problem with and post the snippet here and ask for help.

        I apoligize for not answering all of your questions outright, but you'll do much better in the long run if you go though these things yourself instead of me answering them for you. Hope you understand.

        Some nodes to help you out:
      • On asking for help
      • How to ReadTheFineManual


      • Update: also chipmunk posted some good code and one of the best things to do is take some good code like chipmunk's and go through it until you understand it. (I get to thank merlyn for letting me do that to his code :) )

        grep
        grep> chown linux:users /world
Re: Add-A-Gram Performance
by tachyon (Chancellor) on Feb 05, 2002 at 06:39 UTC

    Running on an ancient PII 266 this takes less than a minute to process the standard unix dictionary. It uses a fairly efficient algorithm. Basically we bulid a hash which has the sorted letters of the words as its keys and the corresponding words (as an array ref) as its values. Thus we have a hash like:

    ... 'aet' => [ ate eat ] ...

    As we have represented all the words as their sorted letters in a hash we make the search quick and easy. We work from the longest keys to the shortest and use recursion to test all the possibilities. We keep a list of 'winners' - any word which is in a winning list need not be tested again when we descend to it's level.

    #!usr/bin/perl -w use strict; # get words into an array and a hash keyed on sorted letters open DICT, "c:/windows/desktop/unixdict.txt" or die "Oops $!\n"; my (%words,%winners); while(<DICT>) { chomp; my $sort = join '', (sort split'',$_); push @{$words{$sort}}, $_; } # test the words by length - longest first, don't repeat known winners for my $word (sort {length $b <=> length $a} keys %words) { defined $winners{$word} ? next : test($word); } sub test { my $word = $_[-1]; if (length $word <=3) { winner(@_); return; } my @letters = sort split '', $word; for my $i (0..$#letters) { my @temp = @letters; $temp[$i] = ''; my $short = join '', @temp; test(@_,$short) if exists $words{$short}; } } sub winner { local $, = " "; for (@_) { $winners{$_}++; print @{$words{$_}}, "\n"; } print "\n"; } __DATA__ planetarium manipulate aluminate laminate matinal animal manila milan main min [blah]

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re (tilly) 1: Add-A-Gram Performance
by tilly (Archbishop) on Feb 05, 2002 at 11:45 UTC
    I think the performance of my solution is fairly good. :-) Can anyone (significantly) beat it? (Use the -v option if you want some debugging output.)

    The keys are to keep logic simple, avoid redundant work at all steps, avoid calling functions where not needed...(OK, I could speed up some more by removing the one remaining function, but that would be more trouble than I think it is worth.)

    #! /usr/bin/perl -w use strict; use Getopt::Std; getopts('v'); use vars qw($opt_v); # Read the input into an array of hashes from normalized to denormaliz +ed # words my @denorm_word; print "Scanning input\n" if $opt_v; while (<>) { chomp; $denorm_word[ length($_) ]{ join '', sort split //, $_ } = $_; } my $longest_seq = []; my $longest_chrs = []; # Find what the shortest word length is so we can break out of the # reasoning early. my $shortest = 0; $shortest++ until $denorm_word[$shortest]; # Let's walk unvisited words by length first, and do depth first # searches of sequences. Clearing as we go of course. LEN: foreach my $i (reverse 0...$#denorm_word) { if (not $denorm_word[$i]) { # No words next; } if ($i-$shortest < @$longest_seq) { print "Length $i to shortest $shortest can't improve, aborting\n" if $opt_v; last LEN; } print "Working on length $i\n" if $opt_v; foreach my $word (values %{$denorm_word[$i]}) { my ($seq, $chrs) = find_seq_chrs($word); if (@$longest_seq < @$seq) { $longest_seq = $seq; $longest_chrs = $chrs; print " New longest:\n", map " $_\n", @$seq if $opt_v; } if ($i-$shortest < @$longest_seq) { print "Length $i to shortest $shortest can't improve, aborting\n +" if $opt_v; last LEN; } } } # And dump it. foreach my $i (0..$#$longest_chrs) { print "$longest_seq->[$i] + $longest_chrs->[$i] =\n"; } print uc ($longest_seq->[-1]) . " (length " . (scalar @$longest_seq) . ")\n"; # Takes sub find_seq_chrs { my $word = shift || return; my $best_seq = [$word]; my $best_chrs = []; my ($out_chr, @end_list, @begin_list) = sort split //, $word; my $len = scalar @end_list; while (@end_list) { if (my $subword = delete($denorm_word[$len]{ join '', @begin_list, @end_list })) { my ($seq, $chrs) = find_seq_chrs($subword); if (@$best_seq <= @$seq) { push @$seq, $word; push @$chrs, $out_chr; $best_seq = $seq; $best_chrs = $chrs; } } push @begin_list, $out_chr; $out_chr = shift @end_list; } return ($best_seq, $best_chrs); } __END__
    UPDATE
    Some numbers. Run on my old PII laptop at 233 MHZ, I get
    bash-2.01$ time perl add_chain.pl /etc/dictionary at + r = tar + t = tart + s = start + i = traits + n = transit + e = straiten + o = stationer + i = iterations + c = recitations + n = interactions + d = INDOCTRINATES (length 12) real 0m6.708s user 0m6.090s sys 0m0.180s
    Note that my code is *not* case insensitive. The original contest didn't specify that, and case insensitivity is slower. :-)
Re: Add-A-Gram Performance
by runrig (Abbot) on Feb 05, 2002 at 14:29 UTC
    I could not resist, so I wrote a version which not only finds the anagrams, but also keeps track of all the chains of words found. It might be similar to something else here, but I didn't peek, honest (except for starting with the idea of starting with the longest words and removing letters):
    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my @word; while(<>) { chomp; my $key = join '', sort split ''; push @{$word[length]{$key}}, $_; } my @ana; for my $i (reverse 3..$#word) { my $got_it; for my $key (keys %{$word[$i]}) { if (my @chain = find_chain($key)) { push @ana, @chain; $got_it = 1; } } last if $got_it; } for my $chain (@ana) { for my $key (@$chain) { my $len = length $key; print "[ @{$word[$len]{$key}} ]\n"; } print "\n"; } sub find_chain { my $key = shift; my $len = length($key)-1; return [ $key ] if $len < 3; my @rtn; my $old_chr = ''; for my $i (0..$len) { my $tmp = $key; next if (my $chr = substr($tmp, $i, 1, '')) eq $old_chr; $old_chr = $chr; next unless exists $word[$len]{$tmp}; if (my @chain = find_chain($tmp)) { unshift @$_, $key for @chain; push @rtn, @chain; } } delete $word[$len]{$key} unless @rtn; @rtn; }
Re: Add-A-Gram Performance
by smgfc (Monk) on Feb 06, 2002 at 00:41 UTC
    Well I redid some of the code with all of your wonderful suggestions, especially grep and chipmunk, but now I have run into a snag. Rather then using %wordpl where the items are arrays of same length words and the keys are the lengths of those words, %wordpl's items are another hash with the sorted words as the keys and no items, and the keys are the lengths of the different sorted words. %wordr is away to reverse look up the actual words (ie the key is the same as the sorted letter, but point to an array of all the anagrams of those letters. But when i run this program it spits out:
    # Use of uninitialized value, <DICT> chunk 41. File 'untitled:Desktop Folder:Will's Stuff:Applications:MacPerl ƒ:add- +a-gram redux'; Line 15 # Use of uninitialized value, <DICT> chunk 41. File 'untitled:Desktop Folder:Will's Stuff:Applications:MacPerl ƒ:add- +a-gram redux'; Line 16
    Line 15 and 16 assign %wordpl and %wordr, but i dont have a clue what is wrong. Here is the full code:
    #!/usr/bin/perl -w use strict; my ($dict, $adda, $line, @words, $word, %wordpl, $k, @lengths, $l, @ad +dw, $oword, $lo, $i, %wordr); $dict="<untitled:Desktop Folder:WORD.LST"; #### #Open the dictionary file and Create a hash (%wordpl) of a hash, #where each key in 2nd hash is the same length, and the keys for #the 1st hash represent those lengths #### open (DICT, $dict) or die ("Can't open $dict: $!\n"); while (<DICT>) { chomp; push @{ $wordr{sort {lc($a) cmp lc($b)} $_} }, $_; $wordpl{length()}{sort {lc($a) cmp lc($b)} $_}++; } close (DICT); #### #Read the keys of %wordpl, i.e. the lengths of the elements in the 2nd + hash #in descending order Start a loop where $_ is the length, or the key, #for %wordpl. #### foreach (sort { $b <=> $a } keys %wordpl) { push (@addw, word(%wordpl, $_, $oword, @addw)); } #### #Start a loop where you pick the a word in keys %{ $wordpl{$l} } #and pass that word and $l-1 (index of hash of words with one #less character then aforementioned word) to word1() #### sub word { my ($word, $l, %wordpl, $oword, $p, @addw, $oword); %wordpl=$_[0]; $l=$_[1]; $oword=$_[2]; @addw=$_[3]; foreach $word ( keys %{ $wordpl{$l} } ) { $oword=$word; #define original word my $p=$l-1; word1($p, $word, %wordpl, $l, $oword, @addw); } return @addw; } #### #Split $word from sub word into an @a, do the same for the word #picked by WO. If the two words differ by one character, and the #newest word ($sword) is greater then 3 characters, pass $sword #and $p-1 (index of hash of words with one less character then #aforementioned word) to word2(). When word2() exits, exit WO #and return to word(). If the newest word is equal to three #characters in length, push that word into the array of #add-a-grams and exit WO to word() to restart process #### sub word1 { my ($p, @a, %wordpl, $l, $sword, @b, %seen, @aonly, $item, @addw, +$oword); $p=$_[0]; @a=split(//,$_[1]); %wordpl=$_[2]; $l=$_[3]; $oword=$_[4]; @addw=$_[5]; WO: foreach $sword ( keys %{ $wordpl{$p} } ) { @b=split(//,$sword); %seen = (); @aonly = (); @seen{@b}=(); foreach $item (@a) { push (@aonly, $item) unless exists $seen{$item}; } if (length(@aonly)==1) { if ($p==3) { push @addw, $oword; last WO; } $p--; word2($p, $sword, %wordpl, $l, $oword, @addw); last WO; } } } #### #Split $sword from sub word1 into an @a, do the same for the word #picked by WT. If the two words differ by one character, and the #newest word ($sword) is greater then 3 characters, pass $sword #and $p-1 (index of hash of words with one less character then #aforementioned word) to word1(). When word1() exits, exit WT #and return to word(). If the newest word is equal to three #characters in length, push that word into the array of #add-a-grams and exit WT to word() to restart process #### sub word2 { my ($p, @a, %wordpl, $l, $sword, @b, %seen, @aonly, $item, @addw, +$oword); $p=$_[0]; @a=split(//,$_[1]); %wordpl=$_[2]; $l=$_[3]; $oword=$_[4]; @addw=$_[5]; WT: foreach $sword ( keys %{ $wordpl{$p} } ) { @b=split(//,$sword); %seen = (); @aonly = (); @seen{@b}=(); foreach $item (@a) { push (@aonly, $item) unless exists $seen{$item}; } if (length(@aonly)==1) { if ($p==3) { push @addw, $oword; last WT; } $p--; word2($p, $sword, %wordpl, $l, $oword, @addw); last WT; } } } #### #print all the anagrams #### foreach $i (@addw) { foreach $lo (@{ $wordr{$i} }) { print "$lo\n"; } }
      One minor thing. The word file on the website contains NO uppercase letters nor any characters besides a-z (as a 'grep "^a-z"' of the file shows). So the lc is useless in this particular instance.
        Thanks alot. I actually didnt realize that, i just found that particular piece of code in Programming Perl, and didnt consider the dictionary file. I was about to take it out when i thought i would leave it, just incase I decided to use a different dictionary, etc. etc. So unless I am wrong in thinking lc is pretty effiecient I think I will leave it. Then again there are ~170,000 words in that dictionary file, so 680000 lc calls might slow it down a little! :) Thanks again!
Re: Add-A-Gram Performance
by patgas (Friar) on Feb 06, 2002 at 00:10 UTC

    You know, I looked at that challenge a couple weeks ago, and thought nothing of it. After all, they're looking for C++ and LISP programmers (I think). It wasn't until this very afternoon that I thought about it again, and started working on it. I had a good start on it before I left work, and I was going to pick back up on it tomorrow. That is, until I read all this. It's like having the ending of a good movie ruined. *sigh* :p

    "We're experiencing some Godzilla-related turbulence..."

      Hey, they still have the 9 9's problem that nobody has ruined yet.

      Good luck getting it to run in under 1 minute. :-)

        use Quantum::Superpositions; $nines[1] = any(9); for $i ( 1..8 ){ $nines[$i+1] = any eigenstates any map{ $nines[$_]+$nines[$i-$_+1], $nines[$_]-$nines[$i-$_+1], $nines[$_]*$nines[$i-$_+1], $nines[$_]/any grep{$_!=0}eigenstates $nines[$i-$_+1] } 1..$i; print $nines[$i+1],"\n"; for( $_=1;;$_++){ last unless $_ eq $nines[$i+1] } print "$_\n"; }
        Hey, they still have the 9 9's problem that nobody has ruined yet.

        Good luck getting it to run in under 1 minute. :-)

        Well, I got it running in about one hour (on my ancient 200MHz 486), which is better than the few days I thought it was going to take at first :-)
        (So if I run it on some new fangled 12Ghz system, it might break a minute,
        right? ;-)

        What matters, of course, is how many iterations it takes, and what it came down to (in my solution) was how many ways to order the 4 operators eight in a row (4**8 => 65536), times how many ways to put 9 duplicate balls in eight different buckets (which is alot, until I realized each bucket only needs to hold at most two balls, and then I filtered most of the redundant and invalid combinations from that which resulted in only 64 combinations), which resulted in about 4 million iterations of the main loop.

        I'm not going to spoil the answer here (and I hope I haven't given away too much already, and only provoke "what the hell is he talking about?" responses from most people), since the add-a-gram has been done to death both here and on comp.lang.perl.misc, and I'd hate to see the same done to the 9 nines, but anyone that can filter the problem space even further has my ++'s...(hmm, wonder if I can get a job in Cambridge, MA now...)

        Update: Damn! While I took a completely different approach, it'll never beat tilly's (and I suppose its not even worth pointing out that my implementation was broken, and that the corrected version will take about 2 hours to run :-)

Re: Add-A-Gram Performance
by dfog (Scribe) on Feb 06, 2002 at 21:05 UTC
    My entry into the pot works by starting with the smallest words and building to the largest, instead of the other way. On the word list from ITA, the answers are 16 letters long (Underestimations and indeterminations).
    #!perl use strict; use Data::Dumper; $|++; my %WordTree = (); my %WordLengths = (); my $LongWordLength = 1; open (DATA, "Word.txt"); while (<DATA>) { chomp; my $Word = $_; my $Length = length($_); next if $Length < 3; $LongWordLength = $Length if ($LongWordLength < $Length); push (@{$WordLengths{$Length}}, $Word); } print "Finished reading Data\n"; close (DATA); for (my $CurrentLength = 3; $CurrentLength <= $LongWordLength; $Curren +tLength++) { print "Working on words of length $CurrentLength\n"; my ($Word) = ""; foreach $Word (@{$WordLengths{$CurrentLength}}) { my (@SortedLetters) = sort split //, $Word; if ($CurrentLength > 3) { my $TempWord = ""; my @SplicedLetters = ""; my $MatchFlag = 0; for (my $Splice = 0; $Splice <= $#SortedLetters; $Splice++ +) { @SplicedLetters = @SortedLetters; splice(@SplicedLetters, $Splice, 1); $TempWord = join '', @SplicedLetters; if (exists($WordTree{$#SortedLetters}->{$TempWord})) { if (! exists($WordTree{$CurrentLength}->{join '', +@SortedLetters})) { push (@{$WordTree{$CurrentLength}->{join '', @ +SortedLetters}}, $Word); push (@{$WordTree{$CurrentLength}->{join '', @ +SortedLetters}}, @{$WordTree{$CurrentLength - 1}->{join ' +', @SplicedLetters}}); } last; } } } else { if (! exists($WordTree{3}->{join '', @SortedLetters})) { push (@{$WordTree{3}->{join '', @SortedLetters}}, $Wor +d); } } } } my (@LongestMatch) = sort {$a<=>$b} keys %WordTree; my $NotFoundLongest = 1; while ($NotFoundLongest && $#LongestMatch > 1) { if (keys %{$WordTree{$LongestMatch[$#LongestMatch]}} > 0) { print "The longest length is $LongestMatch[$#LongestMatch]\n"; print Dumper $WordTree{pop @LongestMatch}; $NotFoundLongest = 0; } pop @LongestMatch; }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://143357]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (5)
As of 2024-04-16 08:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found