What you want is a linear search, one that checks each word against the set of letters only once, without requiring that you store either the entire dictionary or the entire set of letter combinations in memory at once. The following isn't set up to use strict (I never did find out how to use globals without setting off strict), but it does perform as required:
use Benchmark;
$letters = 'chunzcii';
$lhash{$_}++ for split //, $letters;
$lcount = length($letters);
$t0 = new Benchmark;
open ($handle, 'dictionary.dat');
while (<$handle>) {
chomp; $w++ if scrabble($_);
$c++;
}
close ($handle);
$t1 = new Benchmark;
$td = timediff($t1, $t0);
print "the code took:",timestr($td),"\n";
print "$w matches from $c words";
sub scrabble {
return 0 if length($_[0]) > $lcount;
my %wlhash;
$wlhash{$_}++ for split //, $_[0];
for (keys %wlhash) {
return 0 if $lhash{$_} < $wlhash{$_} &&
($nf += $wlhash{$_} - $lhash{$_}) > $blanks;
}
return 1;
}
Using this, I ran through a 200,000-word file in around a second, using almost no memory in the process. And I'm sure someone imaginative will be able to significantly improve on the efficiency of the function and file reads.
EDIT: Actually, Benchmark says I used 3.93 seconds (124 matches out of 201252 words).Then I tried a different algorithm, which took 2.32 seconds:
use Benchmark;
$letters = 'chunzcii';
$sorted = join '', sort split //, $letters;
$lcount = length($letters);
$t0 = new Benchmark;
open ($handle, 'dictionary.dat');
while (<$handle>) {
chomp; $w++ if scrabble($_);
$c++;
}
close ($handle);
$t1 = new Benchmark;
$td = timediff($t1, $t0);
print "the code took:",timestr($td),"\n";
print "$w matches from $c words";
sub scrabble {
return 0 if length($_[0]) > $lcount;
$p = 0;
for (sort split //, $_[0]) {
return 0 if !($p = index($sorted, $_, $p) + 1);
}
return 1;
}
Incidently, loading all 200,000+ words into an array and then cycling through them increased the time to 3.08 seconds, so not only is this more wasteful of memory, but it seems to be less efficient as well.
EDIT: Oh wait, you need support for blanks. Guess the array method is out, since hashing is much more suited for this. 4.27 seconds for 1240 matches from 201252 words, with one blank:
use Benchmark;
$letters = 'chuncii_';
$lcount = length($letters);
while ($letters =~ /[^a-z]/) {
$letters =~ s/[^a-z]//;
$blanks++;
}
$lhash{$_}++ for split //, $letters;
$t0 = new Benchmark;
open ($handle, 'dictionary.dat');
while (<$handle>) {
chomp; $w++ if scrabble($_);
$c++;
}
close ($handle);
$t1 = new Benchmark;
$td = timediff($t1, $t0);
print "the code took:",timestr($td),"\n";
print "$w matches from $c words";
sub scrabble {
return 0 if length($_[0]) > $lcount;
my %wlhash;
$wlhash{$_}++ for split //, $_[0];
$nf = 0;
for (keys %wlhash) {
return 0 if $lhash{$_} < $wlhash{$_} &&
($nf += $wlhash{$_} - $lhash{$_}) > $blanks;
}
return 1;
}
And 4.53 seconds for 10230 matches from 201252 words, with two blanks.
And 4.98 seconds for 65472 matches from 201252 words, with four blanks.
And 5.07 seconds for 135718 matches from 201252 words, with all eight letters blanks.
EDIT: And sulfericacid couldn't figure out how to make it print the matches, so here's yet one more version. Note that this version requires storing all matches in memory at once, since I'm having them sorted by length and then alphabetically. Now I should probably contact Yahoo and tell them you're cheating...
$letters = 'chuncii_';
$lcount = length($letters);
while ($letters =~ /[^a-z]/) {
$letters =~ s/[^a-z]//;
$blanks++;
}
$lhash{$_}++ for split //, $letters;
open ($handle, 'dictionary2.dat');
while (<$handle>) {
chomp; push @matches, $_ if scrabble($_);
}
close ($handle);
print join "\n", sort { length($b) <=> length($a) || $a cmp $b } @matc
+hes;
sub scrabble {
return 0 if length($_[0]) > $lcount;
my %wlhash;
$wlhash{$_}++ for split //, $_[0];
$nf = 0;
for (keys %wlhash) {
return 0 if $lhash{$_} < $wlhash{$_} &&
($nf += $wlhash{$_} - $lhash{$_}) > $blanks;
}
return 1;
}
Returned (from a small dictionary sub-set):
zucchini
acacia
couch
lunch
kick
zinc
cut
did
hen
hit
ice
kin
nip
sun
ugh
pi
a
EDIT: I also have a use strict / warnings version up on my scratchpad now. Though I can't guarantee it will be there forever. |