Haven't been on Monastery grounds in a while, so I'm a little late to this party.
$ time t.pl 2of12inf.txt
Done processing dict (40933 words). Candidate counts, by number of letters:
$VAR1 = [
0,
53,
516,
1894,
4068,
7076,
10360,
11926
];
Done computation. Result (most words at bottom):
aabdellu : 1
emprrtuy : 1
[... output truncated ...]
aeginrst : 296
aelmprst : 297
acelprst : 297
aceiprst : 303
adeimrst : 305
adeoprst : 307
aeimnrst : 307
aeilnpst : 308
adeinrst : 311
aeilnrst : 311
adeilrst : 319
aeimprst : 327
adeiprst : 331
aeinprst : 336
aeilprst : 343
real 0m4.860s
user 0m3.665s
sys 0m0.160s
My solution:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);
use Data::Dumper;

my $MAXLEN = 8;
my @data;
my $words_processed = 0;

# Index each word by its sorted-letter signature, grouped by length.
while (<>) {
    s/[%\s]+//g;
    my $len = length;
    next unless 1 <= $len && $len <= $MAXLEN;
    ++$words_processed;
    my @a = sort split //, lc;
    my $w = join "", @a;
    ++$data[$len - 1]{$w}{count};
    $data[$len - 1]{$w}{contrib}{$w} = 1;
    $data[$len - 1]{$w}{alphas} ||= \@a;
}

print "Done processing dict ($words_processed words). Candidate counts, by number of letters:\n";
print Dumper [map { $_ ? scalar(keys %$_) : 0 } @data];

# Propagate contributions one tier at a time: a length-c signature
# reaches a length-(c+1) signature by adding one of 26 extra letters.
for my $c (1 .. $MAXLEN - 1) {
    my $data1 = $data[$c - 1];
    my $data2 = $data[$c];
    next unless $data1 && $data2;
    for my $v1 (values %$data1) {
        for my $extra ("a" .. "z") {
            my $new = join "", sort(@{ $v1->{alphas} }, $extra);
            if ($data2->{$new}) {
                $data2->{$new}{contrib}{$_} = 1
                    for keys %{ $v1->{contrib} };
            }
        }
    }
}

# Total up the word counts of every contributing signature.
my $data_max = $data[$MAXLEN - 1];
for my $v (values %$data_max) {
    $v->{total} = sum(map $data[length($_) - 1]{$_}{count},
                      keys %{ $v->{contrib} });
}

print "\nDone computation. Result (most words at bottom):\n";
for my $w (sort { $data_max->{$a}{total} <=> $data_max->{$b}{total} }
           keys %$data_max) {
    print "$w : $data_max->{$w}{total}\n";
}
My strategy is to aggregate the word counts upward, from candidates with the fewest letters to those with the most. (This approach is not thorough, but it is fast; see Update #2.) The trick that makes it fast is realizing that adjacent candidate tiers differ by exactly one letter, so subset matching only has to cycle through 26 possible extra letters.
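To make that trick concrete, here is a standalone sketch (made-up words and a hypothetical `sig` helper, not code lifted from the script above): anagrams collapse to one sorted-letter key, and a k-letter key reaches a (k+1)-letter key by trying each of the 26 possible extra letters instead of testing all subsets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Canonical signature: lowercase the word and sort its letters, so all
# anagrams collapse to one key ("parts", "strap", "traps" -> "aprst").
sub sig { join "", sort split //, lc shift }

# A toy 5-letter tier, keyed by signature.
my %tier5 = map { sig($_) => 1 } qw(parts strap spelt);

# One-letter extension: insert each of "a".."z" into the 4-letter key
# and re-sort; only 26 lookups are needed to find all supersets.
my $base = sig("part");             # "aprt"
for my $extra ("a" .. "z") {
    my $new = join "", sort split(//, $base), $extra;
    print "$base + $extra -> $new\n" if $tier5{$new};   # prints "aprt + s -> aprst"
}
```

The same 26-way extension is what the main loop above does for every candidate in every tier.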
My winner is aeilprst, which can make at least 343 words (using all 8 letters or a subset of them). This may not be the best answer because of the simplistic aggregation, but it points to the likely champs.
Looking at the other replies makes me question myself. What am I doing wrong? Other results have winners that make only hundreds of words, which seems strange given that 40k+ words were processed. Are those counts for words using all 8 letters only, and not a subset of the letters?
Update: I see what's wrong now. The aggregation has to keep track of the contributing counts of specific candidates. Back to the drawing board.
Update #2: I have updated the script and results above, and they are more in line with what others have gotten. The total word counts still come up short because of the aggregation strategy: contributions only propagate upward through intermediate signatures that are themselves word signatures, so a subset whose chain of one-letter extensions breaks is missed. But the tradeoff in speed is significant.
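For the record, the shortfall can be reproduced with a contrived two-word dictionary (my own toy example, not the real data): a 1-letter word that is a subset of a 3-letter signature is missed whenever no 2-letter word signature bridges the tiers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy dictionary: "a" is a subset of "cab"'s letters, but no 2-letter
# word bridges the gap, so "a" never reaches the 3-letter tier.
my @words = qw(a cab);
my %sig_by_len;
for my $w (@words) {
    my $s = join "", sort split //, $w;
    $sig_by_len{ length $w }{$s}++;
}

# Every signature contributes to itself, then propagate tier by tier,
# as in the main script: an extension survives only if the new
# signature is itself a word signature at the next tier.
my %contrib;
for my $len (1 .. 3) {
    $contrib{$_}{$_} = 1 for keys %{ $sig_by_len{$len} || {} };
}
for my $len (1 .. 2) {
    for my $s (keys %{ $sig_by_len{$len} || {} }) {
        for my $extra ("a" .. "z") {
            my $new = join "", sort split(//, $s), $extra;
            next unless $sig_by_len{ $len + 1 }{$new};
            $contrib{$new}{$_} = 1 for keys %{ $contrib{$s} };
        }
    }
}
printf "propagated count for 'abc': %d (true subset count: 2)\n",
    scalar keys %{ $contrib{"abc"} };
```

The propagated count for "abc" is 1 (just "cab" itself), while the true subset count is 2, which is exactly the kind of undercount showing up in the totals.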