Multiple words anagrams (challenge) Re: Perl's pearls

in reply to Re: Perl's pearls
in thread Perl's pearls

This is a real challenge (at least for me).
Finding (multiple) words that are anagrams of a given phrase is much more complex than finding the simple anagrams. However, we can do something, at least to find up to three words corresponding to a given input.

pseudocode

--------------------------------------------------------
calculate inputphrase signature
while (input available)
    read word
    skip if words letters not in inputphrase
    calculate signature and anagrams

for each candidate word           # first pass
    for each candidate word       # second pass
        for each candidate word   # third pass
            combine candidate signatures
            skip if different from inputphrase signature
            add to anagrams list

print anagrams list
--------------------------------------------------------
[download]

Example: given a list containing the following words

cere 
cheer
come
cree
echo
eeoc
em 
he
here
herm
home
me
mere 
moe
re
reme
rhee
rho
[download]

We could issue a command

perl mul_anagram.pl "come here" < words
[download]

and we would get this result:

 [come] (here rhee)
 [echo] (mere reme)
 [eeoc] (herm)
 [home] (cere cree)
 [moe] (cheer)
 [re] (come) <he>
 [re] (echo) <em me>
 [rho] (emcee)
[download]

From which it is easy to pick "cheer Moe" or "mere echo" or "re: echo me"
Notice that the words have different parentheses: "[]" come from the first pass, "()" from the second and "<>" from the third one. It means that, to have your complete anagram, you should pick one word from each different parenthesis.
Using the same word list mentioned in the main node, I got these:

monk friar
   norm fakir
   marin fork
   rank of rim
   koran firm

lithographic
   alight orphic
   goliath chirp
   high tropical
   oligarch pith
   pig haircloth

perlmonks.net
   tromp kernel

monk saint
   main knots
   mason knit
  
Perl forever
   lover prefer
   reprover elf
   repel fervor
   
Use Perl forever
   profuse reveler
   reefer overplus
   reprove refuels
   sleeper fervour

wait for experience
   firepower exitance

son of a gun
   snafu goon
   GNU on sofa

Brother Tilly
    try their boll
    try other bill
    lit Tyrol herb
[download]

Enough chatting. Here's the code.

#!/usr/bin/perl -w
use strict;

my $phrase = shift || die "input phrase required\n";
my $outer_limit = shift || 1500;
my $inner_limit = shift || 100;
$phrase = lc $phrase;
$phrase =~ tr/a-z//cd; # considers only alpha characters
my @input_letters = split //, $phrase;
my $signature = join "", sort @input_letters;
my %words = ();
my %compare_template;
for (@input_letters) {$compare_template{$_}++};

INPUT:
while (<>) {
    chomp;                                     
    $_ = lc $_;                             
    my @letters = split //, $_;
    my $windex = join "", sort @letters;
    my %compare = %compare_template;
    for my $let (@letters) {
        next INPUT 
            unless (exists $compare{$let}) # keeps only words made of 
                and $compare{$let}--;      # signature letters
    }
    if (exists $words{$windex}) {             
        next if $words{$windex} =~ /\b$_\b/;
        $words{$windex} .= " ";         
    }
    $words{$windex} .= $_;
}

my $items = scalar keys %words ;
print STDERR "Considering $items items. ";
if ($items > $outer_limit) {
    print "Too many candidates. It would take too long\n";
    exit;
}
print STDERR  @{[$items > $inner_limit ? "Only two" : "Three"]}, 
    " passes\n";

my @candidates = keys %words;
my @used = (); # stores the combination of words already found

for my $first (0 .. $#candidates) {
    if ($signature eq $candidates[$first]) {
        print " [" . $words{$candidates[$first]} . "]\n";
        push @used , [$first, -1,-1];        
        next
    }
    for my $second (0 .. $#candidates) {
        next if $second == $first;
        next if grep { (grep {$_ == $first} @$_)
                        and (grep {$_ == $second} @$_)} @used;
        my $sign = join "", sort 
            split //, $candidates[$first].$candidates[$second];
        if ($sign eq $signature) {
            print " [" . $words{$candidates[$first]}. "] ("
                . $words{$candidates[$second]}. ") \n";
            push @used, [$first, $second, -1];
            next;
        }
        if ($items <= $inner_limit) {
            for my $third (0.. $#candidates) {
                next if $third == $second;
                next if grep { (grep {$_ == $first} @$_ )
                        and (grep {$_ == $second} @$_)
                        and (grep {$_ == $third} @$_) } @used;
                my $sign = join "", sort 
                    split //, $candidates[$first]
                    .$candidates[$second].$candidates[$third];
                if ($sign eq $signature) {
                    print " [" . $words{$candidates[$first]}. "] ("
                        . $words{$candidates[$second]}. ") <"
                        . $words{$candidates[$third]}. "> \n";
                    push @used, [$first, $second,$third];
                    next;
                }
            }
        }
    }
}
[download]

Of course, this program is going to be much slower than the normal anagram script. For 2000 candidate words -- i.e. the words found to be composed of input-phrase letters only --, it could run (depending on your conputer's speed) for 10 minutes! That's why I put some safeguards. If the candidates are more than 2000 it won't start at all. If they are more than 100, it will limit to 2 passes. The number of iterations rises very fast, and therefore, unless you have a FAST computer with LOTS of memory (and time and patience on your side), don't change these defaults. Three passes a list of 100 candidate words result in 1 million iterations, 8 million for 200, and 27 million for 300
This is just a shot. I am sure that there is room for improvements. Maybe some saints in the Monastery could help ... :-)
Enjoy!

 _  _ _  _  
(_|| | |(_|><
 _|

Comment on Multiple words anagrams (challenge) Re: Perl's pearls Select or Download Code

In Section Meditations