in reply to Re: Perl's pearls
in thread Perl's pearls

This is a real challenge (at least for me).
Finding (multiple) words that are anagrams of a given phrase is much more complex than finding the simple anagrams. However, we can do something, at least to find up to three words corresponding to a given input.
-------------------------------------------------------- calculate inputphrase signature while (input available) read word skip if words letters not in inputphrase calculate signature and anagrams for each candidate word # first pass for each candidate word # second pass for each candidate word # third pass combine candidate signatures skip if different from inputphrase signature add to anagrams list print anagrams list --------------------------------------------------------
Example: given a list containing the following words
cere cheer come cree echo eeoc em he here herm home me mere moe re reme rhee rho
We could issue a command
perl "come here" < words
and we would get this result:
[come] (here rhee) [echo] (mere reme) [eeoc] (herm) [home] (cere cree) [moe] (cheer) [re] (come) <he> [re] (echo) <em me> [rho] (emcee)
From which it is easy to pick "cheer Moe" or "mere echo" or "re: echo me"
Notice that the words have different parentheses: "[]" come from the first pass, "()" from the second and "<>" from the third one. It means that, to have your complete anagram, you should pick one word from each different parenthesis.
Using the same word list mentioned in the main node, I got these:
monk friar norm fakir marin fork rank of rim koran firm lithographic alight orphic goliath chirp high tropical oligarch pith pig haircloth tromp kernel monk saint main knots mason knit Perl forever lover prefer reprover elf repel fervor Use Perl forever profuse reveler reefer overplus reprove refuels sleeper fervour wait for experience firepower exitance son of a gun snafu goon GNU on sofa Brother Tilly try their boll try other bill lit Tyrol herb
Enough chatting. Here's the code.
#!/usr/bin/perl -w use strict; my $phrase = shift || die "input phrase required\n"; my $outer_limit = shift || 1500; my $inner_limit = shift || 100; $phrase = lc $phrase; $phrase =~ tr/a-z//cd; # considers only alpha characters my @input_letters = split //, $phrase; my $signature = join "", sort @input_letters; my %words = (); my %compare_template; for (@input_letters) {$compare_template{$_}++}; INPUT: while (<>) { chomp; $_ = lc $_; my @letters = split //, $_; my $windex = join "", sort @letters; my %compare = %compare_template; for my $let (@letters) { next INPUT unless (exists $compare{$let}) # keeps only words made of and $compare{$let}--; # signature letters } if (exists $words{$windex}) { next if $words{$windex} =~ /\b$_\b/; $words{$windex} .= " "; } $words{$windex} .= $_; } my $items = scalar keys %words ; print STDERR "Considering $items items. "; if ($items > $outer_limit) { print "Too many candidates. It would take too long\n"; exit; } print STDERR @{[$items > $inner_limit ? "Only two" : "Three"]}, " passes\n"; my @candidates = keys %words; my @used = (); # stores the combination of words already found for my $first (0 .. $#candidates) { if ($signature eq $candidates[$first]) { print " [" . $words{$candidates[$first]} . "]\n"; push @used , [$first, -1,-1]; next } for my $second (0 .. $#candidates) { next if $second == $first; next if grep { (grep {$_ == $first} @$_) and (grep {$_ == $second} @$_)} @used; my $sign = join "", sort split //, $candidates[$first].$candidates[$second]; if ($sign eq $signature) { print " [" . $words{$candidates[$first]}. "] (" . $words{$candidates[$second]}. ") \n"; push @used, [$first, $second, -1]; next; } if ($items <= $inner_limit) { for my $third (0.. $#candidates) { next if $third == $second; next if grep { (grep {$_ == $first} @$_ ) and (grep {$_ == $second} @$_) and (grep {$_ == $third} @$_) } @used; my $sign = join "", sort split //, $candidates[$first] .$candidates[$second].$candidates[$third]; if ($sign eq $signature) { print " [" . $words{$candidates[$first]}. "] (" . $words{$candidates[$second]}. ") <" . $words{$candidates[$third]}. "> \n"; push @used, [$first, $second,$third]; next; } } } } }
Of course, this program is going to be much slower than the normal anagram script. For 2000 candidate words -- i.e. the words found to be composed of input-phrase letters only --, it could run (depending on your conputer's speed) for 10 minutes! That's why I put some safeguards. If the candidates are more than 2000 it won't start at all. If they are more than 100, it will limit to 2 passes. The number of iterations rises very fast, and therefore, unless you have a FAST computer with LOTS of memory (and time and patience on your side), don't change these defaults. Three passes a list of 100 candidate words result in 1 million iterations, 8 million for 200, and 27 million for 300
This is just a shot. I am sure that there is room for improvements. Maybe some saints in the Monastery could help ... :-)
 _  _ _  _  
(_|| | |(_|><