http://qs321.pair.com?node_id=221277

mt2k has asked for the wisdom of the Perl Monks concerning the following question:

I have looked at the posts on the site, as well as the Algorithm::Permute module, but nothing seems to correctly match my problem. Everything I have seen is in regards to getting N Permutations of N items. My problem requires that I get N Permutations of M items. Here's what I'm doing:

I'm taking a list of letters and finding matches in the dictionary that contain a permutation of these letters. I have gotten to the point where if I pass it the string "ciolnaot", it will find 'location'. This is being done with the above-mentioned Algorithm::Permute. This is working fine, but now I need to add a new twist to it. I want to specify the length of the words the script will find in the dictionary. Meaning that if I specify the string "ciolnaot" and a length of 4, I want to find 'coal', 'colt', 'loin', 'lion', 'toil', and whatever other 4-letter words exist in there, and only the words of that length.

So rather than taking the string, splitting it and finding permutations of it, it seems it becomes more complicated as I think I'll need to get 4-character permutations from the original string of letters, and then permutate each of those permutations... Am I making sense anymore? :)

One thing I hope I may be missing is another module like Algorithm::Permute that is more advanced, in allowing permutation string lengths or something similar. Please help!

-------------------------------------
eval reverse@{[lreP
=>q{ tsuJ\{qq},' rehtonA'
,q{\}rekcaH },' tnirp']}[1+1+
1,1-1,1+1,1*1,(1+1)*(1+1)];
-------------------------------------

Replies are listed 'Best First'.
Re: N Permutations of M Items
by Zaxo (Archbishop) on Dec 20, 2002 at 08:27 UTC

    It's better to avoid checking all permutations if you can. Permutations grow in number very rapidly with string length.

    You don't need to know which permutation gets you a dictionary word, only that one exists. You can do that by rearranging the characters of words in alphabetical order and matching with wildcards. We deal with 'location' in the form 'acilnoot'.

    First, let's get the preliminaries out of the way...

    #!/usr/bin/perl use warnings; use strict;
    and make an anagram dictionary. That's not necessary, but it will speed things up if the wordlist is not too long for memory.
    my ($wl, %words) = '/usr/dict/words'; { open my $wf, '<', $wl or die $!; chomp, push @{$words{join( '', sort split //, lc)}}, $_ while <$wf>; close $wf }

    It would be nice to give several words on the command line, so we set that up in a hash which will hold results:

    my %matches; @matches{map {lc} @ARGV} = ();

    Now, for instance, 'coal' appears in the %words dictionary under the key 'aclo'. We can construct the regex /a.*c.*l.*o/. If 'acilnoot' matches, then 'location' contains 'coal'.

    for (keys %matches) { my $ordered = join '', sort split //; $matches{$_} = [ grep { length($ordered) < length($_) ? 0 : do { my $re = '^(.*?)' . join('(.*)', map {quotemeta} split '') . '(.*)$'; $ordered =~ /$re/; }; } keys %words]; }
    I've fancied up the regex to capture unused characters (just in case that may be handy ;-), though there is no use made of that here. I also arranged for length check to save time on hopeless cases.

    Finally, print results just to have something to show for all that.

    print $_, ' contains ', "@{[map {@$_} @words{@{$matches{$_}}}]}", $/, $/ for keys %matches; __END__

    Running this on my machine as 'time perl anaword.pl location brevity Zoroaster' prints:

    location contains loon octal loot tool Al an at in Io it no on to Acto +n canto Alton onto alnico lion loin Olin lint toil action into Toni I +lona Latin colon location Tonio Cain Inca clan coal cant coat lotion +lain nail tail anti Tina iota Lac can act cat ail Ali Ian Lao ant Nat + tan oat Clint con coo cot loan tonic NATO Clio coil loci coin icon L +in nil oil lit ion tin Ito cool clot colt lot coon not ton too antic zoroaster contains Zoroaster as at re et or so to Oz ooze Eros ores Ro +se sore tore zero toes Azores errs rest zest rotor roost roots arose +Erato roaster rears Serra rater Terra aster rates stare tears razors +orators sorer store zeros Starr rare rear Ares ears eras sear rate te +ar Ezra raze east eats sate seat teas rooster resort roster sorter or +ator are ear era Rae sea ate eat tea oar Sao oat art rat tar sat root + soot zoos rots sort ore roe toe Zoe err set arrest rarest raster rat +ers Sartre starer ersatz roar oars Rosa soar oats Taos arts rats star + roars razor Astor roast too zoo Orr rot rooter brevity contains try be by re et it very rivet bier Brie bite Bert ver +b byte bevy brevity bet bye rib bit biter Tiber tribe ire tie vie rye + yet ivy rite tier tire vier Viet real 0m8.766s user 0m7.450s sys 0m0.070s
    I didn't show how to limit attention to four letter words, but that is easy given the structure of this treatment. There are other nodes here which use this technique. Search 'anagram' to find them.

    Update: Modified regex to anchor at the ends and not miss unused characters, updated times to the new values.

    After Compline,
    Zaxo

Re: N Permutations of M Items
by Enlil (Parson) on Dec 19, 2002 at 23:25 UTC
    Here is one solution:
    use strict; use warnings; use Algorithm::Permute; my $p = new Algorithm::Permute([qw(c i o l n a o t)]); my %seen_permutation; my $length = 5; while ( my @res = $p->next) { my @tmp = @res[0..$length - 1]; #get first n letters of + @res; my $key = join('',sort {$a cmp $b} @tmp); #return asciibetical li +st as key; unless ( $seen_permutation{$key} ) #skip if this pattern has b +een seen before { $seen_permutation{$key} = 1; my $r = new Algorithm::Permute([@tmp]); while ( my @sml = $r->next ) { print join ('',@sml), $/; } } }
    Since you are looking for words you might want also skip anything that doesn't contain vowels (though there are words without vowels, such as cwm).

    -enlil

Re: N Permutations of M Items
by traveler (Parson) on Dec 19, 2002 at 23:24 UTC
    It sounds as though what you want is pretty close to this. Maybe you can use it as a basis for your code. You might also try Algorithm::ChooseSubsets.

    HTH, --traveler

Re: N Permutations of M Items
by waswas-fng (Curate) on Dec 20, 2002 at 00:05 UTC
    If you are using a dictonary why permute in the first place? loop through the dict and do two tests,
    Psudocode: If word is Lenght you are looking for if no skip word, If yest then check to see if all of the chars in word are in the input array If yes you have a hit next word.
    -Waswas
Re: N Permutations of M Items
by I0 (Priest) on Dec 20, 2002 at 03:38 UTC
    $_=shift||"ciolnaot"; my $s = join'?',(sort split//),""; while( <> ){ chomp; next unless length == 4; my $t=join'',sort split//; print "$_\n" if $t=~/^$s$/; }
Re: N Permutations of M Items
by atcroft (Abbot) on Dec 20, 2002 at 01:25 UTC

    Don't know of a module, but if you are just looking at strings, the following code would work, printing each generated possibility only once:

    # # Assumes $word and $desired_length previously defined. # my @characters = split('', $word); my (%found); &gen_nofm(\@characters, \%found, '', $desired_length); print($_, "\n") foreach (sort({lc($a) cmp lc($b)} keys(%found))); sub gen_nofm { my ($charlist, $found_thusly, $partial_word, $desired_length) = @_; if ($desired_length) { for (my $index = 0; $index < scalar(@{$charlist}); $index++) { &gen_nofm($charlist, $found_thusly, $partial_word . ${$charlist}[$index], $desired_length - 1); } } else { ${$found_thusly}{$partial_word}++; } }