Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
There is a popular word game caled "Word Morph". In this game you need to go from one word to another by modifying one letter in each word to form a new word. for example (head to foot):
HEAD bead beat boat boot FOOT
One of my English-speaking friends told me about this game. As my English is quite bad, I could not play this game fairly; but i wrote a perl script to solve the problem :)

This command-line tool finds a shortest way from one word to another using the given dictionary. I used the 2of12 dictonary from the 12Dicts project (http://wordlist.sourceforge.net), but any newline-character-delimited wordlist of any language can be used.

I don't know how 'cool' is this usage, but i mean that it is quite interesting :)

It is of course not ideal, so I am open for any suggestions :)

#!/usr/bin/perl #ver 1.02 use warnings; use strict; my $dict = '2of12.txt'; die <<HELP unless @ARGV == 2; usage: transform.pl <word1> <word2> The program finds a way from one word to other, like this: % transform.pl love shit love-lose-lost-loot-soot-shot-shit HELP my ($left, $right) = @ARGV[0,1]; for ($left, $right) { $_ = lc; } die "the length of given words is not equal!\n" if length($left) != le +ngth $right; open DICT, $dict or die "Cannot open dictionary $dict: $!"; my @words; while (<DICT>) { chomp; push @words, $_ if length == length $left; } eval { my @ways = ([transform($left, $right, \@words)], [reverse transfor +m($right, $left, \@words)]); if (@{$ways[0]} != @{$ways[1]}) { printway( @{$ways[0]} > @{$ways[1]} ? $ways[0] : $ways[1] ); } elsif (grep {$ways[0]->[$_] ne $ways[1]->[$_]} (0..(scalar(@{$ways +[0]}) - 1) )) { printway($ways[0]); printway($ways[1]); } else {printway($ways[0])} 1; } or print $@; sub transform { my $left = shift; my $right = shift; my @words = @{+shift}; my (@left, %left, @right, %right); # @left and @right- arrays + containing word relation trees: ([foo], [0, foe], [0, fou], [0, 1, f +ie] ...) # %left and %right - indic +es containing word offsets in arrays @left and @right $left[0] = [$left]; $right[0] = [$right]; $left{$left} = 0; $right{$right} = 0; my $leftstart = 0; my $rightstart = 0; my @way; my (%leftstarts, %rightstarts); SEARCH: for (;;) { my @left_ids = $leftstart..$#left; + # choose array of indices of new words $leftstart = $#left; die "Cannot solve! Bad word '$left' :(\n" if $leftstarts{$left +start}++ >2; # finish search if the way could not be found for my $id (@left_ids) { + # come through all new words my @prefix = @{$left[$id]}; my @patterns = wordpattern(pop @prefix); + # build patterns to find related words: foo -> (/^.oo$/,/^f.o$/, /^ +fo.$/) push @prefix, $id; foreach my $word (@words) { next if $left{$word}; + # skip words which are already in the tree if (scalar grep {$word =~ /$_/} @patterns) { + # if matched... push @left, [@prefix, $word]; $left{$word} = $#left; + # add new word to array and index #print join " ", @{$left[-1]}, "\n"; #debugging if ( defined(my $r_id = $right{$word}) ) { + # and check if the word appears in right index. if yes... my @end = reverse(print_rel($r_id, \@right)); shift @end; @way = (print_rel($#left, \@left), @end); + # build the way between the words last SEARCH; + # and finish the search } } } } my @right_ids = $rightstart..$#right; + # all the same :) the tree is build from both ends to speed up the +process $rightstart = $#right; die "Cannot solve! Bad word '$right' :(\n" if $rightstarts{$ri +ghtstart}++ > 2; for my $id (@right_ids) { # build right relational table my @prefix = @{$right[$id]}; my @patterns = wordpattern(pop @prefix); push @prefix, $id; foreach my $word (@words) { next if $right{$word}; if (scalar grep {$word =~ /$_/} @patterns) { push @right, [@prefix, $word]; $right{$word} = $#right; # print join " ", @{$right[-1]}, "\n"; #debugging if ( defined(my $l_id = $left{$word}) ) { my @end = reverse print_rel($#right, \@right); shift @end; @way = (print_rel($l_id, \@left), @end); last SEARCH; } } } } } return @way; } sub wordpattern { my $word = shift; my @patterns; for my $i (0..(length($word)-1)) { substr((my $pat = $word), $i, 1, '.'); push @patterns, qr/^$pat$/; } return @patterns; } sub print_rel { my $id = shift; my $ary = shift; my @line; my @rel = @{$ary->[$id]}; push @line, (pop @rel); foreach my $ref_id (reverse @rel) { unshift @line, $ary->[$ref_id]->[-1]; } return wantarray ? @line : join "\n", @line, ""; } sub printway { my @way = @{+shift}; print join "-", @way; print "\n"; }

UPDATE:
The dicussion with Limbic~Region lead to an improved variant of my script — at first time it is called it builds a helper data structure based on the given dictionary, stores it using Storable on the hard disk and loads its to memory at the next calls. So the first execution of the script lasts a very long time (about 5-8 minutes on a usual machine), but the next calls take less than half a second.

Its implementation uses modules Storable and Text::LevenshteinXS while my first variant was completely standalone. But I mean the faster variant is better, so I post it here.

#!/usr/bin/perl #ver 2.00 use warnings; use strict; use Storable; use Text::LevenshteinXS 'distance'; my $dict = '2of12.txt'; die <<HELP unless @ARGV == 2; usage: transform.pl <word1> <word2> The program finds a way from one word to other, like this: % transform.pl love shit love-lose-lost-loot-soot-shot-shit HELP my ($left, $right) = @ARGV[0,1]; for ($left, $right) { $_ = lc; } die "the length of given words is not equal!\n" if length($left) != le +ngth $right; my $db = -e 'dictionary.db' ? retrieve('dictionary.db') : build_db(); my $len = length $left; foreach my $word ($left, $right) { if (!$db->{$len}{$word}) { foreach my $test (keys %{$db->{$len}}) { if (distance($word, $test) == 1) { push @{$db->{$len}{$word}}, $test; push @{$db->{$len}{$test}}, $word; } } } } my $list = $db->{length($left)}; eval { printway([transform($left, $right, $list)]); 1; } or print $@; sub transform { my $left = shift; my $right = shift; my $list = shift; my (@left, %left, @right, %right); # @left and @right- arrays + containing word relation trees: ([foo], [0, foe], [0, fou], [0, 1, f +ie] ...) # %left and %right - indic +es containing word offsets in arrays @left and @right $left[0] = [$left]; $right[0] = [$right]; $left{$left} = 0; $right{$right} = 0; my $leftstart = 0; my $rightstart = 0; my @way; my (%leftstarts, %rightstarts); SEARCH: for (;;) { my @left_ids = $leftstart..$#left; + # choose array of indices of new words $leftstart = $#left; die "Cannot solve! Bad word '$left' :(\n" if $leftstarts{$left +start}++ >2; # finish search if the way could not be found for my $id (@left_ids) { + # come through all new words my @prefix = @{$left[$id]}; my $searched = pop @prefix; push @prefix, $id; foreach my $word (@{$list->{$searched}}) { next if $left{$word}; + # skip words which are already in the tree push @left, [@prefix, $word]; $left{$word} = $#left; # +add new word to array and index #print join " ", @{$left[-1]}, "\n"; #debugging if ( defined(my $r_id = $right{$word}) ) { # +and check if the word appears in right index. if yes... my @end = reverse(print_rel($r_id, \@right)); shift @end; @way = (print_rel($#left, \@left), @end); # +build the way between the words last SEARCH; # +and finish the search } } } my @right_ids = $rightstart..$#right; + # all the same :) the tree is build from both ends to speed up the +process $rightstart = $#right; die "Cannot solve! Bad word '$right' :(\n" if $rightstarts{$ri +ghtstart}++ > 2; for my $id (@right_ids) { # build right relational table my @prefix = @{$right[$id]}; my $searched = pop @prefix; push @prefix, $id; foreach my $word (@{$list->{$searched}}) { next if $right{$word}; push @right, [@prefix, $word]; $right{$word} = $#right; # print join " ", @{$right[-1]}, "\n"; #debugging if ( defined(my $l_id = $left{$word}) ) { my @end = reverse print_rel($#right, \@right); shift @end; @way = (print_rel($l_id, \@left), @end); last SEARCH; } } } } return @way; } sub print_rel { my $id = shift; my $ary = shift; my @line; my @rel = @{$ary->[$id]}; push @line, (pop @rel); foreach my $ref_id (reverse @rel) { unshift @line, $ary->[$ref_id]->[-1]; } return wantarray ? @line : join "\n", @line, ""; } sub printway { my @way = @{+shift}; print join "-", @way; print "\n"; } sub build_db { #thanks to Limbic~Region, http://p +erlmonks.org/index.pl?node_id=180961 open (my $dict, '<', '2of12.txt') or die "Unable to open '2of12.tx +t' for reading: $!"; my ($db, %data); while (<$dict>) { chomp; push @{$data{length()}}, $_; } for my $len (keys %data) { my $end = $#{$data{$len}}; for my $i (0 .. $end - 1) { my $word = $data{$len}[$i]; for my $j ($i + 1 .. $end) { my $test = $data{$len}[$j]; if (distance($word, $test) == 1) { push @{$db->{$len}{$word}}, $test; push @{$db->{$len}{$test}}, $word; } } } } store $db, 'dictionary.db'; return retrieve('dictionary.db'); }
Any further enchancements are welcome :)

In reply to Play and win the word morph game with the help of Perl :) by Ieronim

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (7)
As of 2024-04-19 08:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found