http://qs321.pair.com?node_id=1161650


in reply to Re: Comparing Lines within a Word List
in thread Comparing Lines within a Word List

Just to clarify, I'm not a computer science student, and my deadline is not a class deadline. I'm an artist with a background in mathematics, and I also occasionally publish crosswords and other word puzzles. I had a sense that Perl and regular expressions could be enormously helpful for the puzzles, so I've been teaching myself with the O'Reilly book. I haven't learned enough yet to solve one particular matching question on my own, though, and I really need an answer in the next few days: it's for a printer deadline on a puzzle I'm writing for my best friend's wedding.

Thanks for these thoughts on the spell-checking approach. The code looks very interesting, but it's obviously a lot to work through, and maybe more than I need at the moment? The solutions above appear to tackle my specific problem, but perhaps there is something I'm not seeing.


Re^3: Comparing Lines within a Word List
by poj (Abbot) on Apr 27, 2016 at 16:40 UTC

    Try this with your dictionary file. If performance is a problem, break the list into separate files by word length.

    #!perl
    use strict;
    use warnings;

    # load the word list into a hash, upper-cased, for fast lookup
    open my $in, '<', 'dict.txt' or die "Cannot open dict.txt: $!";
    my %dict = ();
    while (<$in>) {
        chomp;
        $dict{ uc $_ } = 1;
    }
    close $in;

    for my $word (sort keys %dict) {
        next unless $word =~ /R/;
        my @f = split //, $word;
        # loop over each letter
        # changing R to S
        # to create new word in $w
        for my $i (0 .. $#f) {
            if ($f[$i] eq 'R') {
                my $w = $word;
                substr($w, $i, 1) = "S";
                # check if generated word exists in dict
                if (exists $dict{$w}) {
                    print "$word $w\n";
                }
            }
        }
    }
    poj
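poj's tip about splitting by word length works because changing a single R to S never changes a word's length, so every matching pair lives in the same length group. Here is a minimal sketch of that idea done in memory rather than with separate files. The sub names `bucket_by_length` and `rs_pairs` are hypothetical, and the short word list is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group words into buckets keyed by length,
# upper-casing them as the script above does.
sub bucket_by_length {
    my @words = @_;
    my %bucket;
    for my $w (@words) {
        push @{ $bucket{ length($w) } }, uc $w;
    }
    return \%bucket;
}

# Hypothetical helper: within one bucket of upper-case words, return
# "WORD NEWWORD" for every single R->S substitution that is also a word.
sub rs_pairs {
    my @words = @_;
    my %dict;
    $dict{$_} = 1 for @words;
    my @pairs;
    for my $word (sort keys %dict) {
        my $pos = -1;
        # index() finds each R in turn, starting just after the last one
        while (($pos = index($word, 'R', $pos + 1)) >= 0) {
            my $w = $word;
            substr($w, $pos, 1) = 'S';
            push @pairs, "$word $w" if exists $dict{$w};
        }
    }
    return @pairs;
}

# Made-up sample list; each length bucket is checked independently.
my $buckets = bucket_by_length(qw(rats sats star stas bar bas cart));
for my $len (sort { $a <=> $b } keys %$buckets) {
    print "$_\n" for rs_pairs(@{ $buckets->{$len} });
}
# prints:
# BAR BAS
# RATS SATS
# STAR STAS
```

With a real dictionary you could instead write each bucket to its own file and run the original script on one file at a time. That mostly saves memory, since the hash lookup in the script is already fast.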
Re^3: Comparing Lines within a Word List
by Anonymous Monk on Apr 27, 2016 at 16:40 UTC

    That's fine, of course; we just get a lot of "do my homework for me" type questions here. The code works (at least on my machine), and as far as I can tell from your descriptions in this thread it generates the output you're looking for, so feel free to just use it. The learning curve of this particular code may be a little steep, and it contains a lot of benchmarking/statistics code that you don't need, but the theory behind a trie is still something you can look into if you like. A linear search through a long word list is inefficient when it has to be run often (which doesn't seem to be the case in your situation), so the size/speed tradeoff of a trie can be very useful.
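For anyone curious about the trie, here is a minimal sketch built from nested Perl hashes, one level per letter. It is not the code from the earlier reply. The sub names are hypothetical and the words are made up. A lookup walks one node per letter, so its cost depends on the length of the word rather than the size of the list, whereas a linear search checks every entry. For exact lookups alone, an ordinary Perl hash (as in poj's script) is already fast. The trie's extra strength is answering prefix questions such as "does any word start with STA?".

```perl
use strict;
use warnings;

# Hypothetical helper: add a word to the trie. Each node is a hash ref
# keyed by single letters; the special key '' marks end-of-word.
sub trie_insert {
    my ($root, $word) = @_;
    my $node = $root;
    for my $ch (split //, uc $word) {
        $node = $node->{$ch} //= {};    # create the child node if missing
    }
    $node->{''} = 1;
    return;
}

# Follow the letters of $str from the root; returns the node reached,
# or undef as soon as a letter has no child.
sub trie_walk {
    my ($root, $str) = @_;
    my $node = $root;
    for my $ch (split //, uc $str) {
        $node = $node->{$ch} or return;
    }
    return $node;
}

# 1 if $word was inserted as a complete word, else 0.
sub trie_has_word {
    my ($root, $word) = @_;
    my $node  = trie_walk($root, $word);
    my $found = $node && exists $node->{''};
    return $found ? 1 : 0;
}

# 1 if any inserted word starts with $prefix, else 0.
sub trie_has_prefix {
    my ($root, $prefix) = @_;
    return trie_walk($root, $prefix) ? 1 : 0;
}

my %trie;
trie_insert(\%trie, $_) for qw(star stars start stas);
my $w = trie_has_word(\%trie, 'start') ? 'found' : 'missing';
my $p = trie_has_prefix(\%trie, 'sta') ? 'prefix' : 'no prefix';
print "$w\n$p\n";    # prints "found" then "prefix"
```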