PerlMonks  

Spell Check

by Anonymous Monk
on May 30, 2000 at 18:54 UTC [id://15404]

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there any way to write a spell checker in Perl? Please help.

Re: Spell Check
by lhoward (Vicar) on May 30, 2000 at 19:04 UTC
    There is the Text::Ispell module from CPAN.

    If you have a dictionary file (like /usr/dict/words under UNIX; many newer systems use /usr/share/dict/words instead), it is easy to write a simple misspelling detector in Perl:

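    use strict;
    use warnings;
    # Load the dictionary into a hash keyed by lower-case word, so the
    # lookup below is case-insensitive.
    my %dict;
    open my $dh, '<', '/usr/dict/words' or die "Can't open /usr/dict/words: $!";
    while (<$dh>) {
        chomp;
        $dict{ lc $_ } = $_;
    }
    close $dh;
    my $text  = 'Is there anyway to write a spell checker in perl?? Please help.';
    # Split on runs of anything other than letters, digits and apostrophes.
    my @words = split /[^a-zA-Z0-9']+/, $text;
    foreach (@words) {
        if ( !defined $dict{ lc $_ } ) {
            print "\"$_\" is not in the dictionary\n";
        }
    }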
    This could be enhanced significantly, but will get you started on the right path.
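One possible enhancement, sketched here under my own assumptions rather than taken from lhoward's post, is to suggest corrections for each unknown word. This sketch swaps in plain Levenshtein edit distance; `edit_distance` and `suggest` are hypothetical helper names, and the hash layout (lower-case word as key) matches the `%dict` built above.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance between two strings.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Return the (lower-case) dictionary keys within $max edits of $word,
# closest first, ties broken alphabetically. $max defaults to 1.
sub suggest {
    my ($word, $dict, $max) = @_;
    $max //= 1;
    my $lw   = lc $word;
    my %dist = map { $_ => edit_distance($lw, $_) } keys %$dict;
    return sort { $dist{$a} <=> $dist{$b} or $a cmp $b }
           grep { $dist{$_} <= $max } keys %dist;
}

# Tiny made-up dictionary for illustration.
my %words = map { $_ => $_ } qw(help house hose horse);
print join(', ', suggest('hoose', \%words)), "\n";   # prints "horse, hose, house"
```

This scans every dictionary key for each unknown word, which gets slow with a full word list; a real checker would index the dictionary instead of scanning it.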
      Nota Bene! You should be using Lingua::Ispell, not Text::Ispell.

      jdporter
      The 6th Rule of Perl Club is -- There is no Rule #6.

Re: Spell Check
by swiftone (Curate) on May 30, 2000 at 19:15 UTC
    I have no doubt that there are modules that do this (perhaps Text::Ispell or Lingua::Ispell?), but for a quick, one-minute, inefficient version, try:

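    use strict;
    use warnings;
    # The dictionary file should have all the words, one per line.
    my $dictfile = "/some/path/to/dictfile";
    open( my $dict, '<', $dictfile ) or die "Can't open $dictfile: $!";
    my %words;
    while (<$dict>) {
        chomp;                  # without this no lookup would ever match
        $words{ lc $_ } = 1;    # using a hash as a cheap lookup
        # Note that a "real" dictionary file will nail you on efficiency
        # here. Perhaps a DBM would be better?
    }
    close $dict;
    # Check each whitespace-separated word of the files named on the
    # command line (or STDIN), reporting "line:word?" for unknown words.
    while (<>) {
        foreach my $word ( split ' ', $_ ) {
            if ( !defined $words{ lc $word } ) {
                print "$.:$word?\n";
            }
        }
    }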
    (Now the perl wizards will trounce this code, but I'll post it anyway in hopes of learning from them.)
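Following up the DBM idea in the comment above: a minimal sketch, assuming the core SDBM_File module is acceptable (DB_File would work similarly if installed). The dictionary is loaded into an on-disk hash once, and later runs tie that file instead of re-reading the whole word list. `build_dbm` and `open_dict` are hypothetical helper names.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_RDWR O_CREAT);
use SDBM_File;

# Build an on-disk hash from a plain word list (one word per line).
# $db is a base name; SDBM creates "$db.pag" and "$db.dir" next to it.
sub build_dbm {
    my ($wordlist, $db) = @_;
    tie my %h, 'SDBM_File', $db, O_RDWR | O_CREAT, 0666
        or die "Can't tie $db: $!";
    open my $fh, '<', $wordlist or die "Can't open $wordlist: $!";
    while (my $w = <$fh>) {
        chomp $w;
        next unless length $w;    # skip blank lines
        $h{ lc $w } = 1;
    }
    close $fh;
    untie %h;
}

# Open an existing database read-only; the returned hash ref stays
# tied for as long as the caller holds it.
sub open_dict {
    my ($db) = @_;
    tie my %h, 'SDBM_File', $db, O_RDONLY, 0666
        or die "Can't tie $db: $!";
    return \%h;
}

# Usage (paths are placeholders):
#   build_dbm('/some/path/to/dictfile', '/tmp/words');   # once
#   my $dict = open_dict('/tmp/words');                    # every run
#   print "$word?\n" unless exists $dict->{ lc $word };
```

The trade-off is a one-time build step in exchange for not holding the whole dictionary in memory on every run.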
      Now the perl wizards will trounce this code
      Let them, indeed. But you'll still get my vote. Pretty cool in my ragged little book :-)
Re: Spell Check
by KM (Priest) on May 30, 2000 at 19:52 UTC
    You could also use ispell to find out which alternative spellings are available for a misspelled word:

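    my @words = split(" ", "Hello, I live in a blue hoose.");
    for (@words) {
        # -a: pipe mode (read words on stdin), -S: sort guesses by likelihood
        my @spell = `echo $_ | ispell -a -S`;
        next if grep /^\*/, @spell;        # '*' means the word is correct
        @spell = grep /^\s*&/, @spell;     # '&' lines carry the near misses
        next unless @spell;                # no suggestions for this word
        chomp $spell[0];
        # "& word count offset: miss, miss, ..." -> drop the first four fields
        my @rest = split(/\ |\,\ /, $spell[0]);
        @rest = splice(@rest, 4);
        if (scalar(@rest) != 0) {
            print "Alternate spellings for $_: @rest\n";
        }
    }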

    Mileage may vary; change as needed.
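If you want more than the '&' lines, a small parser for the other `ispell -a` result lines may help. This is a sketch based on my reading of the ispell man page's description of pipe mode ('*' correct, '+ root', '-' compound, '& word count offset: misses', '? word count offset: guesses', '# word offset' for no match); check it against your ispell version. `parse_ispell_line` is a hypothetical name; its `type` values mirror the Text::Ispell result types, but the other fields are my own.

```perl
use strict;
use warnings;

# Parse one result line from "ispell -a" into a hash ref, or return
# undef for the version banner, blank separator lines, or anything else.
sub parse_ispell_line {
    my ($line) = @_;
    chomp $line;
    return { type => 'ok' }               if $line =~ /^\*/;
    return { type => 'root', root => $1 } if $line =~ /^\+\s+(\S+)/;
    return { type => 'compound' }         if $line =~ /^-/;
    if ($line =~ /^([&?])\s+(\S+)\s+\d+\s+\d+:\s*(.*)$/) {
        return {
            type  => $1 eq '&' ? 'miss' : 'guess',
            term  => $2,
            words => [ split /,\s*/, $3 ],
        };
    }
    return { type => 'none', term => $1 } if $line =~ /^#\s+(\S+)/;
    return;
}

# Usage (requires ispell on the PATH):
#   my @results = grep { defined } map { parse_ispell_line($_) }
#                 `echo hoose | ispell -a -S`;
```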

    This did bring up something I would like to see in Perl (I need to check p5p). I think it would be useful to do:

    my @rest = splice((my @tmp = split(/\ |\,\ /,$spell[0])),4);
    or
    my @rest = splice(split(/\ |\,\ /,$spell[0]),4);

    But splice() doesn't accept a split() or a list assignment as its first argument; it needs an actual array. I think it should. But that is likely more of a discussion for elsewhere.
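Without changing splice() itself, the same result can be had today. Two workarounds, sketched on a made-up line in the '&' format used above and assuming only the fields after the fourth are wanted:

```perl
use strict;
use warnings;

# A made-up line in ispell's "&" (near miss) format.
my $line = '& hoose 3 0: house, hose, horse';

# Workaround 1: list assignment that discards the first four fields.
my (undef, undef, undef, undef, @rest) = split /\ |,\ /, $line;

# Workaround 2: splice() needs a real array, so build an anonymous
# array from the split() result and dereference it in place.
my @rest2 = splice @{ [ split /\ |,\ /, $line ] }, 4;

print "@rest\n";    # prints "house hose horse"
print "@rest2\n";   # prints "house hose horse"
```

The first form avoids building an extra array; the second stays closest to the splice() call KM wanted to write.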

    Cheers,
    KM

      Thank You

      You have no idea how long I've tried to discover how to get ispell to read from stdin. I've read the man pages ten times (funny, though: now that I know the flag is -a, I can look in the man page and find it right away). Sigh.

Re: Spell Check
by Storm (Novice) on May 31, 2000 at 00:56 UTC
    Or you can use Text::Ispell with all its features. This example, pulled from CPAN, shows all the options except for adding new words.


    use Text::Ispell qw( spellcheck );
    Text::Ispell::allow_compounds(1);
    for my $r ( spellcheck( "hello hacking perl salmoning fruithammer shrdlu 42" ) ) {
        if ( $r->{'type'} eq 'ok' ) {
            # as in the case of 'hello'
            print "'$r->{'term'}' was found in the dictionary.\n";
        }
        elsif ( $r->{'type'} eq 'root' ) {
            # as in the case of 'hacking'
            print "'$r->{'term'}' can be formed from root '$r->{'root'}'\n";
        }
        elsif ( $r->{'type'} eq 'miss' ) {
            # as in the case of 'perl'
            print "'$r->{'term'}' was not found in the dictionary;\n";
            print "Near misses: $r->{'misses'}\n";
        }
        elsif ( $r->{'type'} eq 'guess' ) {
            # as in the case of 'salmoning'
            print "'$r->{'term'}' was not found in the dictionary;\n";
            print "Root/affix Guesses: $r->{'guesses'}\n";
        }
        elsif ( $r->{'type'} eq 'compound' ) {
            # as in the case of 'fruithammer'
            print "'$r->{'term'}' is a valid compound word.\n";
        }
        elsif ( $r->{'type'} eq 'none' ) {
            # as in the case of 'shrdlu'
            print "No match for term '$r->{'term'}'\n";
        }
    }
