PerlMonks  

Re: Constructive criticism of a dictionary / text comparison script

by ajdelore (Pilgrim)
on Aug 29, 2003 at 22:50 UTC [id://287852]


in reply to Constructive criticism of a dictionary / text comparison script

This is more a suggestion on functionality than a critique of code. One thing I ran into with my boggle script is that the Unix dict file doesn't contain variants of words. For example, it has huge but not hugely, and fish but not fishes or fishing.

Ideally, you would have some kind of functionality to address this. One possibility is to stem words before you check them. I know that Lingua::Stem implements one popular algorithm for this, but I didn't look into it closely enough to see whether it would do the trick for me.
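A minimal sketch of the "check variants too" idea, assuming the word list has already been loaded into a hash. It does not use Lingua::Stem; instead it strips one suffix from a short, hypothetical list (ing, es, ed, ly, s) and retries the lookup, which is far cruder than a real stemming algorithm. The name `lookup_with_variants` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical suffix list: a crude heuristic, NOT a real stemming
# algorithm such as the one Lingua::Stem provides.
my @SUFFIXES = qw(ing es ed ly s);

# Return the dictionary entry that $word matches, either directly or
# after stripping one suffix from @SUFFIXES; undef if nothing matches.
sub lookup_with_variants {
    my ($word, $dict) = @_;
    my $lc = lc $word;
    return $lc if exists $dict->{$lc};
    for my $suffix (@SUFFIXES) {
        # Require at least two letters of base so "is" is not cut to "i".
        next unless $lc =~ /\A(.{2,})\Q$suffix\E\z/;
        return $1 if exists $dict->{$1};
    }
    return undef;
}

# Usage with a tiny stand-in dictionary.
my %dict = map { $_ => 1 } qw(huge fish horse);
for my $w (qw(hugely fishes fishing horses)) {
    my $base = lookup_with_variants($w, \%dict);
    printf "%-8s => %s\n", $w, defined $base ? $base : '(not found)';
}
```

Because every suffix is tried in turn, "horses" first fails as "hors" (after "es") and then succeeds as "horse" (after "s"). The flip side is false positives: if the dictionary lacks "used" but contains "us", then "used" is accepted as "us".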

</ajdelore>

Re: Re: Constructive criticism of a dictionary / text comparison script
by allolex (Curate) on Aug 30, 2003 at 06:35 UTC

    I really like your idea, and it would work very well if I were dealing with texts in languages that all had a stemming module. I am seriously considering writing one for French. Currently I am working with Italian, which does have Lingua::Stem::It, but my dictionary already contains inflected word forms. The big advantage of a stemmer is that it can also handle novel constructions (like stemage), which the dictionary does not account for. Building a dictionary of stem forms would be a very interesting modification, but checking its accuracy would also be a lot more work.

    What would really be cool is a stemming module that defined all affixes via a hash of some kind, so that tense, mode/mood, plural, person, etc. could be looked up like

    my %hash_of_verb_suffixes = (   # [ ] keeps each qw() list as one hash value instead of flattening it
        future      => [ qw([ei]rò [ei]rai [ei]rà [ei]remo [ei]rete [ei]ranno) ],
        conditional => [ qw([ei]rei [ei]resti [ei]rebbe [ei]remmo [ei]reste [ei]rebbero) ],
    );

    and so on.
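    As a rough illustration of how such a table could be queried, here is a sketch that treats each entry as a regex fragment (so [ei] matches either vowel) and reports every category whose suffix matches the end of a word. The helper name `analyse_form` is hypothetical, and the table covers only regular -are/-ere/-ire endings, so irregular futures such as andrò are not recognised.

```perl
use strict;
use warnings;
use utf8;                                # the suffix table contains ò and à
binmode STDOUT, ':encoding(UTF-8)';

# The suffix table from the post, each list stored as an array ref.
my %hash_of_verb_suffixes = (
    future      => [ qw([ei]rò [ei]rai [ei]rà [ei]remo [ei]rete [ei]ranno) ],
    conditional => [ qw([ei]rei [ei]resti [ei]rebbe [ei]remmo [ei]reste [ei]rebbero) ],
);

# Hypothetical helper: return [category, stem, suffix] for every suffix
# pattern that matches the end of $word.
sub analyse_form {
    my ($word, $table) = @_;
    my @analyses;
    for my $category (sort keys %$table) {
        for my $suffix (@{ $table->{$category} }) {
            # Entries are interpolated as regex fragments, anchored at the end.
            if ($word =~ /\A(.+?)($suffix)\z/) {
                push @analyses, [ $category, $1, $2 ];
            }
        }
    }
    return @analyses;
}

# Usage: prints "conditional: parl + erebbero"
for my $analysis (analyse_form('parlerebbero', \%hash_of_verb_suffixes)) {
    my ($category, $stem, $suffix) = @$analysis;
    print "$category: $stem + $suffix\n";
}
```

    Returning a list of analyses rather than a single answer leaves room for forms that match more than one pattern, which is exactly where a real tagger would need context to decide.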

    Oh, wait. That's a POS tagger;)

    In any case, I can see we think along similar lines. Thanks!

    --
    Allolex
