
Re: abbreviation checking

by dree (Monsignor)
on Dec 02, 2002 at 20:34 UTC ( #217047=note )

in reply to abbreviation checking

Metaphone and Double Metaphone are better than Soundex.

If you have to compare phrases, there is Text::PhraseDistance.

Re: Re: abbreviation checking
by Anonymous Monk on Dec 02, 2002 at 22:05 UTC
    While making an MP3-renaming script, which attacks a problem similar to yours, I used a combination of Metaphone and "distance" modules. My approach:
    1. Get a list of "known-good" words. I use already-verified MP3 filenames as a source of these.
    2. Calculate their Metaphones.
    3. Calculate the Metaphone of each new word and look for matches. If there are none, look for matches at a distance of 1 or 2. Distances larger than 2 produce too many matches.
    4. Have the user confirm the 'corrections'.
    It's not an exact science, and human intervention is unavoidable if correctness matters.
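A minimal sketch of the four steps above, in Python so it runs without any CPAN modules. Everything in it is an assumption made for illustration: `phonetic_key` is a crude stand-in and NOT the real Metaphone algorithm, the "distance" is taken to be the Levenshtein distance between phonetic keys (the post does not say which distance module or which strings were compared), and the names `phonetic_key`, `levenshtein` and `suggest` are hypothetical.

```python
def phonetic_key(word):
    """Crude stand-in for a Metaphone code (NOT the real algorithm)."""
    letters = [c for c in word.upper() if c.isalpha()]
    key = []
    for i, c in enumerate(letters):
        if i > 0 and c in "AEIOUYHW":
            continue  # drop vowels and weak consonants after the first letter
        if key and key[-1] == c:
            continue  # collapse repeated letters in the key
        key.append(c)
    return "".join(key)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, known_good, max_distance=2):
    """Return candidates: exact key matches if any, else keys within max_distance."""
    keys = {w: phonetic_key(w) for w in known_good}   # steps 1 and 2
    target = phonetic_key(word)                       # step 3
    exact = [w for w, k in keys.items() if k == target]
    if exact:
        return exact
    scored = [(levenshtein(target, k), w) for w, k in keys.items()]
    return [w for d, w in sorted(scored) if d <= max_distance]


if __name__ == "__main__":
    known = ["Metallica", "Nirvana"]   # assumed list of known-good words
    for new_word in ["Metalica", "Metallika", "Zzz"]:
        print(new_word, "->", suggest(new_word, known))
```

Step 4 is left out on purpose: whatever `suggest` returns is only a list of candidates, and a person still has to confirm each one before anything gets renamed.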
