PerlMonks  

Re: De Duping Street Addresses Fuzzily

by m-rau (Scribe)
on Feb 07, 2005 at 16:49 UTC


in reply to De Duping Street Addresses Fuzzily

Identifying typos is a hard problem. You might want to try Text::Levenshtein, though I really do not know how well it works for addresses. The module implements the Levenshtein edit distance, a measure of how close two strings are: the number of substitutions, deletions, or insertions (edits) needed to transform one string into the other (the count is the same in either direction). Of course, this is only useful after the data has been cleansed (fifth => 5th, etc.).
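To make the idea concrete, here is a minimal sketch of "cleanse first, then compare by edit distance". The names `normalize`, `edit_distance`, and the `%ABBREV` table are my own illustrations and not part of any module. The distance is hand-rolled so the snippet runs without CPAN; with Text::Levenshtein installed, its exported `distance` function should be able to stand in for `edit_distance`. What threshold counts as "the same address" is something you would have to tune on real data.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical cleansing table; a real one would be much longer.
my %ABBREV = (
    first  => '1st', second => '2nd', third => '3rd',
    fourth => '4th', fifth  => '5th',
    street => 'st',  avenue => 'ave', road  => 'rd',
);

# Lowercase, drop punctuation, collapse whitespace, map known words.
sub normalize {
    my ($addr) = @_;
    $addr = lc $addr;
    $addr =~ s/[^a-z0-9\s]//g;
    $addr =~ s/\s+/ /g;
    $addr =~ s/^ | $//g;
    return join ' ',
        map { exists $ABBREV{$_} ? $ABBREV{$_} : $_ } split / /, $addr;
}

# Levenshtein distance via the usual two-row dynamic programme.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my $a = normalize('123 Fifth Avenue');
my $b = normalize('123 5th Ave.');
my $c = normalize('123 Fith Avenue');    # typo: "Fith"

print edit_distance($a, $b), "\n";       # 0: identical after cleansing
print edit_distance($a, $c), "\n";       # 2: the typo defeats the lookup table
```

Note that the typo "Fith" is not caught by the cleansing table, so the distance is taken between "5th" and "fith". That is why cleansing alone is not enough, and also why a small non-zero threshold is needed rather than an exact match.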
