Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Fuzzy matching of text strings

by TedPride (Priest)
on Dec 14, 2005 at 19:38 UTC ( [id://516744]=note: print w/replies, xml ) Need Help??


in reply to Fuzzy matching of text strings

Remove all leading and trailing whitespace; change all instances of more than one whitespace character to a space; lowercase; remove punctuation. Then create a 3-character frequency count and match it against the 3-character frequency count for existing fields. If the percentage match is over a certain level, add possible matches to an array. Sort by percentage match.

Now either choose a record (or field) to merge with, or mark the record as unique and move on. Hopefully you won't have to manually deal with too many records.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://516744]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (3)
As of 2024-04-20 01:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found