PerlMonks  

Re: De Duping Street Addresses Fuzzily

by punkish (Priest)
on Jan 31, 2005 at 22:24 UTC


in reply to De Duping Street Addresses Fuzzily

It is a tedious problem, but you could start by splitting each record into its components: streetnum, streetpredir, streetname, streettype, streetpostdir, etc. There are accepted standard values available for these components; just search the web. Then match the split values against your reference lookup table. Once you have done that, you could come up with some kind of scoring to rank candidate pairs from most likely to least likely duplicates. Whether the exercise is worth your time depends on how many records you have to scan, how often you have to do this, and how important it is to avoid false positives and false negatives in the match. OTOH, if your employer is paying you for this...
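
A rough sketch of the split / normalize / score idea, written in Python for brevity (the same logic maps onto Perl hashes and regexes). Everything named here is an assumption for illustration: the tiny `DIRECTIONS` and `STREET_TYPES` tables stand in for a real reference lookup table, the `WEIGHTS` are arbitrary, and `split_address` only handles simple "number, direction, name, type, direction" layouts.

```python
import re

# Hypothetical, heavily abbreviated lookup tables. A real reference table
# would be built from a published list of standard abbreviations.
DIRECTIONS = {"n": "N", "north": "N", "s": "S", "south": "S",
              "e": "E", "east": "E", "w": "W", "west": "W"}
STREET_TYPES = {"st": "ST", "street": "ST", "ave": "AVE", "avenue": "AVE",
                "rd": "RD", "road": "RD", "blvd": "BLVD", "boulevard": "BLVD"}

# Arbitrary illustrative weights: how much each component counts in a match.
WEIGHTS = {"streetnum": 3, "streetname": 4, "streettype": 1,
           "streetpredir": 1, "streetpostdir": 1}


def split_address(raw):
    """Split a street address into normalized components (naive parser)."""
    # Lowercase, drop punctuation, and tokenize on whitespace.
    tokens = re.sub(r"[^\w\s]", "", raw.lower()).split()
    parts = {"streetnum": "", "streetpredir": "", "streetname": "",
             "streettype": "", "streetpostdir": ""}
    if tokens and tokens[0].isdigit():
        parts["streetnum"] = tokens.pop(0)
    # Only peel off a direction or type if a street name would remain.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        parts["streetpredir"] = DIRECTIONS[tokens.pop(0)]
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        parts["streetpostdir"] = DIRECTIONS[tokens.pop()]
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        parts["streettype"] = STREET_TYPES[tokens.pop()]
    parts["streetname"] = " ".join(tokens)
    return parts


def match_score(a, b):
    """Return the weighted fraction of components on which a and b agree."""
    pa, pb = split_address(a), split_address(b)
    agreed = sum(w for field, w in WEIGHTS.items() if pa[field] == pb[field])
    return agreed / sum(WEIGHTS.values())


if __name__ == "__main__":
    print(split_address("123 N. Main Street"))
    print(match_score("123 N. Main Street", "123 North Main St"))
    print(match_score("123 N Main St", "125 N Main St"))
```

Pairs scoring above some threshold you pick could be flagged as likely duplicates and the borderline ones sent for manual review. Exact comparison of `streetname` will miss misspellings; swapping in an edit-distance comparison for that one field is a natural next step.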
