Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: De Duping Street Addresses Fuzzily

by brian_d_foy (Abbot)
on Jan 31, 2005 at 22:46 UTC ( [id://426753]=note: print w/replies, xml ) Need Help??


in reply to De Duping Street Addresses Fuzzily

When i was looking at the postal regulations for my periodicals license for The Perl Review, I discovered that there is a whole industry that does just this: give them a list and they give it back to you with duplicates removed and addresses cleaned-up.

This service comes as a true service (someone else does the work) or off-the-shelf software. Depending on how much work you have to do and how much it is worth to the company, you might want to skip doing this yourself.

However, if you program it yourself, part of the solution (along with what people have already mentioned) is getting the canonical address. People will often give you their version of their address (for instance, I can't remember if my street is a Road or an Avenue, so I use them interchangeably). The US Postal Service has all sorts of data and tools to help you figure out what it should be based on the zip code. Other post offices have similar things (the Royal Mail address lookup kicks ass). From there you get closer to finding the duplicates than just looking at the address the customer gave you or someone keyed in.

Good luck and let us know how it turns out.

--
brian d foy <bdfoy@cpan.org>

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://426753]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (4)
As of 2024-04-24 12:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found