Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

When i was looking at the postal regulations for my periodicals license for The Perl Review, I discovered that there is a whole industry that does just this: give them a list and they give it back to you with duplicates removed and addresses cleaned-up.

This service comes as a true service (someone else does the work) or off-the-shelf software. Depending on how much work you have to do and how much it is worth to the company, you might want to skip doing this yourself.

However, if you program it yourself, part of the solution (along with what people have already mentioned) is getting the canonical address. People will often give you their version of their address (for instance, I can't remember if my street is a Road or an Avenue, so I use them interchangeably). The US Postal Service has all sorts of data and tools to help you figure out what it should be based on the zip code. Other post offices have similar things (the Royal Mail address lookup kicks ass). From there you get closer to finding the duplicates than just looking at the address the customer gave you or someone keyed in.

Good luck and let us know how it turns out.

--
brian d foy <bdfoy@cpan.org>

In reply to Re: De Duping Street Addresses Fuzzily by brian_d_foy
in thread De Duping Street Addresses Fuzzily by patrickrock

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (5)
As of 2024-04-18 06:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found