Re: Efficient Fuzzy Matching Of An Address

by BrowserUk (Patriarch)
on Aug 19, 2008 at 18:10 UTC


in reply to Efficient Fuzzy Matching Of An Address

I think the mechanism I describe in the subthread starting at Re^3: Comparing text documents could be adapted to this purpose, depending upon what your actual goal is.

When you say "Given an address as input, find any "similar" addresses in the DB", you're not trying to (for example) locate next-door neighbours, but rather to locate duplicates with minor typos or transcription errors?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Efficient Fuzzy Matching Of An Address
by Limbic~Region (Chancellor) on Aug 19, 2008 at 18:22 UTC
    BrowserUk,
    When you say "Given an address as input, find any "similar" addresses in the DB", you're not trying to (for example) locate next-door neighbours, but rather to locate duplicates with minor typos or transcription errors?

    The latter (typos and transcription errors), with a caveat. If the input is '123 Main St' and the DB has a record of '356 Main Street', I would hope that the search wouldn't return "no results found". On the other hand, if the input was '12 Elm Ave' (two houses over), then the search should definitely not find a match. Ultimately, I would like to return the best N matches ordered by degree of similarity.

    Thanks for the pointer to Re^3: Comparing text documents. If I get real motivated, I will try it out.

    Cheers - L~R
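
For illustration only, here is a minimal Python sketch of the "best N matches ordered by degree of similarity" requirement stated in the reply above. It is not the mechanism from the Comparing text documents subthread. It normalizes each address, scores the street name and the house number separately with the standard library's difflib.SequenceMatcher, and keeps the top N scores above a cutoff. The abbreviation table, the 0.8 street weight, the 0.6 cutoff, and all function names are assumptions made for this example. It scans every candidate linearly, so it says nothing about the "efficient" part of the question.

```python
import difflib
import heapq
import re

# Hypothetical abbreviation table; a real one would be much larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "dr": "drive"}


def normalize(address):
    """Lowercase, drop punctuation, and expand known street abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens]


def split_number(tokens):
    """Separate a leading house number (if present) from the street tokens."""
    if tokens and tokens[0].isdigit():
        return tokens[0], tokens[1:]
    return "", tokens


def similarity(a, b, street_weight=0.8):
    """Score two addresses in [0, 1]; the street name dominates the score."""
    num_a, street_a = split_number(normalize(a))
    num_b, street_b = split_number(normalize(b))
    street_score = difflib.SequenceMatcher(
        None, " ".join(street_a), " ".join(street_b)
    ).ratio()
    number_score = difflib.SequenceMatcher(None, num_a, num_b).ratio()
    return street_weight * street_score + (1 - street_weight) * number_score


def best_matches(query, candidates, n=5, cutoff=0.6):
    """Return up to n (score, address) pairs, best first, scoring >= cutoff."""
    scored = ((similarity(query, c), c) for c in candidates)
    kept = [(score, c) for score, c in scored if score >= cutoff]
    return heapq.nlargest(n, kept)


if __name__ == "__main__":
    db = ["356 Main Street", "12 Elm Ave", "123 Main Street", "123 Mian St"]
    for score, address in best_matches("123 Main St", db, n=3):
        print(f"{score:.3f}  {address}")
```

Weighting the street name more heavily than the house number is what lets '356 Main Street' survive as a weaker match for '123 Main St', while an address on a different street falls below the cutoff. Both numbers would need tuning against real data.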
