PerlMonks  

Re: Re: Duplicate detection (SQL)

by hartwig (Sexton)
on Oct 20, 2003 at 13:10 UTC


in reply to Re: Duplicate detection (SQL)
in thread Duplicate (similarity) detection (SQL)

I assume that checking at the database level is the best solution, but it does not work for everybody, e.g. if the database is already populated. In that case you have to normalize the data (e.g. the given address "24 thompsonrd., 10-03" is normalized to "thompsonroad 24, level 10, unitnumber 3" ...) and eventually validate it. Doing that properly can easily take a few hours :) After that it becomes quite easy to check for duplicates:
$data{$normalized_key} += 1;
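As a rough illustration of the idea, here is a minimal sketch. The names normalize_address and count_keys are made up for this example, and the normalization is deliberately crude: it only lowercases, expands a standalone "rd" or "rd." to "road", and strips punctuation and whitespace. It does not reorder the address parts or validate anything, which is where the real work would go.

```perl
use strict;
use warnings;

# Hypothetical normalizer (an assumption for this sketch, not a full
# address parser): lowercase, expand "rd"/"rd." to "road", then drop
# everything that is not a letter or digit.
sub normalize_address {
    my ($raw) = @_;
    my $key = lc $raw;
    $key =~ s/\brd\b\.?/road/g;    # "rd" or "rd." as a separate word
    $key =~ s/[^a-z0-9]+//g;       # remove punctuation and whitespace
    return $key;
}

# Count how often each normalized key occurs in the given rows.
sub count_keys {
    my (@rows) = @_;
    my %seen;
    $seen{ normalize_address($_) } += 1 for @rows;
    return %seen;
}

# Sample rows invented for the example.
my @rows = (
    '24 Thompson Rd., 10-03',
    '24 thompson road 10-03',
    '24 THOMPSON ROAD, 10-03',
    '7 Hill St',
);

my %seen = count_keys(@rows);

# Any key seen more than once is a candidate duplicate.
for my $key (sort keys %seen) {
    print "$key occurs $seen{$key} times\n" if $seen{$key} > 1;
}
```

A hash keyed on the normalized string keeps the check to a single pass over the data; the quality of the result depends entirely on how good the normalization is.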
Cheers, Hartwig
