|XP is just a number|
Re: Duplicate detection (SQL)by gmax (Abbot)
|on Oct 20, 2003 at 08:50 UTC||Need Help??|
As tachyon suggests, you should deal with exact matches using a UNIQUE index.
For "similar" records, the most appropriate strategy depends on the nature of your data. There are cases where a part of the record can be duplicated. People have the same first name, or the same surname, or both, but two people with the same name, surname, middle name, and date of birth are suspicious. Products may have the same name, but if they have also the same manufacturer and the same price, they could be an input mistake.
However, checking for identical fields is not enough. You must also account for spelling and common inversion mistakes.
Without entering into details of how I deal with the problem in a specific area, I give you the general strategy. In this sequence of checking steps, you go from one step to the next unless you get a hit. If all the steps fail, then you should not have a duplicate.
This way, most of the checking is done by the database engine. Your code should only prepare the needed queries with the appropriate combinations.
Notice that there is no guarantee that you get all the possible duplicates. It depends on what you want to catch and how much free rein you want to give to your users.
_ _ _ _ (_|| | |(_|>< _|