PerlMonks
Re^4: return primary key if duplicate entry exists?

by CountZero (Bishop)
on Jan 24, 2016 at 18:45 UTC ( [id://1153511] )


in reply to Re^3: return primary key if duplicate entry exists?
in thread return primary key if duplicate entry exists?

Actually, if you perform an INSERT which results in a duplicate key error, then by definition you already know the key: it is the very key you just tried to insert.

The last_insert_id is only useful with auto-incrementing keys (and even then I rarely find it necessary), but in any case an auto-incrementing key should never produce a duplicate key error.
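To make the point concrete, here is a minimal sketch. It assumes an in-memory SQLite database via DBD::SQLite, and the table name, column names and helper sub are made up for illustration; the thread does not say which RDBMS or schema is in use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical example: an in-memory SQLite database stands in for
# whatever RDBMS the original poster is using.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0 });
$dbh->do('CREATE TABLE sequences (seq_key TEXT PRIMARY KEY, payload TEXT)');

sub insert_sequence {
    my ($key, $payload) = @_;
    my $ok = eval {
        $dbh->do('INSERT INTO sequences (seq_key, payload) VALUES (?, ?)',
                 undef, $key, $payload);
        1;
    };
    # On a duplicate-key error there is nothing to SELECT back:
    # the key we just tried to insert *is* the existing key.
    return $ok ? 'inserted' : "duplicate: $key";
}

print insert_sequence('ACGT', 'first'), "\n";   # inserted
print insert_sequence('ACGT', 'again'), "\n";   # duplicate: ACGT
```

The eval around the INSERT catches the RaiseError exception, so the caller learns whether the row was new without ever needing to ask the database for the key.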

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Replies are listed 'Best First'.
Re^5: return primary key if duplicate entry exists?
by diyaz (Beadle) on Jan 24, 2016 at 18:58 UTC
    That is a good point, and probably a good reason to use unique values as primary keys. I guess I am slightly confused about database design, then. I am working with DNA sequences, which can be quite long strings, so I am using a CRC32 to check for uniqueness. Would it still be a good idea to use that as a primary key? I had assumed it was not an ideal key.
      CRC32 should only be used as a checksum to verify the integrity of the data, not as a key: its result is only 32 bits (4 bytes) long, and there is no guarantee whatsoever that it will be unique for each different input.

      What you need is a message digest. Have a look at Digest::SHA1. The digest function returns a 20-byte binary or 40-character hexadecimal result that still is not guaranteed to be unique for each different input, but since the result is now 160 bits long, the risk of a collision (i.e. the same digest value for two different inputs) is much smaller. In any case, if two DNA sequences have different digest values, they are guaranteed to be different. If two sequences have the same digest value, they can still be different (this is called a "collision"), and you should compare the full DNA sequences to determine whether they really differ.
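A quick sketch of computing such a digest. It uses Digest::SHA, the core-Perl successor to Digest::SHA1, which exports the same sha1_hex interface; the sample sequence is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Digest::SHA has been in the Perl core since 5.9.3 and provides the
# same sha1_hex function as the older Digest::SHA1 module.
use Digest::SHA qw(sha1_hex);

my $dna    = 'ACGTACGTGGCCTTAA';   # made-up sequence for illustration
my $digest = sha1_hex($dna);

printf "digest: %s (%d hex chars, %d bits)\n",
       $digest, length($digest), length($digest) * 4;
# Equal digests do not prove equal sequences (collisions are possible),
# but different digests do prove different sequences.
```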

      But as even the SHA1 digest does not guarantee uniqueness, it cannot be used as a key in your database. In such cases, you should use an auto-incrementing primary key and store both the full DNA sequence and its digest in the database. The digest can serve as an index to quickly check whether a DNA sequence is new or already stored. If you find a duplicate digest value, you must compare the full DNA sequences to rule out a (rare) collision.
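The design described above can be sketched as follows. This again assumes an in-memory SQLite database, and the table name, column names and find_or_insert helper are hypothetical; the idea is simply an auto-incrementing primary key plus an indexed, non-unique digest column.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Digest::SHA qw(sha1_hex);   # core successor to Digest::SHA1

# Hypothetical schema: auto-incrementing primary key, full sequence,
# and an indexed (non-unique) digest column for fast lookups.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0 });
$dbh->do(q{CREATE TABLE dna (
               id     INTEGER PRIMARY KEY AUTOINCREMENT,
               seq    TEXT NOT NULL,
               digest CHAR(40) NOT NULL)});
$dbh->do('CREATE INDEX idx_dna_digest ON dna (digest)');

# Return the id of $seq, inserting it first if it is new.
sub find_or_insert {
    my ($seq) = @_;
    my $digest = sha1_hex($seq);
    # The index makes the digest lookup fast; a digest hit is only a
    # *candidate* match, so the full sequence is compared as well.
    my ($id) = $dbh->selectrow_array(
        'SELECT id FROM dna WHERE digest = ? AND seq = ?',
        undef, $digest, $seq);
    return $id if defined $id;
    $dbh->do('INSERT INTO dna (seq, digest) VALUES (?, ?)',
             undef, $seq, $digest);
    return $dbh->last_insert_id(undef, undef, 'dna', 'id');
}
```

Note that this is one of the cases where last_insert_id genuinely earns its keep, since the application never chooses the key itself.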

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics
