PerlMonks
Re^4: return primary key if duplicate entry exists?

by CountZero (Bishop)
on Jan 24, 2016 at 18:45 UTC ( [id://1153511] )


in reply to Re^3: return primary key if duplicate entry exists?
in thread return primary key if duplicate entry exists?

Actually, if you perform an INSERT which results in a duplicate key error, then by definition you already know the key: it is the very key you just tried to insert.

The last_insert_id is only useful with auto-incrementing keys (and even then I rarely find it necessary), but in any case an auto-incrementing key should never produce a duplicate key error.
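To make the point concrete, here is a minimal sketch. It assumes an in-memory SQLite database via DBD::SQLite, and the table name, column names and helper sub are made up for illustration; the thread does not say which RDBMS or schema is in use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical example: an in-memory SQLite database stands in for
# whatever RDBMS the original poster is using.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0 });
$dbh->do('CREATE TABLE sequences (seq_key TEXT PRIMARY KEY, payload TEXT)');

sub insert_sequence {
    my ($key, $payload) = @_;
    my $ok = eval {
        $dbh->do('INSERT INTO sequences (seq_key, payload) VALUES (?, ?)',
                 undef, $key, $payload);
        1;
    };
    # On a duplicate-key error there is nothing to SELECT back:
    # the key we just tried to insert *is* the existing key.
    return $ok ? 'inserted' : "duplicate: $key";
}

print insert_sequence('ACGT', 'first'), "\n";   # inserted
print insert_sequence('ACGT', 'again'), "\n";   # duplicate: ACGT
```

The eval around the INSERT catches the RaiseError exception, so the caller learns whether the row was new without ever needing to ask the database for the key.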

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Replies are listed 'Best First'.
Re^5: return primary key if duplicate entry exists?
by diyaz (Beadle) on Jan 24, 2016 at 18:58 UTC
    That is a good point, and probably a good reason to use unique values as primary keys. I guess I am slightly confused about database design, then. I am working with DNA sequences, which can be quite long strings, so I am using a CRC32 to check for uniqueness. Would it still be a good idea to use that as a primary key? I had assumed it was not an ideal key.
      CRC32 should only be used as a checksum to verify the integrity of the data, not as a key: its result is only 32 bits (4 bytes) long, and there is no guarantee whatsoever that it will be unique for each different input.

      What you need is a message digest. Have a look at Digest::SHA1. The digest function returns a 20-byte binary or 40-character hexadecimal result that still is not guaranteed to be unique for each different input, but since the result is now 160 bits long, the risk of a collision (i.e. the same digest value for two different inputs) is much smaller. In any case, if two DNA sequences have different digest values, they are guaranteed to be different. If two sequences have the same digest value, they can still be different (this is called a "collision"), and you should compare the full DNA sequences to determine whether they really differ.
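A quick sketch of computing such a digest. It uses Digest::SHA, the core-Perl successor to Digest::SHA1, which exports the same sha1_hex interface; the sample sequence is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Digest::SHA has been in the Perl core since 5.9.3 and provides the
# same sha1_hex function as the older Digest::SHA1 module.
use Digest::SHA qw(sha1_hex);

my $dna    = 'ACGTACGTGGCCTTAA';   # made-up sequence for illustration
my $digest = sha1_hex($dna);

printf "digest: %s (%d hex chars, %d bits)\n",
       $digest, length($digest), length($digest) * 4;
# Equal digests do not prove equal sequences (collisions are possible),
# but different digests do prove different sequences.
```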

      But as even the SHA1 digest does not guarantee uniqueness, it cannot be used as a key in your database. In such cases, you should use an auto-incrementing primary key and store both the full DNA sequence and its digest in the database. The digest can serve as an index to quickly check whether a DNA sequence is new or already stored. If you find a duplicate digest value, you must compare the full DNA sequences to rule out a (rare) collision.
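The design described above can be sketched as follows. This again assumes an in-memory SQLite database, and the table name, column names and find_or_insert helper are hypothetical; the idea is simply an auto-incrementing primary key plus an indexed, non-unique digest column.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Digest::SHA qw(sha1_hex);   # core successor to Digest::SHA1

# Hypothetical schema: auto-incrementing primary key, full sequence,
# and an indexed (non-unique) digest column for fast lookups.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0 });
$dbh->do(q{CREATE TABLE dna (
               id     INTEGER PRIMARY KEY AUTOINCREMENT,
               seq    TEXT NOT NULL,
               digest CHAR(40) NOT NULL)});
$dbh->do('CREATE INDEX idx_dna_digest ON dna (digest)');

# Return the id of $seq, inserting it first if it is new.
sub find_or_insert {
    my ($seq) = @_;
    my $digest = sha1_hex($seq);
    # The index makes the digest lookup fast; a digest hit is only a
    # *candidate* match, so the full sequence is compared as well.
    my ($id) = $dbh->selectrow_array(
        'SELECT id FROM dna WHERE digest = ? AND seq = ?',
        undef, $digest, $seq);
    return $id if defined $id;
    $dbh->do('INSERT INTO dna (seq, digest) VALUES (?, ?)',
             undef, $seq, $digest);
    return $dbh->last_insert_id(undef, undef, 'dna', 'id');
}
```

Note that this is one of the cases where last_insert_id genuinely earns its keep, since the application never chooses the key itself.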

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics
