Of Mysql, DBI and PK best practices

jimbus has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Of Mysql, DBI and PK best practices by jimbus (Friar) on Apr 17, 2007 at 14:32 UTC
I parallel posted this to MySQL's forum and it was suggested that I was looking for the MySQL syntax "INSERT ... ON DUPLICATE KEY UPDATE". I'd never heard of such a thing and a brief google search showed it is probably what I'm looking for (I'm a pretty big believer that with reasonable usage, the db is usually going to be more efficient that anything I write). I spent hours searching Google for options on what I was seeing: "MySQL primary key violation on insert" and got no where. --Jimbus aka Jim Babcock Wireless Data Engineer and Geek Wannabe jim-dot-babcock-at-usa-dot-com	[reply]
Re^2: Of Mysql, DBI and PK best practices by andye (Curate) on Apr 17, 2007 at 17:59 UTC
Hi jimbus, I've used `ON DUPLICATE KEY UPDATE` in MySQL and it pretty much does exactly what it says on the tin. Seems to work well. A more database-neutral alternative is to `select` the key from the table, and if it doesn't exist then `insert`, otherwide `update`. You'll need to wrap this in a transaction to avoid the race condition kyle mentions above (unless you are the only process working on the db). In MySQL this means you either need to get a table lock, or switch your table type to InnoDB and do a row-level transaction as described on this page: InnoDB locking reads. By the way, `ON DUPLICATE KEY UPDATE` will lock the auto-increment table if you have an auto-increment field in your table, so this situation is one that can cause locking problems whichever solution you choose, if you have a lot of concurrent inserts/updates. (update: will only lock the auto-increment table if an insert actually takes place, I think - still, a potential problem) As I see it, the problem with the original solution (use insert, check for error, use insert if update didn't work) is, what if the error wasn't a duplicate key error? To do this properly, I think you'd need to check the error type. That way you could try again with an insert if the error was a duplicate key error, or if it was a different error then report the error and exit as usual. You'd need to ensure that `RaiseError` was switched off on your DBI handle. HTH! Best wishes, andye	[reply] [d/l] [select]
Re: Of Mysql, DBI and PK best practices by ptum (Priest) on Apr 17, 2007 at 14:36 UTC
Personally, I tend to do two things differently: First, I select a full list of primary keys and put it in a hash. Before I try to do an insert, I check to see if the primary key for that record exists in the hash. If it does, I do an update instead of an insert. If the primary key isn't in the hash, then I do an insert, and add it to the hash if the insert was successful. This seems to be a little cleaner and (potentially) halves my number of SQL executions. Second, I tend to use the RaiseError option in my database connection so that my program doesn't die when DBI encounters an error. I examine $DBI::errstr to specifically identify the nature of the error (in Oracle, you get error codes) so that I cover my bases for the unexpected. There's no sense in attempting an update if the table doesn't exist or your SQL is malformed. I'm not a MySQL guy, but the use of eval and $@ seems less than precise -- you might react to what you think is a duplicate insert error but is really something much more severe. Update: As noted by kyle above, however, if this code is executed by multiple users/processes, you could end up with a race condition, such that your inserts might still fail because some other guy just added that record since the time that you selected all the primary keys.	[reply]
Re^2: Of Mysql, DBI and PK best practices by dragonchild (Archbishop) on Apr 17, 2007 at 18:28 UTC
Not unless you do your work within a transaction. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?	[reply]
Re: Of Mysql, DBI and PK best practices by derby (Abbot) on Apr 17, 2007 at 13:43 UTC
Your approach is somewhat reasonable and I can think of a few ways to improve it but why do people say it's bad? I would do the following to improve the situation: look at $@ and make a decision based on PK error or another error. wrap the second execute in an eval also if the second execute fails, write that to a log -derby	[reply]
Re: Of Mysql, DBI and PK best practices by herveus (Prior) on Apr 17, 2007 at 14:23 UTC
Howdy! The alternative is to first select the PK from the table. If you get a result, then you do the update, otherwise do the insert. That means that errors will be real errors. That means you always execute two statements, but your normal condition is that there are no errors. yours, Michael	[reply]
Re^2: Of Mysql, DBI and PK best practices by kyle (Abbot) on Apr 17, 2007 at 14:34 UTC
If there's only ever one client, that's fine. Otherwise, this contains a race condition that the OP does not have. Imagine two clients select, both see nothing, both try to insert, one fails. At that point, the one that failed to insert should update, exactly like what the OP is already doing (but without the select). Update: It also occurs to me that if you're not concerned about a race condition, you could do something like this without the select. First try to update, then check to see how many rows were affected. If there weren't any, do the insert. As I say, this has a race condition also, but it will sometimes be one statement instead of always being two.	[reply]
Re: Of Mysql, DBI and PK best practices by dragonchild (Archbishop) on Apr 17, 2007 at 13:42 UTC
It sounds like you have more than one piece of information for a given PK. In that case, you should be using two tables. Think about it in terms of invoices and lineitems. You go buy 3 things from CDW. You get one invoice indicating the date of purchase, the payer, the shipto address, CDW's invoice ID, etc. That's a record in the invoices table. But, you need to indicate the three lineitems. So, you have a lineitems table which has a FK back to the invoices table. Same idea here. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?	[reply]
Re^2: Of Mysql, DBI and PK best practices by jimbus (Friar) on Apr 17, 2007 at 14:18 UTC
Dragonchild, I was deliberately being 'data neutral' to keep focus on the mechanics... The PK is a date-time based on yyyy-mm-dd hh::00:00 and the information comes from multiple nodes from one system. In the end, I need a report on the usage of all of the nodes and the total for the system based on time. Additionally, log files are 'line count' based, so hours often overlap the files which are processed every 15 minutes. However, thanks for taking the time :) --Jimbus aka Jim Babcock Wireless Data Engineer and Geek Wannabe jim-dot-babcock-at-usa-dot-com	[reply]
Re: Of Mysql, DBI and PK best practices by roboticus (Chancellor) on Apr 17, 2007 at 19:07 UTC
jimbus: For the "general" solution, I'd suggest that you let the database engine handle it, so you can avoid race conditions, etc. You can give it to the database engine by writing an appropriate stored procedure, or a simple if statement. Most database engines I'm aware of will let you do something similar to: `if exists (select * from table where PK='value') update table set col1=val1, col2=val2 where PK='value' else insert table (PK, col1, col2) values ('value', val1, val2)` [download] ...roboticus	[reply] [d/l]
Re: Of Mysql, DBI and PK best practices by RMGir (Prior) on Apr 17, 2007 at 14:31 UTC
Note that you can combine any of the solutions suggested above with "%seenKey" hash, so you can at least know if _this instance_ of the loader has already seen a particular key and go directly to the update branch in that case. Mike	[reply]
Re: Of Mysql, DBI and PK best practices by Errto (Vicar) on Apr 17, 2007 at 18:01 UTC
It would appear that MySQL has native support for this, albeit in a non-standard way. I know that Oracle does as well through the `MERGE` statement. A more standard-respecting way to do it would be to try the update first and then, if the update returns no rows, try the insert, ie. `my $rowcount = $update_stm->execute(@args); if ($rowcount == 0) { $insert_stm->execute(@other_args); }` [download]	[reply] [d/l] [select]


Pathologically Eclectic Rubbish Lister
	PerlMonks