mysql's join too slow; using Perl to compare two tables

by Andre_br (Pilgrim)
on Feb 04, 2006 at 22:25 UTC

Andre_br has asked for the wisdom of the Perl Monks concerning the following question:

Hello esteemed fellow monks

I hope you guys can help me on this. Here's the problem: I have a table named 'catalog'; that's where the products are kept. It's something like this:

table 'catalog'
product_id | author | title | label | description | price | vendor_id | log
From time to time, my vendors send me a .txt file with their updated inventory, which has the following columns:
author | title | label | description | price

So that's when I have to check which CDs are new in their inventory, you know? I could just erase their entire online inventory and then record the new one they've just sent, but this isn't good, because it's important that the product_ids of the veteran products don't change, you know?

Then what do I do? I insert the new data into a table I call 'pivot', which has the same fields as the ones they send me, plus a received_id:

table 'pivot'
received_id | author | title | label | description | price
Then I perform the inner join below to find the received_ids of the products that are in fact veteran (those rows get erased from the 'pivot' table before the remaining ones are inserted into the catalog, so only new products end up there).
select pivot.received_id
from catalog
inner join pivot on (
    catalog.author = pivot.author
    and catalog.titulo = pivot.titulo
    and catalog.label = pivot.label
    and catalog.description = pivot.description
    and catalog.price = pivot.price
);
The problem is that this inner join with multiple ON conditions takes too much effort and time - we're talking about a 100k-record inner join! I was looking for another way to compare these two tables without this inner join, but it occurs to me now that maybe I could use Perl instead of MySQL.

I could, instead of using this 'pivot' table, load the 'catalog' data into an array - each element being a concatenation like ('thisauthor thistitle thislabel thisdescription thisprice', 'thisanotherauthor thisanothertitle thisanotherlabel thisanotherdescription thisanotherprice', ...) and so on. Then I could do the same with the data the vendor sends me, and find the redundant ones with grep. Well, I don't know if this will sound like heresy, sorry if it does.
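Something like this is what I have in mind - just a sketch, where the DBI handle $dbh and the tab-delimited 'vendor.txt' file are assumptions, and I use a hash instead of grep so each lookup is O(1) instead of a rescan of the whole list:

    # Build one hash key per existing catalog record.
    my %in_catalog;
    my $sth = $dbh->prepare(
        'SELECT author, title, label, description, price FROM catalog');
    $sth->execute;
    while ( my @row = $sth->fetchrow_array ) {
        $in_catalog{ join "\t", @row } = 1;
    }

    # Any vendor line not in the hash is a genuinely new product.
    open my $fh, '<', 'vendor.txt' or die "vendor.txt: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $in_catalog{$line};    # veteran product, skip it
        # ... INSERT this new product into catalog here
    }
    close $fh;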

What do you think?

Thanks a lot!

Andre

Re: mysql's join too slow; using Perl to compare two tables
by reneeb (Chaplain) on Feb 04, 2006 at 23:07 UTC
    why not
    use Data::Dumper;

    my $st_select = q~SELECT product_id FROM catalog
                      WHERE author = ? AND title = ? AND label = ?
                        AND description = ? AND price = ?~;
    my $sth = $dbh->prepare($st_select) or die $dbh->errstr();
    for my $arrayref (@data_from_txt) {
        $sth->execute(@$arrayref) or die $dbh->errstr();
        unless ( $sth->fetchrow_array() ) {
            print "This is not in catalog: ", Dumper($arrayref);
        }
    }

    @data_from_txt holds all the information from the txt file, with an arrayref for each dataset...
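    For example, @data_from_txt could be filled from a tab-delimited file like this (the file name and the delimiter are just assumptions):

        # One arrayref of (author, title, label, description, price) per line.
        my @data_from_txt;
        open my $txt, '<', 'vendor.txt' or die "vendor.txt: $!";
        while ( my $line = <$txt> ) {
            chomp $line;
            push @data_from_txt, [ split /\t/, $line, -1 ];
        }
        close $txt;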
Re: mysql's join too slow; using Perl to compare two tables
by monarch (Priest) on Feb 04, 2006 at 22:59 UTC
    I have personally found myself somewhat disgusted with MySQL performance when it comes to simple joins. I don't know if this is because I didn't index the tables properly or because of some inherent disability in MySQL.

    Postgres, on the other hand, seems to take the same time to do inner, outer, or whatever style joins with no performance penalty.

    So, no, it's not wrong to scan each table one at a time and get Perl to do the intelligent work instead of MySQL.

      I don't know if this is because I didn't index the tables properly...

      That's probably it. Proper indexing is essential with any non-trivial amount of data.

      Postgres, on the other hand, seems to take the same time to do inner, outer, or whatever style joins with no performance penalty.

      ... Unless you don't have the necessary columns indexed, in which case performance can appear to suck mightily, particularly once you've left a small, tidy test environment and start dealing with huge piles of customer data.

Re: mysql's join too slow; using Perl to compare two tables
by srdst13 (Pilgrim) on Feb 05, 2006 at 01:51 UTC

    Probably the simplest way to do this is to do a select using your "new" data minus any information that might have changed (like the price). Then, if you get a result back, do an update. If not, then do an insert. Pretty simple, I think.
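    A rough sketch of what I mean - column names follow the original post, everything else is illustrative:

        # Look the record up by the fields that don't change...
        my $find = $dbh->prepare(q{
            SELECT product_id FROM catalog
            WHERE author = ? AND title = ? AND label = ? AND description = ?
        });
        my $update = $dbh->prepare(
            'UPDATE catalog SET price = ? WHERE product_id = ?');
        my $insert = $dbh->prepare(q{
            INSERT INTO catalog (author, title, label, description, price)
            VALUES (?, ?, ?, ?, ?)
        });

        for my $rec (@data_from_txt) {
            my ($author, $title, $label, $desc, $price) = @$rec;
            $find->execute($author, $title, $label, $desc);
            if ( my ($id) = $find->fetchrow_array ) {
                $update->execute($price, $id);    # veteran: keep its product_id
            }
            else {
                $insert->execute(@$rec);          # genuinely new product
            }
        }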

    As for the speed issue, I'm fairly sure that this is an indexing issue. I routinely do inner joins of tables with 60 million rows in mysql on a web-based app. For best performance, any column that appears in a where clause or a join clause should be indexed, in general.

    Sean
Re: mysql's join too slow; using Perl to compare two tables
by converter (Priest) on Feb 05, 2006 at 15:27 UTC

    Are you using indices, and if so, are you certain that the database is doing index scans instead of sequential scans in your query? Before you give up on MySQL and try to solve the problem with Perl, you should have a look at the EXPLAIN and ANALYZE statements.

    Since you didn't mention which version of MySQL you're using, I'll take a guess and include Optimizing Queries with EXPLAIN for versions 3.23 through 4.1.
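    For instance, something like this from DBI - just a sketch, with $join_sql standing in for the slow query; a NULL 'key' next to a six-figure 'rows' estimate means a full table scan:

        # Run EXPLAIN on the query and print the interesting plan columns.
        my $plan = $dbh->selectall_arrayref( "EXPLAIN $join_sql", { Slice => {} } );
        for my $step (@$plan) {
            printf "table=%s type=%s key=%s rows=%s\n",
                $step->{table},
                $step->{type},
                ( defined $step->{key} ? $step->{key} : 'NULL' ),
                $step->{rows};
        }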

Re: mysql's join too slow; using Perl to compare two tables
by JamesNC (Chaplain) on Feb 05, 2006 at 16:53 UTC
    I have to agree with friedo. This sounds like an index issue. What kind of indexes do you have on catalog and pivot? If you don't know what I am talking about, then you need to educate yourself on indexes.
    Here is how you would create some indexes on these two tables. Enter these in the mysql command line tool.
    create index c_atldp on catalog ( author, titulo, label, description, price );
    create index p_atldp on pivot ( author, titulo, label, description, price );

    You can create several different types of indexes depending on how and what kind of queries you are writing and a full discussion of indexes is OT. Sometimes you need to write new ones for a new type of query.
    JamesNC
      Ok! I've implemented the indexes and things are now warp speed! MySQL is great, I was the one screwing up... laughs. It was also because I was forcing a conversion to binary on all the fields, to make sure different cases weren't ignored when comparing similar records:

      select pivot.received_id
      from catalog
      inner join pivot on (
          binary catalog.author = binary pivot.author
          and binary catalog.titulo = binary pivot.titulo
          and binary catalog.label = binary pivot.label
          and binary catalog.description = binary pivot.description
          and binary catalog.price = binary pivot.price
      );

      I'd put the indexes in and things didn't get any better. Then I decided to be less of a perfectionist and took away all the binaries... OK, MySQL, no problem, this isn't really thaaaat necessary for me. Well, then the join that had been taking 3 minutes surprised me with the output screen in, guess what, 1 second!
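      For the record, if I ever need the case-sensitive matching back, one alternative (assuming MySQL 4.1+; the column type below is just illustrative) seems to be giving the column a binary collation once, instead of casting in the query, so the comparisons stay case-sensitive and the indexes still get used:

          # Case-sensitive column, no per-query BINARY cast needed.
          $dbh->do(q{
              ALTER TABLE catalog
                  MODIFY author VARCHAR(100)
                  CHARACTER SET latin1 COLLATE latin1_bin
          });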

      Thanks for all the input, my fellow coders. And sorry for not having mentioned this 'binary' issue, as I thought it might not be the cause. But since you directed me to the one thing that could explain the slowness, and it didn't solve it, I could start suspecting the BINARY casts (apparently the cast keeps MySQL from using the indexes at all).

      Take care

      Andre

Re: mysql's join too slow; using Perl to compare two tables
by badaiaqrandista (Pilgrim) on Feb 05, 2006 at 23:10 UTC
    I think you should try the INSERT ... ON DUPLICATE KEY UPDATE statement. It checks whether the data you want to insert is already there; if it is, it updates the columns instead of creating a new record. With this you don't need to do any SELECT query at all, and it (theoretically) will increase performance because you only do one INSERT query. The only thing you need is to create a unique index for the fields you mention (author, etc.).
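    A hedged sketch of what that could look like - the index name, column sizes and the description prefix length are illustrative (MySQL needs a prefix length to index TEXT columns), and price is left out of the unique key so a price change updates the old row instead of inserting a new one:

        # One-time unique key over the identifying columns.
        $dbh->do(q{
            CREATE UNIQUE INDEX u_atld
                ON catalog (author, title, label, description(100))
        });

        # Insert-or-update in a single statement per record.
        my $upsert = $dbh->prepare(q{
            INSERT INTO catalog (author, title, label, description, price)
            VALUES (?, ?, ?, ?, ?)
            ON DUPLICATE KEY UPDATE price = VALUES(price)
        });
        $upsert->execute(@$_) for @data_from_txt;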
    --------------
    badaiaqrandista
      I hadn't seen this before. Nice little trick.
      Sean
Re: mysql's join too slow; using Perl to compare two tables
by TedPride (Priest) on Feb 05, 2006 at 13:15 UTC
    Do a SELECT for all the existing data. Convert the fields to some standard format (in this case, lowercase tab delimited is probably fine) and create a hash for fields => id, or fields hash => id. Now all you have to do is take the new data, convert it also to the same format, and check to see if each item is in the hash. If it is, UPDATE using the id. If it isn't, do an INSERT. If memory use is a problem, create a dump of the existing data, do the above processing on your own computer rather than the server, and split the new data into two files, one for UPDATE and one for INSERT. The server will then be able to process the result with minimum effort.
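    Something like the following for the key building - just a sketch assuming the usual $dbh; hashing the normalized string with MD5 keeps the lookup hash small if memory is tight:

        use Digest::MD5 qw(md5_hex);

        # Normalize to lowercase tab-delimited, then hash it down.
        sub record_key {
            return md5_hex( lc join "\t", @_ );
        }

        # Map each existing record's key to its product_id.
        my %id_for;
        my $sth = $dbh->prepare(
            'SELECT product_id, author, title, label, description, price FROM catalog');
        $sth->execute;
        while ( my ($id, @fields) = $sth->fetchrow_array ) {
            $id_for{ record_key(@fields) } = $id;
        }
        # Later: exists $id_for{ record_key(@new_fields) } ? UPDATE : INSERT.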
