DBI and JOINs

I suspect this might be one of those things that is going to be situation dependant, but I'd thought I poll the experience here and see what advice I can get.

I'm using Apache::DBI, mod_perl, Mysql, and all that sort of goodness with my site. Let's say that I have two tables (out of several I'm working with) that deal with forums on my site. One table holds a forum ID #, the base node of that forum via it's message ID, and several other specific details on the forum. The other table holds the messageid, the poster, and other details specific to the message, including the forum ID as above for the forum that the message is part of. One cgi script that I am writing will poll all the posts from a given poster, and then return details of those posts to the user, including information about which forum they were in, etc. Now easily, TISMTOWTDI, but the two obvious ones are:

Perform one SQL query to get each message id; then foreach on that messageid to do another SQL to get the forum information, so that if the user has n messages, it would take n+1 SQL queries.
Perform a single SQL query using the JOIN functionality that merges the tables on the forumid, and get all the information with 1 SQL query.

Obviously, the second seems to be reasonable, but when I try such queries at the mysql command line, they seem to take much longer than if I did the n+1 method, but this could just be an apparent effect. The SQL commands are mainly retrieval in this case, with the odd DATE_TO_UNIXTIME conversion of selected data, but no large amount of processing. Because I'm using Apache::DBI and mod_perl, I have a persistence DB connection, so I don't have to worry about start up costs. The SQL queries can be cache'd out prior to running anything, so I don't have to consider those as repeated operations.

Does anyone have any good practical experience in such a situation? Again, I suspect it's a matter of trying it out for myself, and either way, it's not a problem programming-wise.

Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

Comment on DBI and JOINs

Replies are listed 'Best First'.
Re: DBI and JOINs by dws (Chancellor) on Apr 03, 2001 at 05:44 UTC
SQL JOINs should scale nicely provided that the field(s) on which you are JOINing are indexed in all tables that are part of the join. It's important to index all tables, rather than just the ones "on right right" because smart query optimizers will try to guess which of the two (or N) tables to scan first based in part on the number of rows in the table. If they happen to decide on the n>1st table, the 1st table had better be indexed on the join field(s). (It's actually quite a bit more complicated than this, but the takeaway point is that tables aren't necessarily processed left to right.) I don't know whether MySQL does this (and now that you've asked, there goes my evening :) If it doesn't, it might in the future. It's odd that you're finding the JOIN case to be slower from the command line. That's worth some extra examination. It shouldn't be the case. Update: MySQL can be made to explain its query plans. See http://www.mysql.com/doc/E/X/EXPLAIN.html	[reply]
Re: DBI and JOINs by voyager (Friar) on Apr 03, 2001 at 20:02 UTC
Not sure if I follow your question exactly, but an alternative to your "foreach ... do-a-select", is to use the foreach to build a (possibly long) IN clause. Then one sql statement will do it: `my @values = ...; # get the values however my $in_parts = join ',', @values; my $sql = "select * from aTable where some_field in ($in_parts)";` [download] You will have to deal with quotes if your field is not numeric. There will be a little more Perl coding to sort out the results, but a lot better overall performance.	[reply] [d/l]
Re^2: DBI and JOINs by OfficeLinebacker (Chaplain) on Sep 30, 2006 at 03:45 UTC
I do exactly that in one of my programs but for an INSERT statement. a la `My @values # get updated/pushed on whatever--sometimes an arrayref fro +m a selectall_arrayref on another DB of a different subtype $dbh->do("insert into table1 (col2, col3, col4) values('",join("','",@ +values),"')");` [download] Well, something like that. I think by now I've refactored it to set a variable with the join and just interpolate that into the quoted SQL command just as in the post above. _________________________________________________________________________________ I like computer programming because it's like Legos for the mind.	[reply] [d/l]