|The stupid question is the question not asked|
Thanks, all, for the help & direction.
I'm trying to put it into practice. Here's the situation:
I have ~20,000 users, each reporting an average of 600 characteristics out of a possible 3,000 characteristics. The objective of this system is to count users who correspond to various collections of characteristics, in the general form:
count all users who reported characteristics a, b & c
I've created a two-column database that looks like this:
characteristic_id (int), user_id (int)
with no indexes to start, as a baseline. So, with about 600 rows for the average user, that makes 12 million rows in total.
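For concreteness, here's a minimal sketch of that baseline setup. I'm using SQLite via Python purely for illustration; the table and column names come from the description above, but the sample data and everything else are invented:

```python
import sqlite3

# In-memory database for illustration only; the real system is
# presumably a server-based RDBMS reached over the network.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE theTable (
        characteristic_id INTEGER,
        user_id           INTEGER
    )
""")

# A few sample rows: user 456 reports characteristics 123 and 124,
# user 789 reports only 123.
rows = [(123, 456), (124, 456), (123, 789)]
conn.executemany("INSERT INTO theTable VALUES (?, ?)", rows)

total = conn.execute("SELECT COUNT(*) FROM theTable").fetchone()[0]
print(total)  # 3 sample rows; the real table would hold ~12 million
```

In the real system each of the ~20,000 users contributes ~600 such rows, which is where the 12-million-row figure comes from.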
Following your suggestions, I connected by telnet to execute a few queries.
select count(*) from theTable
select count(*) from theTable where characteristic_id = 123
At this point I'm thinking this is really great. But then
select count(*) from theTable where user_id = 456
And then applying VSarkiss' excellent style:
select count(*) from theTable A inner join theTable B on A.user_id = B.user_id where A.characteristic_id = 123 and B.characteristic_id = 124
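That self-join can be sketched like this (again SQLite via Python for illustration, with invented sample data). I've also included a GROUP BY / HAVING form as an alternative worth knowing about, since it generalizes to any number of required characteristics without stacking one join per characteristic; that form is my addition, not from the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (characteristic_id INTEGER, user_id INTEGER)")
# User 1 reported both 123 and 124; user 2 reported 123 and 125.
conn.executemany("INSERT INTO theTable VALUES (?, ?)",
                 [(123, 1), (124, 1), (123, 2), (125, 2)])

# Two-way self-join: count users who reported BOTH 123 and 124.
two_way = conn.execute("""
    SELECT COUNT(*) FROM theTable A
    INNER JOIN theTable B ON A.user_id = B.user_id
    WHERE A.characteristic_id = 123 AND B.characteristic_id = 124
""").fetchone()[0]

# Equivalent GROUP BY / HAVING form: keep users who matched all
# of the wanted characteristics, then count them.
wanted = (123, 124)
having = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT user_id FROM theTable
        WHERE characteristic_id IN (?, ?)
        GROUP BY user_id
        HAVING COUNT(DISTINCT characteristic_id) = 2
    )
""", wanted).fetchone()[0]

print(two_way, having)  # both count only user 1
```

Both queries answer the same question; the HAVING variant just needs the placeholder list and the count adjusted to add a characteristic, rather than another join.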
So - in effect, nothing to do with Perl.
But there's a new problem: it takes several dozen such queries to compile the report data I need, and three minutes per query is clearly unacceptable.
Personally, I'm amazed by the magnitude of the increase in response time.
Sooooooo - dare I hope to see a two-orders-of-magnitude decrease in response time by adding indexes? Or do you think my database design is fubar?
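For what it's worth, here is a sketch of the indexes I have in mind (SQLite syntax via Python for illustration; the index names are mine, and whether this actually buys two orders of magnitude depends on the engine and the data distribution):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (characteristic_id INTEGER, user_id INTEGER)")

# A composite index on (characteristic_id, user_id) can serve the
# characteristic lookups and the self-joins as a "covering" index,
# so the engine never touches the base table; a second index on
# user_id covers the per-user counts.
conn.execute("CREATE INDEX idx_char_user ON theTable (characteristic_id, user_id)")
conn.execute("CREATE INDEX idx_user ON theTable (user_id)")

# EXPLAIN QUERY PLAN (SQLite-specific) shows whether the index is used.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(*) FROM theTable WHERE characteristic_id = 123
""").fetchall()
print(plan)
```

Checking the query plan before and after adding the indexes (EXPLAIN in MySQL, EXPLAIN QUERY PLAN in SQLite) is the quickest way to see whether the engine is actually using them.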