OT: Generalizing SQL Select statements

AidanLee has asked for the wisdom of the Perl Monks concerning the following question:

For all of you who find their Perl programming touches on database access, I've a riddle that I have not yet managed to crack. I fear it may come to a compromise in the end, but I thought that after several unproductive hours pounding on this problem I'd see if the Monestary had any ideas.

My application is an administrative web-interface for a client's web site. There are different types of objects that they can create, destroy, and relate to one another, be they pages, contacts, news articles, etc. I'm using the fairly standard method of showing the administrator a list of the same kind of objects, from which they click on one to update it (as well as a link to delete a given object or add a new one).

The problem comes down to implementing this list page, which I've done several times in a number of different ways. Each time I reimplement this functionality I've found a slightly more generalized way (read: prefereable... so that a listing method is not directly tied to what kind of object is being listed) to make these lists.

The snag I've run into is that to get the desired funcionality, my current methods are too slow when dealing with a few thousand objects.

Consider the following schema (mysql):

# --------------------------------------------------------
# Table structure for table 'Nodes'

CREATE TABLE Nodes (
   ID int(10) unsigned NOT NULL auto_increment,
   Date_Created datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
   Date_Modified datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
   Status int(1) unsigned NOT NULL,
   Type int(1) unsigned NOT NULL,
   PRIMARY KEY (ID)
);

# --------------------------------------------------------
# Table structure for table 'Contacts'

CREATE TABLE Contacts (
   ID int(2) unsigned NOT NULL,
   FirstName varchar(48) NOT NULL,
   LastName varchar(48) NOT NULL,
   Company varchar(48) NOT NULL,
   Street1 varchar(48) NOT NULL,
   Street2 varchar(48) NOT NULL,
   City varchar(48) NOT NULL,
   State varchar(24) NOT NULL,
   Zip varchar(12) NOT NULL,
   Phone varchar(24) NOT NULL,
   Email varchar(48) NOT NULL,
   ContactPreference tinyint(3) unsigned NOT NULL,
   Subscribed char(1) NOT NULL,
   PRIMARY KEY (ID)
);

# --------------------------------------------------------
# Table structure for table 'Contact_Groups'

CREATE TABLE Contact_Groups (
   ID int(2) unsigned NOT NULL,
   Title varchar(128) NOT NULL,
   Description varchar(255),
   PRIMARY KEY (ID)
);

# --------------------------------------------------------
# Table structure for table 'Node_Relationships'

CREATE TABLE Node_Relationships (
   ThisID int(2) unsigned NOT NULL,
   ThatID int(2) unsigned NOT NULL,
   RelationshipType int(1) unsigned NOT NULL,
   SortIndex smallint(5) unsigned NOT NULL,
   PRIMARY KEY (ThisID, ThatID, RelationshipType)
);
[download]

Everything is a "Node" (pardon the borrowed vocabulary) including Contacts and ContactGroups. All relationships between nodes are drawn through the Nodes_Relationships table.

Now, the listing of Contacts available for editing lists the Last and First Name, the Phone, Email and the Title of the ContactGroup the contact is a member of. The listing also has the following features:

previous/next buttons, so we only display a portion of the list at a time
the ability to sort on any column, and to keep track of sorting history (so you can sort first by Name, then by Email, etc)
The ability to search on certain columns

All of this is easy until I try to make it possible to search or sort on the ContactGroup column (which is a requirement by the client). The reasons are thus:

I can't use a multi-table query to pull the Contact Group Title down at the same time as the rest of the information, because the relationship is optional (a contact doesn't _have_ to be a member of a group). I could probably use an Outer Join in this instance, but that does not work if another optional relationship (say we make Company it's own object/node type, rather than just a field in the Contact table) needs to be displayed as a column. Additionally, I want a general technique that can be applied to all my nodes, not just Contact nodes.
The other option is to pull all the results off the database and sort the results using perl. This is too slow for two reasons. a) I'm pulling down thousands of records when I only want 30, and b) the database can sort things far quicker than I can in Perl.

I've currently got my application doing number two, above. The problem is that I've been asked to make the system faster, as it's beyond sluggish when dealing with the current data set of 3000-some-odd records. In my mind the solution is to have all the manipulation for searching and sorting done on the database and to return only the 30 records I want displayed. But seeing as I'm running mysql, the only two tools I see useful for this (stored procedures and subselects) aren't available to me.

Do any other monks have an idea of how to speed the system up without sacrificing the aforementioned functionality? Please ask me to clarify anything that seems unclear, as it's all a rather large dilemma and I'm not sure I've managed to keep the whole thing in my head while writing this

Edit: chipmunk 2002-01-15

Comment on OT: Generalizing SQL Select statements Download Code

Replies are listed 'Best First'.
Re: OT: Generalizing SQL Select statements by perrin (Chancellor) on Jan 16, 2002 at 04:27 UTC
You may have generalized yourself into a corner. The fastest way to deal with optional data is to make it required in the actual database but have a value that means "NO DATA". Then you can use a normal join, which will outperform an outer join. If you can't do that, you have to use an outer join or do multiple queries, one for each record. Multiple queries would be slower than what you're doing now, so I don't recommend it. Be careful when generalizing your schema. One of the things databases do is enforce certain rules about your data relationships. If you don't let the database do that, you can get into tricky territory.	[reply]
Re: OT: Generalizing SQL Select statements by screamingeagle (Curate) on Jan 16, 2002 at 05:37 UTC
You could pull the records off the tables faster using resultset paging. I seem to recall there is a built-in functions in mySQL u can use to do that, namely "limit", which u need to include in the SELECT statement (check the mySQL documentation).Retreiving them in batches of 20 or 30 should make the pages load much faster.	[reply]
Re: OT: Generalizing SQL Select statements by metadoktor (Hermit) on Jan 16, 2002 at 04:41 UTC
But seeing as I'm running mysql, the only two tools I see useful for this (stored procedures and subselects) aren't available to me. You may want to look into PostgreSQL which is as far as I know far superior to MySQL. metadoktor "The doktor is in."	[reply]