http://qs321.pair.com?node_id=227180

This morning the database server was spending a great deal of its time waiting for data to be read from disk. As a result, lots of DB requests sat around for long periods while the DB server's CPU was mostly idle. So for a couple of hours the site was effectively down: too slow to be at all useful most of the time, and likely to give you errors as long-hung DB queries got killed or other timeouts occurred.

I opened a trouble ticket with Pair (who graciously and efficiently host us, though I don't think we pay them -- I'm not in on that end of things, but I'm thankful). I have not yet heard what they found, but the site appears to be back to more normal responsiveness.

The site has been slow more often than "usual" over the last couple of weeks. I think the main reason for this change is simply "popularity".

While looking into this morning's problems, I noticed that Pair has an operating system upgrade planned for next week (FreeBSD STABLE-4.1.x to STABLE-4.6, IIRC), and they told me the new version has improved disk I/O and so might help site speed in general.

I also found evidence that the Zombie mysql on FreeBSD problem (I didn't create that user, but follow its "evil" link) may be slowing us down even when we aren't running long queries (that problem is why searching had to be revamped and why backups were so tricky to get right). So I pointed Pair to the information, and they'll try upgrading MySQL to improve the situation once the operating system has been upgraded.

For further performance improvements, start a fund to pay me to take a couple of weeks off my real job and I'll be sure to finally revamp the node cache and a couple of other inefficiencies I've been eyeing for a while. (: (update: that's a joke too -- I'll reply with more on this)

                - tye