To cache or not to cache

blogical has asked for the wisdom of the Perl Monks concerning the following question:

An open query:
I'm looking for resources and tips concerning the wisdom and best practices of handling data (and possibly preserving) coming out of a DB. Specifically, on my site I'm using mySQL as the definitive source for info that I'm accessing through an object representing a site user. When I ask for info about the user (say $user->get('zipcode')) I first check if the data is currently defined in the instance. If not, I check the session cache (currently Cache::Filecache) and finally, if it hasn't been found, I query the DB. I store any retrieved info in the instance, then the DESTROY method caches any instance data for later use in this session.

I'm also pre-loading any other relevant data from any called table in anticipation of it being requested (i.e., if I ask for the 'zipcode' I might also ask for the 'street' and 'town', which are on the same record, so get them all on the same DB call.) My records are fairly small, so this shouldn't eat up much memory that wouldn't have been used, and at best avoids several DB calls.

Or so my line of thought goes... but this is all fairly new to me. I can test performance now for an accurate current answer, but I expect the environment I'm running this in to change several times from here to later production use. I thought I'd see if any fellow monks might offer up insight into this area (or share their favorite resource links) so I can anticipate what the best practice might look like. Thanks!

Comment on To cache or not to cache Download Code

Replies are listed 'Best First'.
Re: To cache or not to cache by perrin (Chancellor) on Mar 13, 2006 at 18:39 UTC
Cache::Filecache is often slower than MySQL. I suggest you switch to Cache::FastMmap instead. It's one of the only things around that are faster than a MySQL query.	[reply]
Re^2: To cache or not to cache by diego_de_lima (Beadle) on Mar 15, 2006 at 20:35 UTC
I´ve writen a fully functional persistent framework which was finished just some weeks ago. My experience was: 1. I tried Class::DBI and Tangram first, but they were too poor in resources for the project we were creating. 2. Decided to create my own persistence class. Started simple, finished complex (3000 lines of intensive programing). 3. I´ve found the need to cache all schema definitions, permissions, user info, etc. Choosed MLDBM with Storable and BerkeleyDB, just becouse it is faster than anything else (Cache::, MySQL, etc). There iss no cache os table data in the system, but only system/schema definitions. 4. My persistence class loads up all columns of the row into memory executing a "SELECT ", with no need to access Postgres latter. This data is stored in the object properties. When the object is destroyed, all data is destroied and must be instantiated again with a new SQL statement in the next page loaded (it´s a web app). But the schema information (column names, types, sizes, etc.) is cached. 5. The performance is quite nice. Diego de Lima	[reply]
Re^3: To cache or not to cache by perrin (Chancellor) on Mar 15, 2006 at 21:19 UTC
I think you might have been happy with Rose::DB::Object. It's a more complete and faster system than the others.	[reply]
Re: To cache or not to cache by Tanktalus (Canon) on Mar 13, 2006 at 18:36 UTC
From what I've gleaned from perlmonks, if faced with a similar question, I would: Pay close attention to Class::DBI. You can set it up to load the "bulk" of a user at once, but leave larger, less-often-used columns out until you need them. Tune my database to do the caching for me. Databases are good at this. Well, if they aren't, they should be (try a different database). And, at the very least, they're probably better than I am. Especially once I start scaling. A small, single-machine server is one thing. But if it grows such that a single machine is used solely for the db, and then there are multiple webservers, having the cache done by the db will probably save you time (both in development and at runtime). And then, if you grow further such that you need multiple machines to act as your server, again, I'm going to guess that the database cache will be even more effective for this. Also, the database can take care of concurrency issues, rather than having to worry about stale data sitting in your local cache. But that's just me. I worry about the unlikely issues of scalability because I'm a dreamer ;-) (That, and if I got a job where it mattered, I'd already have a bit of experience thinking about it.)	[reply]
Re: To cache or not to cache by philcrow (Priest) on Mar 13, 2006 at 16:36 UTC
We usually start without caching, then add it for pieces that seem slow. Phil	[reply]
Re^2: To cache or not to cache by gloryhack (Deacon) on Mar 13, 2006 at 20:28 UTC
That's something I insist upon when I'm da boss: Don't optimize until you know where the time is spent. I just recently brought a long-running process down from 2+ hours per invocation to just over five minutes per, using aggressive caching (and lots of RAM!) but I would have guessed wrong had I not profiled first. It turned out that the chunk I thought would be the biggest CPU hog was actually number six on the list, and not worth fooling with.	[reply]
Re: To cache or not to cache by kwaping (Priest) on Mar 13, 2006 at 17:41 UTC
This isn't directly related to the OP's question, but it sparked my curiousity: When I ask for info about the user (say $user->get('zipcode')) I first check if the data is currently defined in the instance. If not, I check the session cache (currently Cache::Filecache) and finally, if it hasn't been found, I query the DB. How much overhead do you (OP and others reading this) think this adds per request? I guess the value would be highly dependent on how frequently the request makes it to the DB request phase. I wonder if maybe a persistent DBI connection might provide more overall value than locally caching the data. --- It's all fine and dandy until someone has to look at the code.	[reply]
Re^2: To cache or not to cache by blogical (Pilgrim) on Mar 13, 2006 at 18:12 UTC
Hopefully it reduces the overhead over the course of a session. It seems better than pre-loading everything that might be needed into the instance for every request, or calling the DB for each piece individually as it is requested over the course of the session. Look as close to home as possible first (memory->cache->DB) Trade a little memory/storage space for the length of the request/session to cut down on DB calls. Try to avoid DB calls by retrieving relevant info on the first call. Reusable data migrates into the cache, and lives there until the session ends- hopefully quicker and lower cost than DB calls. I'm using a persistant dbhandle (or it WILL persist when I get this onto mod_perl). But calling also requires figuring out where to get the data from, forming the request, dealing with the call...	[reply]