Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

To cache or not to cache

by blogical (Pilgrim)
on Mar 13, 2006 at 16:06 UTC ( #536286=perlquestion: print w/replies, xml ) Need Help??

blogical has asked for the wisdom of the Perl Monks concerning the following question:

An open query:
I'm looking for resources and tips concerning the wisdom and best practices of handling data (and possibly preserving) coming out of a DB. Specifically, on my site I'm using mySQL as the definitive source for info that I'm accessing through an object representing a site user. When I ask for info about the user (say $user->get('zipcode')) I first check if the data is currently defined in the instance. If not, I check the session cache (currently Cache::Filecache) and finally, if it hasn't been found, I query the DB. I store any retrieved info in the instance, then the DESTROY method caches any instance data for later use in this session.

I'm also pre-loading any other relevant data from any called table in anticipation of it being requested (i.e., if I ask for the 'zipcode' I might also ask for the 'street' and 'town', which are on the same record, so get them all on the same DB call.) My records are fairly small, so this shouldn't eat up much memory that wouldn't have been used, and at best avoids several DB calls.

Or so my line of thought goes... but this is all fairly new to me. I can test performance now for an accurate current answer, but I expect the environment I'm running this in to change several times from here to later production use. I thought I'd see if any fellow monks might offer up insight into this area (or share their favorite resource links) so I can anticipate what the best practice might look like. Thanks!

Replies are listed 'Best First'.
Re: To cache or not to cache
by perrin (Chancellor) on Mar 13, 2006 at 18:39 UTC
    Cache::Filecache is often slower than MySQL. I suggest you switch to Cache::FastMmap instead. It's one of the only things around that are faster than a MySQL query.
      Ive writen a fully functional persistent framework which was finished just some weeks ago. My experience was:

      1. I tried Class::DBI and Tangram first, but they were too poor in resources for the project we were creating.

      2. Decided to create my own persistence class. Started simple, finished complex (3000 lines of intensive programing).

      3. Ive found the need to cache all schema definitions, permissions, user info, etc. Choosed MLDBM with Storable and BerkeleyDB, just becouse it is faster than anything else (Cache::*, MySQL, etc). There iss no cache os table data in the system, but only system/schema definitions.

      4. My persistence class loads up all columns of the row into memory executing a "SELECT *", with no need to access Postgres latter. This data is stored in the object properties. When the object is destroyed, all data is destroied and must be instantiated again with a new SQL statement in the next page loaded (its a web app). But the schema information (column names, types, sizes, etc.) is cached.

      5. The performance is quite nice.

      Diego de Lima
        I think you might have been happy with Rose::DB::Object. It's a more complete and faster system than the others.
Re: To cache or not to cache
by Tanktalus (Canon) on Mar 13, 2006 at 18:36 UTC

    From what I've gleaned from perlmonks, if faced with a similar question, I would:

    • Pay close attention to Class::DBI. You can set it up to load the "bulk" of a user at once, but leave larger, less-often-used columns out until you need them.
    • Tune my database to do the caching for me. Databases are good at this. Well, if they aren't, they should be (try a different database). And, at the very least, they're probably better than I am. Especially once I start scaling. A small, single-machine server is one thing. But if it grows such that a single machine is used solely for the db, and then there are multiple webservers, having the cache done by the db will probably save you time (both in development and at runtime). And then, if you grow further such that you need multiple machines to act as your server, again, I'm going to guess that the database cache will be even more effective for this. Also, the database can take care of concurrency issues, rather than having to worry about stale data sitting in your local cache.
    But that's just me. I worry about the unlikely issues of scalability because I'm a dreamer ;-) (That, and if I got a job where it mattered, I'd already have a bit of experience thinking about it.)

Re: To cache or not to cache
by philcrow (Priest) on Mar 13, 2006 at 16:36 UTC
    We usually start without caching, then add it for pieces that seem slow.

    Phil

      That's something I insist upon when I'm da boss: Don't optimize until you know where the time is spent.

      I just recently brought a long-running process down from 2+ hours per invocation to just over five minutes per, using aggressive caching (and lots of RAM!) but I would have guessed wrong had I not profiled first. It turned out that the chunk I thought would be the biggest CPU hog was actually number six on the list, and not worth fooling with.

Re: To cache or not to cache
by kwaping (Priest) on Mar 13, 2006 at 17:41 UTC
    This isn't directly related to the OP's question, but it sparked my curiousity:

    When I ask for info about the user (say $user->get('zipcode')) I first check if the data is currently defined in the instance. If not, I check the session cache (currently Cache::Filecache) and finally, if it hasn't been found, I query the DB.

    How much overhead do you (OP and others reading this) think this adds per request? I guess the value would be highly dependent on how frequently the request makes it to the DB request phase.

    I wonder if maybe a persistent DBI connection might provide more overall value than locally caching the data.

    ---
    It's all fine and dandy until someone has to look at the code.
      Hopefully it reduces the overhead over the course of a session. It seems better than pre-loading everything that might be needed into the instance for every request, or calling the DB for each piece individually as it is requested over the course of the session.
      • Look as close to home as possible first (memory->cache->DB)
      • Trade a little memory/storage space for the length of the request/session to cut down on DB calls.
      • Try to avoid DB calls by retrieving relevant info on the first call.
      • Reusable data migrates into the cache, and lives there until the session ends- hopefully quicker and lower cost than DB calls.

      I'm using a persistant dbhandle (or it WILL persist when I get this onto mod_perl). But calling also requires figuring out where to get the data from, forming the request, dealing with the call...

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://536286]
Approved by planetscape
Front-paged by kwaping
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (7)
As of 2022-06-28 11:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My most frequent journeys are powered by:









    Results (90 votes). Check out past polls.

    Notices?