SQL Result Cache Performance Comparison

mp has asked for the wisdom of the Perl Monks concerning the following question:

I wrote a benchmark to compare 3 caching methods (Cache::FileCache, Cache::MemoryCache, and a simple memory cache from here to cache response from a select using a primary key from a MySQL database.

I was a little surprised to find no speed improvement with Cache::FileCache and Cache::MemoryCache, and was also surprised at how well Randall's hash worked.

Do these results seem realistic, and does the benchmark below look correct?

Results:

 FileCache: 21 wallclock secs (18.24 usr +  1.79 sys = 20.03 CPU) @ 499.25/s (n=10000)
MemoryCache:  9 wallclock secs ( 8.97 usr +  0.02 sys =  8.99 CPU) @ 1112.35/s (n=10000)
SimpleCache:  0 wallclock secs ( 0.41 usr +  0.01 sys =  0.42 CPU) @ 23809.52/s (n=10000)
       raw: 11 wallclock secs ( 6.14 usr +  0.64 sys =  6.78 CPU) @ 1474.93/s (n=10000)
               Rate   FileCache MemoryCache         raw SimpleCache
FileCache     499/s          --        -55%        -66%        -98%
MemoryCache  1112/s        123%          --        -25%        -95%
raw          1475/s        195%         33%          --        -94%
SimpleCache 23810/s       4669%       2040%       1514%          --

Here is the code:

use strict;
use warnings;

my $DB = "test";
my $USER = "dbusername";
my $PASS = "dbpassword";
my $DEBUG = 0;
my $BENCHMARK_LOOP_LIMIT = 10_000;
my $EXPIRATION_TIME = 60;

use Data::Dumper;
use DBI;
my $dbh = DBI->connect("dbi:mysql:$DB", $USER, $PASS, {RaiseError => 1
+})
  || die "Cannot connect to db";

### Prepare caches
use Cache::FileCache;
my $fc = Cache::FileCache->new({default_expires_in => $EXPIRATION_TIME
+});
$fc->Clear;

use Cache::MemoryCache;
my $mc = Cache::MemoryCache->new({default_expires_in => $EXPIRATION_TI
+ME});
$mc->Clear;                     # Superfluous

my %cache;
my $flush_interval = 10;
my $flush_time = time + $flush_interval;

use Benchmark qw( timethese cmpthese );
my $results = timethese
  ($BENCHMARK_LOOP_LIMIT,
   {
    raw         => sub {my $r = raw_load_from_cachetest(5000 + int(ran
+d(30)));},
    FileCache   => sub {my $r = fc_load_from_cachetest(5000 + int(rand
+(30)));},
    MemoryCache => sub {my $r = mc_load_from_cachetest(5000 + int(rand
+(30)));},
    SimpleCache => sub {my $r = sc_load_from_cachetest(5000 + int(rand
+(30)));},
   },
  );
cmpthese($results);

# Load straight from database
sub raw_load_from_cachetest {
  my ($id) = @_;
  my $sql = qq[select * from cachetest where id=?];
  my $sth = $dbh->prepare($sql);
  $sth->execute($id);
  my $result = $sth->fetchrow_hashref || return;
  return $result;
}

# Cached load using Cache::FileCache
sub fc_load_from_cachetest {
  my ($id) = @_;
  my $result;
  my $key = "cachetest$id";
  $result = $fc->get($key) || do {
    $result = raw_load_from_cachetest($id);
    $fc->set($key, $result);
    $result
  };
  return $result;
}

# Cached load using Cache::MemoryCache
sub mc_load_from_cachetest {
  my ($id) = @_;
  my $result;
  my $key = "cachetest$id";
  $result = $mc->get($key) || do {
    $result = raw_load_from_cachetest($id);
    $mc->set($key, $result);
    $result
  };
  return $result;
};

# Cached load using simple cache found at: 
sub sc_load_from_cachetest {
  my ($id) = @_;
  ($flush_time, %cache) = time + $flush_interval if $flush_time < time
+;
  $cache{$id} ||= raw_load_from_cachetest($id);
}
[download]

The database table structure looks like this:

create table cachetest (
  id int default '0' not null auto_increment,
  a int default '0' not null,
  b int default '0' not null,
  c int default '0' not null,
  d int default '0' not null,
  e int default '0' not null,
  f int default '0' not null,
  g int default '0' not null,
  primary key(id)
);

and I pre-filled the table with 20,000 rows of pseudo-random data using

use strict;
use warnings;

### Begin configuration
my $DSN = "dbi:mysql:test";
my $USER = "dbusername";
my $PASS = "dbpassword";
### End configuration

use DBI;
my $dbh = DBI->connect($DSN, $USER, $PASS, {RaiseError => 1})
  || die "Cannot connect to db";

$dbh->do(qq[delete from cachetest]);
for(my $i=1;$i<=20000; $i++) {
  $dbh->do(qq[insert into cachetest (a,b,c,d,e,f,g) values (?,?,?,?,?,
+?,?)],
           undef,
           rand(1_000_000),
           rand(1_000_000),
           rand(1_000_000),
           rand(1_000_000),
           rand(1_000_000),
           rand(1_000_000),
           rand(1_000_000),
          );
[download]

Edited: ~Tue Jul 23 21:28:45 2002 (GMT), by footpad: Added <READMORE> tag. Per Consideration.

Comment on SQL Result Cache Performance Comparison Select or Download Code

Replies are listed 'Best First'.
Re: SQL Result Cache Performance Comparison by IlyaM (Parson) on Jul 23, 2002 at 22:02 UTC
SQL query you are using is very simple and doesn't represent real world situations when you want to use caching. MySQL perfomance for record lookup by primary key is very good especially since you are connecting to database via unix socket missing networking layer. No surprise it beats FileCache and MemoryCache which have additional logic that deals with cached data expiration (BTW I bet you can get better perfomance by tuning data expiration settings). In real world normally you need to cache only results of complex queries. Also you may want you caching code to handle data expiration, support caching of big data that you cannot fit in RAM, support cache shared by several processes, etc. When you need all these features in your caching code it is time to consider using something like Cache::Cache. -- Ilya Martynov (http://martynov.org/)	[reply]
Re: Re: SQL Result Cache Performance Comparison by mp (Deacon) on Jul 24, 2002 at 15:25 UTC
The simple memory cache from Randall's article provides data expiration. It never returns cached data older than the timeout period. With its 15x speed improvement over simple MySQL queries, it could be useful for caching simple persistent objects that require only a simple select by primary key to load. It could also be easily extended to limit memory usage (rudimentarily) by limiting the number of entries in the cache. Whether it is worthwhile in a particular application would depend on several factors (memory availability, size of elements being cached, locality of reference, expiration time limit, others?). I agree that (at least with MySQL as the DB, accessed through Unix sockets) it would only be beneficial to use Cache::MemoryCache or *::FileCache to cache more complex queries or data that require a significant amount of perl execution time to construct. (Example: complex joins, selects that return many rows which must be iterated over to retrieve the data, or data structures that are built by combining the results of several queries). Another possible use is in a multi-server environment, where caching lets you transfer part of the work load from an overloaded database server to an less overloaded application server even without a single process performance gain. Any ideas why there is so little performance difference between Cache::FileCache and Cache::MemoryCache in this benchmark? Is that realistic?	[reply]
Re: Re: Re: SQL Result Cache Performance Comparison by IlyaM (Parson) on Jul 24, 2002 at 15:44 UTC
Any ideas why there is so little performance difference between Cache::FileCache and Cache::MemoryCache in this benchmark? Hmm. From your benchmark it looks like Cache::MemoryCache is two times faster than Cache::FileCache. I would not call it 'little performance difference' :) -- Ilya Martynov (http://martynov.org/)	[reply]


We don't bite newbies here... much
	PerlMonks