learning memcached

by punkish (Priest)
on May 21, 2011 at 02:04 UTC ( [id://906015] )

punkish has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to learn memcached, so I installed it on my laptop and fired it up. I also created a simple SQLite db 'mem.sqlite' like so

CREATE TABLE t (id INTEGER PRIMARY KEY, str TEXT)

and filled the table with 20_000 random strings. Then I used the following simplistic script to benchmark.

use Cache::Memcached;
my $memd = new Cache::Memcached {'servers' => [ "localhost:11212" ]};

use DBI qw(:sql_types);
my $dbh = DBI->connect("dbi:SQLite:dbname=mem.sqlite");

use Benchmark qw(:all);

my $count = 40_000;
cmpthese($count, {
    'query_dbh' => sub {
        my $id  = int(rand(20_000));
        my $sth = $dbh->prepare("SELECT str FROM t WHERE id = ?");
        $sth->execute($id);
        my ($str) = $sth->fetchrow_array;
        open F, ">", "foo.txt" or die $!;
        print F "id: $id, str: $str\n";
        close F;
    },
    'query_mem' => sub {
        my $id  = int(rand(20_000));
        open F, ">", "foo.txt" or die $!;
        my $str = $memd->get($id);
        unless ($str) {
            my $sth = $dbh->prepare("SELECT str FROM t WHERE id = ?");
            $sth->execute($id);
            ($str) = $sth->fetchrow_array;
            $memd->set($id, $str);
        }
        print F "id: $id, str: $str\n";
        close F;
    }
});

Output:

              Rate query_mem query_dbh
query_mem   2723/s        --      -29%
query_dbh   3846/s       41%        --

I consistently get results like the above, whereas I expected the memcache to slowly fill up and make the queries much faster than hitting the file-based db alone. What am I doing wrong, or are my expectations wrong?



when small people start casting long shadows, it is time to go to bed

Replies are listed 'Best First'.
Re: learning memcached
by Marshall (Canon) on May 22, 2011 at 03:59 UTC
    Your thinking made a huge leap between "I am trying to learn memcached.." (which is a fine objective) and "I want SQLite queries to run faster." Your benchmark results are certainly believable (it runs slower when trying to use memcached). I certainly would not anticipate any role at all for memcached in an SQLite application.

    First, SQLite has its own memory cache, and it can be dynamically adjusted by the user. The default is 2MB. A dynamic increase of the cache to 200MB would look like this:

        $dbh->do('PRAGMA cache_size = 200000'); # size in pages

    Certain operations like index creation can take a LOT of CPU and memory. In the app I'm currently working on, when creating the DB from a "scratch" situation, I run the cache size way up, which cuts index creation from 60 to 30 seconds. Big "bang" for one line of code!
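
    As a rough, untested sketch of how that fits into a script (the 200_000 figure is just the same 200MB example as above, and the handle name is arbitrary):

        use DBI;

        # connect to the same test db the OP used
        my $dbh = DBI->connect("dbi:SQLite:dbname=mem.sqlite", "", "",
                               {RaiseError => 1});

        # bump SQLite's page cache for this connection only
        $dbh->do('PRAGMA cache_size = 200000');    # size in pages

        # read the setting back to confirm it took effect
        my ($pages) = $dbh->selectrow_array('PRAGMA cache_size');
        print "cache_size is now $pages pages\n";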

    The SQLite code is C/C++ and will do a far faster job of local cache management than some app that is potentially on another machine. Also, problems can arise when you insert a cache between yourself and the DB (the stale-data problem, if records are being updated).
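
    To illustrate the stale-data hazard with a hedged, made-up snippet (it reuses the OP's $dbh and $memd handles; the values are placeholders, and the invalidation step is the point, not a prescribed pattern):

        my ($id, $new_str) = (42, 'replacement string');   # example values

        # update the row in SQLite...
        $dbh->do("UPDATE t SET str = ? WHERE id = ?", undef, $new_str, $id);

        # ...and drop the cached copy, otherwise later $memd->get($id)
        # calls keep returning the old value until it expires
        $memd->delete($id);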

    The big hitter in your benchmark is probably the file open operation. Move that out of the loop to get a better idea of what the DB can do. With 20,000 strings, if the average size is 100 bytes, that is 2,000,000 bytes, or 2MB. SQLite is already caching that much stuff without you doing anything at all. Making a cache of SQLite's cache is pointless.
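
    A minimal sketch of that change, keeping the OP's variable names ($dbh is an already-connected DBI handle): the file is opened once and the statement prepared once, so the timed sub does little besides the query.

        use Benchmark qw(cmpthese);

        open my $fh, ">", "foo.txt" or die $!;
        my $sth = $dbh->prepare("SELECT str FROM t WHERE id = ?");

        cmpthese(40_000, {
            'query_dbh' => sub {
                my $id = int(rand(20_000));
                $sth->execute($id);
                my ($str) = $sth->fetchrow_array;
                print {$fh} "id: $id, str: $str\n";
            },
        });

        close $fh;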

      I was using SQLite only because it is easy to use, and I use it for just about everything, everywhere I can. My intent was not to make SQLite queries run faster.

      That said, I actually posted my query on the memcache mailing list and discovered several things. The entire thread is at http://bit.ly/lnfiuv. The major takeaway was that Cache::Memcached was way too slow. Per "Dormando's" suggestion, I installed Cache::Memcached::libmemcached, and the memcached code actually became about 30% faster than the pure SQLite code. I also modified my test to get_multi the IDs, as I learned that network travel is just a lot slower than disk travel. Here is the result --

      Benchmark: timing 10000 iterations of query_dbh, query_mem...
      query_dbh:  6 wallclock secs ( 3.87 usr +  0.92 sys =  4.79 CPU) @ 2087.68/s (n=10000)
      query_mem:  6 wallclock secs ( 2.51 usr +  1.23 sys =  3.74 CPU) @ 2673.80/s (n=10000)

      For the record, here is my complete test script, including the code for creating the test db.

      #!/usr/local/bin/perl
      use strict;
      use feature 'say';

      use Cache::Memcached::libmemcached;
      use DBI qw(:sql_types);
      use Benchmark qw(:all);

      my $dbh = DBI->connect(
          "dbi:SQLite:dbname=mem.sqlite", "", "",
          {RaiseError => 1, AutoCommit => 0}
      );

      my $memd = new Cache::Memcached::libmemcached ({
          'servers' => [ "localhost:11212" ]
      });

      my $sql = "SELECT Ifnull(str, 'none') FROM t WHERE id = ?";
      my @ids = map { int(rand(19_999)) || 1 } (0 .. 20);

      # create_db() and pop_db() (defined below) only need to be run once
      # to build and populate the test db.
      timethese(10_000, {
          'query_mem' => sub {
              my $hashref = $memd->get_multi(@ids);
              my @res = ();
              while (my ($id, $str) = each %$hashref) {
                  unless ($str) {
                      my $sth = $dbh->prepare($sql);
                      $sth->execute($id);
                      ($str) = $sth->fetchrow_array;
                      $memd->set($id, $str);
                  }
                  push @res, [$id, $str];
              }
              out(@res);
          },
          'query_dbh' => sub {
              my $sth = $dbh->prepare($sql);
              my @res = ();
              for (@ids) {
                  $sth->execute($_);
                  my ($str) = $sth->fetchrow_array;
                  push @res, [$_, $str];
              }
              out(@res);
          },
      });

      sub out {
          my @res = @_;
          open F, ">", "foo.txt" or die $!;
          say F "id: " . $_->[0] . ", str: " . $_->[1] for (@res);
          close F;
      }

      sub create_db {
          my $sth = $dbh->prepare(
              "CREATE TABLE t (id INTEGER PRIMARY KEY, str TEXT)"
          );
          $sth->execute;
          $dbh->commit;
      }

      sub pop_db {
          my $sth = $dbh->prepare("INSERT INTO t (str) VALUES (?)");
          $sth->execute(randstr(rand(100))) for (0 .. 20_000);
          $dbh->commit;
      }

      sub randstr {
          my $len = shift;
          my @chr = ('a'..'z', 'A'..'Z', '0'..'9', '_', ' ');
          return join "", map { $chr[rand @chr] } 1 .. $len;
      }


      when small people start casting long shadows, it is time to go to bed
Re: learning memcached
by Anonymous Monk on May 21, 2011 at 02:44 UTC
    What am I doing wrong, or are my expectations wrong?

    I think your benchmark is flawed. You say you have 20_000 random entries, but you only make 2723 queries.

    Instead of benchmarking for 3 wallclock seconds (the default, -3), try benchmarking multiples of 20_000 iterations.

    After that, I would pre-load entries into the cache, and benchmark retrieval only.
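
    A rough sketch of that second idea, reusing the OP's $dbh and $memd handles (the 'warm_get' label is made up for the example): load every row into memcached up front, then time gets against the warm cache.

        use Benchmark qw(timethese);

        # warm the cache once from the table
        my $rows = $dbh->selectall_arrayref("SELECT id, str FROM t");
        $memd->set($_->[0], $_->[1]) for @$rows;

        # now benchmark retrieval only
        timethese(40_000, {
            'warm_get' => sub {
                my $id  = int(rand(20_000)) || 1;
                my $str = $memd->get($id);
            },
        });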

      > I think your benchmark is flawed. You say you have 20_000 random
      > entries, but you only make 2723 queries.
      

      How did you determine that I made only 2723 queries? The benchmark shows that I made 40_000 queries for each code fragment at a rate of 2723 queries per second with the memcache process and 3846 queries per second with the straight-from-db process. Or, am I reading the results wrong?

      Update: Here is another measurement, this time using timethese. The results are the same as before.

      query_dbh: 24 wallclock secs ( 5.28 usr +  5.29 sys = 10.57 CPU) @ 3784.30/s (n=40000)
      query_mem: 70 wallclock secs ( 7.80 usr +  6.08 sys = 13.88 CPU) @ 2881.84/s (n=40000)

      Whatever I am doing, it is far faster to open the SQLite database every time and query it than it is to query the memory-cached value. (I am creating a $dbh and $sth every time, and also opening a file and writing to it every time, to kinda emulate a CGI process that starts afresh on every hit from the browser.)



      when small people start casting long shadows, it is time to go to bed
        Or, am I reading the results wrong?

        You're not reading the results wrong; I obviously missed the count :)

        Another (insert better word here) "flaw" is that your SQL query is too simple :) The way I understand memcached, you get a benefit when you cache data that is expensive to compute.
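
        For instance, here is a hedged, made-up illustration of that point (the key name and the five-minute expiry are arbitrary; $dbh and $memd are the OP's handles): cache a query that actually costs something, such as an aggregate over the whole table, rather than a single-row primary-key lookup.

            my $key = 'avg_str_len';
            my $avg = $memd->get($key);
            unless (defined $avg) {
                # the expensive part: scan the whole table
                ($avg) = $dbh->selectrow_array("SELECT AVG(LENGTH(str)) FROM t");
                # cache it for five minutes so later requests skip the scan
                $memd->set($key, $avg, 300);
            }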
