http://qs321.pair.com?node_id=152749

Here's the results of some simple performance testing I did on CDB vs BDB (v3) vs SQLite.

The lines marked "SQLite 2" are with large-grained transactions (a commit at the end of the inserts), whereas the other SQLite is with a commit on every row inserted.

Benchmark: timing 30000 iterations of Insert Berkeley, Insert CDB, Ins +ert SQLite, Insert SQLite 2... Insert Berkeley: 10 wallclock secs ( 3.26 usr + 5.66 sys = 8.92 CPU) Insert CDB: 0 wallclock secs ( 0.25 usr + 0.02 sys = 0.27 CPU) (warning: too few iterations for a reliable count) Insert SQLite: 20 wallclock secs ( 9.61 usr + 7.96 sys = 17.57 CPU) Insert SQLite 2: 6 wallclock secs ( 4.92 usr + 0.08 sys = 5.00 CPU) Benchmark: timing 30000 iterations of Select Berkeley, Select CDB, Sel +ect SQLite, Select SQLite 2... Select Berkeley: 5 wallclock secs ( 2.21 usr + 2.66 sys = 4.87 CPU) Select CDB: 2 wallclock secs ( 1.10 usr + 0.83 sys = 1.93 CPU) Select SQLite: 5 wallclock secs ( 4.61 usr + 0.15 sys = 4.76 CPU) Select SQLite 2: 8 wallclock secs ( 4.70 usr + 0.24 sys = 4.94 CPU)

Here's the benchmarking code (it's messy - code comments not welcome ;-)

use DBI; use CDB_File; use BerkeleyDB; # use Benchmark qw(timethese cmpthese); use Benchmark qw(timethese); # do some setup my $id_lit = 0; my $id_lit2 = 0; my $id_bdb = 0; my $id_cdb = 0; my $txt = 'a' x 100; unlink("sqlite.test", "sqlite2.test", "bdb.test", "cdb.test"); my $dbh = DBI->connect('dbi:SQLite:dbname=sqlite.test','','', { AutoCommit => 1, RaiseError => 1 }); my $dbh2 = DBI->connect('dbi:SQLite:dbname=sqlite2.test','','', { AutoCommit => 0, RaiseError => 1 }); $dbh->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100)) +'); $dbh2->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100) +)'); $dbh->do('PRAGMA default_synchronous = off'); my $sth_ins = $dbh->prepare('INSERT INTO test (id, foo) values (?, ?)' +); my $sth_sel = $dbh->prepare('SELECT foo FROM test where id=?'); my $sth2_ins = $dbh2->prepare('INSERT INTO test (id, foo) values (?, ? +)'); my $sth2_sel = $dbh2->prepare('SELECT foo FROM test where id=?'); my %ch; my $cdb = new CDB_File ('cdb.test', "cdb.test.$$") or die "Can't creat +e"; my %bh; my $bdb = tie %bh, 'BerkeleyDB::Hash', -Filename => 'bdb.test', -Flags => DB_CREATE, ; sub insertLite { $id_lit++; $sth_ins->execute($id_lit, $txt); } sub insertLite2 { $id_lit2++; $sth2_ins->execute($id_lit2, $txt); } sub insertCDB { $id_cdb++; $cdb->insert($id_cdb, $txt); } sub insertBDB { $id_bdb++; $bh{$id_bdb} = $txt; } sub selectLite { $id_lit++; $sth_sel->execute($id_lit); 1 while $sth_sel->fetch; } sub selectLite2 { $id_lit2++; $sth2_sel->execute($id_lit2); 1 while $sth2_sel->fetch; } sub selectCDB { $id_cdb++; $ch{$id_cdb}; } my $t1 = timethese(30_000, { 'Insert SQLite' => \&insertLite, 'Insert SQLite 2' => \&insertLite2, 'Insert Berkeley' => \&insertBDB, 'Insert CDB' => \&insertCDB, }, ); $dbh2->commit; $cdb->finish; $dbh->{AutoCommit} = 0; $cdb = tie %ch, 'CDB_File', 'cdb.test' or die "tie failed: $!\n"; # cmpthese $t1; $id_lit = 0; $id_lit2 = 0; $id_bdb = 0; $id_cdb = 0; my $t2 = timethese(30_000, { 'Select SQLite' => \&selectLite, 'Select SQLite 2' => \&selectLite2, 'Select Berkeley' => \&selectBDB, 'Select CDB' => \&selectCDB, }, ); # cmpthese $t2;

Replies are listed 'Best First'.
New Benchmark: SQLite vs CDB_File vs BerkeleyDB
by demerphq (Chancellor) on Mar 20, 2002 at 17:29 UTC
    Hi.

    Well as I stated earlier I had serious reservations about the validity of your benchmark. When I messaged you about it you replied "The point was to use all the modules' simplest interfaces". This is a specious argument. There is no difference between the interface for a DB_HASH or a DB_BTREE. The difference is underneath at the implementation level. And the difference is huge.

    I reran your benchmark without CDB_File, it wont build cleanly on W32, and produced the following results which incidentally do not completely support my earlier coments ( the file size differences are not what I said they would be.) However they do show that your comparison of DB_HASH against DBD::SQLite is misleading (SQLite gets _creamed_ by a $DB_TREE), and they also show that if you are looking for high speed with a BerkeleyDB then you should definately go with a $DB_BTREE instead of a $DB_HASH.

    My results were as follows:

    Benchmark: timing 30000 iterations of 
    	I B+Tree, I B+Tree (A), I Hash, I Hash (A), I SQLite, I SQLite 2...
        I B+Tree:  1 wallclock secs ( 0.88 usr +  0.11 sys =  0.98 CPU) @ 30487.80/s (n=30000)
    I B+Tree (A):  1 wallclock secs ( 0.95 usr +  0.02 sys =  0.97 CPU) @ 30959.75/s (n=30000)
          I Hash:  8 wallclock secs ( 2.25 usr +  4.38 sys =  6.63 CPU) @ 4528.30/s  (n=30000)
      I Hash (A):  7 wallclock secs ( 2.31 usr +  4.58 sys =  6.89 CPU) @ 4353.50/s  (n=30000)
        I SQLite: 76 wallclock secs (14.84 usr + 49.66 sys = 64.50 CPU) @ 465.12/s   (n=30000)
      I SQLite 2:  4 wallclock secs ( 3.30 usr +  0.09 sys =  3.39 CPU) @ 8849.56/s  (n=30000)
      
                    Rate I SQLite I Hash (A) I Hash I SQLite 2 I B+Tree I B+Tree (A)
    I SQLite       465/s       --       -89%   -90%       -95%     -98%         -98%
    I Hash (A)    4354/s     836%         --    -4%       -51%     -86%         -86%
    I Hash        4528/s     874%         4%     --       -49%     -85%         -85%
    I SQLite 2    8850/s    1803%       103%    95%         --     -71%         -71%
    I B+Tree     30488/s    6455%       600%   573%       245%       --          -2%
    I B+Tree (A) 30960/s    6556%       611%   584%       250%       2%           --
    
    
    
    Benchmark: timing 30000 iterations of 
    	S B+Tree, S B+Tree(A), S Hash, S Hash(A), S SQLite, S SQLite 2...
       S B+Tree:  1 wallclock secs ( 0.77 usr +  0.09 sys =  0.86 CPU) @ 34965.03/s (n=30000)
    S B+Tree(A):  1 wallclock secs ( 0.95 usr +  0.05 sys =  1.00 CPU) @ 30000.00/s (n=30000)
         S Hash:  2 wallclock secs ( 1.03 usr +  1.63 sys =  2.66 CPU) @ 11295.18/s (n=30000)
      S Hash(A):  3 wallclock secs ( 1.47 usr +  1.36 sys =  2.83 CPU) @ 10608.20/s (n=30000)
       S SQLite: 11 wallclock secs ( 6.80 usr +  3.69 sys = 10.49 CPU) @ 2861.23/s  (n=30000)
     S SQLite 2:  4 wallclock secs ( 3.89 usr +  0.09 sys =  3.98 CPU) @ 7530.12/s  (n=30000)
     
                   Rate S SQLite S SQLite 2 S Hash(A)    S Hash S B+Tree(A) S B+Tree
    S SQLite     2861/s       --       -62%      -73%      -75%        -90%     -92%
    S SQLite 2   7530/s     163%         --      -29%      -33%        -75%     -78%
    S Hash(A)   10608/s     271%        41%        --       -6%        -65%     -70%
    S Hash      11295/s     295%        50%        6%        --        -62%     -68%
    S B+Tree(A) 30000/s     948%       298%      183%      166%          --     -14%
    S B+Tree    34965/s    1122%       364%      230%      210%         17%       --
    
    Resulting File Sizes:
    bdb.hash.test :5226496
    bdb.btree.test :6701056
    bdb.hash.a.test :5095424
    bdb.btree.a.test :3579904
    sqlite.test :3845120
    sqlite2.test :2043904
    
    And the code I used to produce this benchmark is
    use DB_File; use DBI; use Benchmark qw(cmpthese); use strict; use warnings; # do some setup my $id_BDB_hash = 0; my $id_BDB_hash_a = 0; my $id_BDB_btree_a = 0; my $id_BDB_btree = 0; my $id_lit2 = 0; my $id_lit = 0; my $txt = 'a' x 100; unlink( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a.test", "sqlite.test", "sqlite2.test" ); my $dbh = DBI->connect( 'dbi:SQLite:dbname=sqlite.test', '', '', { AutoCommi +t => 1, RaiseError => 1 } ); my $dbh2 = DBI->connect( 'dbi:SQLite:dbname=sqlite2.test', '', '', { AutoComm +it => 0, RaiseError => 1 } ); $dbh->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100)) +'); $dbh2->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100) +)'); $dbh->do('PRAGMA default_synchronous = off'); my $sth_ins = $dbh->prepare('INSERT INTO test (id, foo) values (?, ?) +'); my $sth_sel = $dbh->prepare('SELECT foo FROM test where id=?'); my $sth2_ins = $dbh2->prepare('INSERT INTO test (id, foo) values (?, ? +)'); my $sth2_sel = $dbh2->prepare('SELECT foo FROM test where id=?'); tie my %bdb_btree_a, "DB_File", "bdb.btree.a.test", O_RDWR | O_CREAT, +0640, $DB_BTREE or die "Cannot open file 'bdb.btree.a.test': $!\n"; tied(%bdb_btree_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 +) } ); tied(%bdb_btree_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) +} ); tie my %bdb_hash_a, "DB_File", "bdb.hash.a.test", O_RDWR | O_CREAT, 06 +40, $DB_HASH or die "Cannot open file 'bdb.hash.a.test': $!\n"; tied(%bdb_hash_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 ) + } ); tied(%bdb_hash_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) } + ); tie my %bdb_hash, "DB_File", "bdb.hash.test", O_RDWR | O_CREAT, 0640, +$DB_HASH or die "Cannot open file 'bdb.hash.test': $!\n"; tie my %bdb_btree, "DB_File", "bdb.btree.test", O_RDWR | O_CREAT, 0640 +, $DB_BTREE or die "Cannot open file 'bdb.btree.test': $!\n"; sub insertLite { $sth_ins->execute( $id_lit++, $txt ); } sub insertLite2 { $sth2_ins->execute( $id_lit2++, $txt ); } sub insertBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ } = $txt; } sub insertBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ } = $txt; } sub insertBDB_btree { $bdb_btree{ $id_BDB_btree++ } = $txt; } sub insertBDB_hash { $bdb_hash{ $id_BDB_hash++ } = $txt; } sub selectLite { $sth_sel->execute( $id_lit++ ); 1 while $sth_sel->fetch; } sub selectLite2 { $sth2_sel->execute( $id_lit2++ ); 1 while $sth2_sel->fetch; } sub selectBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ }; } sub selectBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ }; } sub selectBDB_btree { $bdb_btree{ $id_BDB_btree++ }; } sub selectBDB_hash { $bdb_hash{ $id_BDB_hash++ }; } cmpthese( 30_000, { 'I SQLite' => \&insertLite, 'I SQLite 2' => \&insertLite2, 'I Hash' => \&insertBDB_hash, 'I B+Tree' => \&insertBDB_btree, 'I Hash (A)' => \&insertBDB_hash_a, 'I B+Tree (A)' => \&insertBDB_btree_a, }, ); $id_BDB_hash = 0; $id_BDB_hash_a = 0; $id_BDB_btree_a = 0; $id_BDB_btree = 0; $id_lit2 = 0; $id_lit = 0; cmpthese( 30_000, { 'S SQLite' => \&selectLite, 'S SQLite 2' => \&selectLite2, 'S Hash' => \&selectBDB_hash, 'S B+Tree' => \&selectBDB_btree, 'S Hash(A)' => \&selectBDB_hash_a, 'S B+Tree(A)' => \&selectBDB_btree_a, }, ); foreach my $file ( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a +.test", "sqlite.test", "sqlite2.test" ) { print "$file :" . ( -s $file ) . "\n"; }
    As Matts said, please no comments about the benchmarking code, I know its could be a lot nicer. (Its only a benchmark... :-)

    Yves / DeMerphq
    --
    Benchmarking things isn't as easy or as straightforward as many people seem to think....

      Hi, years later, SQLite is better, and there are some optimisations possible.
      Rate I SQLite I Hash (A) I Hash I SQLite 2 I B+Tree (A +) I B+Tree I SQLite 18927/s -- -50% -51% -78% -80 +% -80% I Hash (A) 37975/s 101% -- -3% -56% -60 +% -60% I Hash 38961/s 106% 3% -- -55% -59 +% -59% I SQLite 2 86957/s 359% 129% 123% -- -9 +% -9% I B+Tree (A) 95238/s 403% 151% 144% 10% - +- 0% I B+Tree 95238/s 403% 151% 144% 10% 0 +% -- Rate S SQLite S Hash(A) S Hash S B+Tree(A) S SQLite +2 S B+Tree S SQLite 32787/s -- -44% -52% -63% -64 +% -68% S Hash(A) 58252/s 78% -- -16% -35% -37 +% -44% S Hash 68966/s 110% 18% -- -23% -25 +% -33% S B+Tree(A) 89552/s 173% 54% 30% -- -3 +% -13% S SQLite 2 92308/s 182% 58% 34% 3% - +- -11% S B+Tree 103448/s 216% 78% 50% 16% 12 +% --
      Here your modified benchmark code:
      use DB_File; use DBI; use Benchmark qw(cmpthese); use strict; use warnings; # do some setup my $id_BDB_hash = 0; my $id_BDB_hash_a = 0; my $id_BDB_btree_a = 0; my $id_BDB_btree = 0; my $id_lit2 = 0; my $id_lit = 0; my $txt = 'a' x 100; unlink( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a.test", "sqlite.test", "sqlite2.test" ); my $dbh = DBI->connect( 'dbi:SQLite:dbname=sqlite.test', '', '', { A +utoCommit => 1, RaiseError => 1 } ); my $dbh2 = DBI->connect( 'dbi:SQLite:dbname=sqlite2.test', '', '', { A +utoCommit => 0, RaiseError => 1 } ); $dbh->do('PRAGMA page_size = 4096'); $dbh->do('PRAGMA synchronous = off'); $dbh->do('PRAGMA cache_size = 6000'); $dbh->do('PRAGMA journal_mode = MEMORY'); $dbh->do('PRAGMA temp_store = MEMORY'); $dbh->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100)) +'); $dbh2->do('PRAGMA page_size = 4096'); $dbh2->do('PRAGMA cache_size = 6000'); $dbh2->do('PRAGMA journal_mode = MEMORY'); $dbh2->do('PRAGMA temp_store = MEMORY'); $dbh2->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100) +)'); my $sth_ins = $dbh->prepare('INSERT INTO test (id, foo) values (?, ?) +'); my $sth_sel = $dbh->prepare('SELECT foo FROM test where id=?'); my $sth2_ins = $dbh2->prepare('INSERT INTO test (id, foo) values (?, ? +)'); my $sth2_sel = $dbh2->prepare('SELECT foo FROM test where id=?'); tie my %bdb_btree_a, "DB_File", "bdb.btree.a.test", O_RDWR | O_CREAT, +0640, $DB_BTREE or die "Cannot open file 'bdb.btree.a.test': $!\n"; tied(%bdb_btree_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 +) } ); tied(%bdb_btree_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) +} ); tie my %bdb_hash_a, "DB_File", "bdb.hash.a.test", O_RDWR | O_CREAT, 06 +40, $DB_HASH or die "Cannot open file 'bdb.hash.a.test': $!\n"; tied(%bdb_hash_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 ) + } ); tied(%bdb_hash_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) } + ); tie my %bdb_hash, "DB_File", "bdb.hash.test", O_RDWR | O_CREAT, 0640, +$DB_HASH or die "Cannot open file 'bdb.hash.test': $!\n"; tie my %bdb_btree, "DB_File", "bdb.btree.test", O_RDWR | O_CREAT, 0640 +, $DB_BTREE or die "Cannot open file 'bdb.btree.test': $!\n"; sub insertLite { $sth_ins->execute( $id_lit++, $txt ); } sub insertLite2 { $sth2_ins->execute( $id_lit2++, $txt ); } sub insertBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ } = $txt; } sub insertBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ } = $txt; } sub insertBDB_btree { $bdb_btree{ $id_BDB_btree++ } = $txt; } sub insertBDB_hash { $bdb_hash{ $id_BDB_hash++ } = $txt; } sub selectLite { $dbh->selectall_arrayref($sth_sel, undef, $id_lit2++ ) } sub selectLite2 { $dbh2->selectall_arrayref($sth2_sel, undef, $id_lit2++ ) } sub selectBDB_btree_a { $bdb_btree_a{ $id_BDB_btree_a++ }; } sub selectBDB_hash_a { $bdb_hash_a{ $id_BDB_hash_a++ }; } sub selectBDB_btree { $bdb_btree{ $id_BDB_btree++ }; } sub selectBDB_hash { $bdb_hash{ $id_BDB_hash++ }; } cmpthese( 60_000, { 'I SQLite' => \&insertLite, 'I SQLite 2' => \&insertLite2, 'I Hash' => \&insertBDB_hash, 'I B+Tree' => \&insertBDB_btree, 'I Hash (A)' => \&insertBDB_hash_a, 'I B+Tree (A)' => \&insertBDB_btree_a, }, ); $dbh2->commit; $id_BDB_hash = 0; $id_BDB_hash_a = 0; $id_BDB_btree_a = 0; $id_BDB_btree = 0; $id_lit2 = 0; $id_lit = 0; cmpthese( 60_000, { 'S SQLite' => \&selectLite, 'S SQLite 2' => \&selectLite2, 'S Hash' => \&selectBDB_hash, 'S B+Tree' => \&selectBDB_btree, 'S Hash(A)' => \&selectBDB_hash_a, 'S B+Tree(A)' => \&selectBDB_btree_a, }, ); foreach my $file ( "bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a +.test", "sqlite.test", "sqlite2.test" ) { print "$file :" . ( -s $file ) . "\n"; }
        against which version of libdb?
Re: SQLite vs CDB_File vs BerkeleyDB
by perrin (Chancellor) on Mar 19, 2002 at 17:21 UTC
    Interesting, thanks for the benchmark. Strong showing from CDB. I'm about to be doing some similar benchmarking on a broader range of stuff (Cache::Cache, Cache::Mmap, etc.) with a focus on multi-process use (with mod_perl), and I will definitely include SQLite after seeing this. I don't know about CDB, since it doesn't allow updates and that makes it basically useless for storing state data.

    I think you hurt BDB performance a bit by using the tied interface. I never use it, since it's slower than calling the methods directly.

      We did consider not using the tied interface, but wanted to make sure we tested what we would actually be using at work, and I couldn't see people liking the non-tied API. Maybe I'll add that, but it's fairly obvious that it'll only speed things up.

      The read-only problem of CDB is frustrating. Though the speed of regenerating the entire DB seems pretty fast, so perhaps it's not so bad for our uses (where we'd be doing rebuilds in a cron job).

      Will there be any mention of Alzabo's Sync and Store classes?
        Probably, if time permits. There are a lot of options to cover.
Re: SQLite vs CDB_File vs BerkeleyDB
by dws (Chancellor) on Mar 19, 2002 at 17:21 UTC
    Excellent!

    One question. Why   $dbh->do('PRAGMA default_synchronous = off'); in the default (commit after each insert) case? Is this not something that could be hidden beneath the options on the connect?

      Otherwise SQLite does a fflush on every row, making it much slower ;-)

      The option is only relevant if you're worried about data integrity during a filesystem crash.

        Otherwise SQLite does a fflush on every row, making it much slower ;-)

        Uh... That sounds like you're defeating AutoCommit => 1, which would make this a misleading benchmark.

Re: SQLite vs CDB_File vs BerkeleyDB
by demerphq (Chancellor) on Mar 20, 2002 at 13:02 UTC
    Interesting. Thanks.

    But I have a reservation: The BerkeleyDB example I feel may be misleading. I base this only on a limited experience with DB_File which is an alternate interface to BerkeleyDB. My problem is this: The DBI examples are using an integer primary key (which would almost certainly be stored internally as a B+Tree). The BDB examples are using a STRING key (despite appearances) and are running in $DB_HASH mode. This is not a fair test.

    First of all the $DB_HASH representation is signifigantly slower and produces much larger files than the equivelent $DB_BTREE (especially under the circumstance your benchmark presents, ie keys being added inorder).

    Second of all even if you convert your benchmark to use $DB_BTREE you will also need to either a) supply your own inorder function so that records are inserted in the correct numeric order and not the lexicographical order that they would be inserted in now or b) supply your own key accessors (my preferred method) that use a pack function. This means the keys are stored in a compact form but also provide numeric ordering EG:

    tied(%db_btree)->filter_fetch_key ( sub { $_ = unpack("N", $_ || 0 ) +} ) ; tied(%db_btree)->filter_store_key ( sub { $_ = pack ("N", $_ || 0 ) } + ) ;
    I wont promise, but im pretty sure your results will change quite a bit with the BDB once you use B+Trees and not the hashing mechanism. Either way I would consider the above issues enough to invalidate the results of this benchmark.

    OTOH, the benchmark is indeed interesting and I look forward to you either posting updated results, or Ill do it when I get time (could be a while though)

    Incidentally this illustrates why SQL is so nice. It hides away representation issues behind an engine that usually will make the correct decision transparently. Whereas using something like BerkeleyDB requires an understanding of the basic data storage algorithms and their domain applicability. Something that we dont all know about. (Until it bites us.)

    Yves / DeMerphq
    --
    This rent for space.