
Interesting. Thanks.

But I have a reservation: I feel the BerkeleyDB example may be misleading. I base this only on limited experience with DB_File, which is an alternate interface to BerkeleyDB. My problem is this: the DBI examples use an integer primary key (which would almost certainly be stored internally as a B+Tree). The BDB examples use a STRING key (despite appearances) and run in $DB_HASH mode. This is not a fair test.

First of all, the $DB_HASH representation is significantly slower and produces much larger files than the equivalent $DB_BTREE (especially under the circumstances your benchmark presents, i.e. keys being added in order).

Second, even if you convert your benchmark to use $DB_BTREE, you will also need to either a) supply your own comparison function so that records are ordered numerically rather than in the lexicographical order they would get now, or b) supply your own key filters (my preferred method) that use pack. With the latter the keys are stored in a compact form and also sort numerically, e.g.:

tied(%db_btree)->filter_fetch_key( sub { $_ = unpack("N", $_ || 0) } );
tied(%db_btree)->filter_store_key( sub { $_ = pack("N", $_ || 0) } );
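To show why the ordering matters, here is a minimal pure-Perl sketch that does not touch BerkeleyDB at all. It only compares the three orderings discussed above: plain string comparison, a numeric comparison function (option a), and bytewise comparison of keys packed with "N" (option b). The key values in @ids are made up for illustration, and it assumes the keys are non-negative integers that fit in 32 bits.

```perl
use strict;
use warnings;

# Hypothetical integer keys, as a benchmark might insert them.
my @ids = (1, 2, 9, 10, 11, 100, 256, 1000);

# Plain string comparison: what you get when the keys are
# stored as decimal strings and compared bytewise.
my @lexical = sort { $a cmp $b } @ids;

# Option (a): a numeric comparison function.
my @numeric = sort { $a <=> $b } @ids;

# Option (b): pack each key as a 32-bit big-endian unsigned
# integer ("N"), compare the packed strings bytewise, unpack.
my @packed = map  { unpack("N", $_) }
             sort { $a cmp $b }
             map  { pack("N", $_) } @ids;

print "lexical: @lexical\n";   # 1 10 100 1000 11 2 256 9
print "numeric: @numeric\n";   # 1 2 9 10 11 100 256 1000
print "packed:  @packed\n";    # 1 2 9 10 11 100 256 1000
```

Because "N" is big-endian and fixed width (4 bytes), bytewise order of the packed keys matches numeric order, which is why the filters need no custom comparison function.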

I won't promise, but I'm pretty sure your BDB results will change quite a bit once you use B+Trees and not the hashing mechanism. Either way, I would consider the above issues enough to invalidate the results of this benchmark.

OTOH, the benchmark is indeed interesting, and I look forward to you posting updated results, or I'll do it when I get time (could be a while though).

Incidentally, this illustrates why SQL is so nice. It hides representation issues behind an engine that will usually make the correct decision transparently, whereas using something like BerkeleyDB requires an understanding of the basic data storage algorithms and their domain applicability. That is something we don't all know about. (Until it bites us.)

Yves / DeMerphq
--
This rent for space.

In reply to Re: SQLite vs CDB_File vs BerkeleyDB by demerphq
in thread SQLite vs CDB_File vs BerkeleyDB by Matts



	PerlMonks