Re: Perl Hash vs BerkeleyDB vs MySQL
by Corion (Patriarch) on May 09, 2006 at 06:44 UTC
|
The big advantages of BerkeleyDB are:
- Storage on disk instead of memory
- Managed concurrent access
So whenever I want a hash that persists on disk because my RAM isn't big enough to hold all of it, or when I want the data to persist between invocations of the program, I use a tied hash that maps a Perl hash to a BerkeleyDB file.
Another case is when I want multiple instances of a program to write data (infrequently) and the danger of conflicts is relatively small. BDB is convenient then, too.
In other cases, I use a plain Perl hash instead.
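As a minimal sketch of the tied-hash approach described above: this uses DB_File, the older Berkeley DB interface bundled with most Perl installations, as a stand-in (the separate BerkeleyDB module offers a similar tie interface with more options). The file name, the key and the `bump_counter` helper are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use DB_File;                    # exports $DB_HASH
use File::Temp qw(tempdir);

# Hypothetical file location, chosen only for this example.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counts.db";

# Open (or create) the on-disk hash, bump one counter, close it again.
sub bump_counter {
    my ($path, $key) = @_;
    tie my %count, 'DB_File', $path, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $path: $!";
    my $new = ++$count{$key};   # the read and the write both go to the file
    untie %count;               # flush and close
    return $new;
}

# Each call re-opens the file, standing in for a separate program run.
my $first  = bump_counter($file, 'runs');
my $second = bump_counter($file, 'runs');
print "first=$first second=$second\n";
```

The second call sees the value written by the first, which is the "persist between invocations" case. Note that this sketch does no locking at all, so it is only safe with a single writer.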
|
I agree, but I see some disadvantages also:
- Need for recovery when the process is not stopped normally.
- Other conception of concurrency/locking, different from SQL. You need to understand it precisely before designing the app.
- No shared memory between processes (as an SQL engine can provide).
|
What is this "other conception of concurrency" you're referring to? BerkeleyDB allows either database level or page level locks. Page level locks are trickier, since you have to run the deadlock daemon. Database level locks are trivial and require no special knowledge, and perform well.
I don't understand your shared memory comment. MySQL is implemented as multiple threads which share memory, but they don't share memory with your program. BerkeleyDB does use a shared memory cache, and it runs resident in your process, so you are accessing data directly from shared memory, unlike MySQL where you access it over a socket.
|
Managed concurrent access
What do you mean by that, Corion? This is the first time
I have heard this term.
Are you saying Perl can't achieve that as well?
|
By concurrent access I mean two or more programs writing to the file virtually at the same time.
BerkeleyDB knows how to lock and unlock the database so that only one program modifies the database at one time and the file doesn't get corrupted. When BerkeleyDB manages the writing, I don't need to worry about this in my programs - and file locking is easy to get wrong. The only thing I have to worry about is when two programs modify the same value at the same time, but that's something I have to think about anyway.
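For comparison, here is a rough sketch of the hand-rolled locking that BerkeleyDB spares you, using only core Perl. The counter file and the `locked_increment` helper are hypothetical. The important detail is that the lock has to be held across the whole read-modify-write cycle, which is the part that is easy to get wrong.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek O_RDWR O_CREAT);
use File::Temp qw(tempdir);

# Hypothetical shared file, chosen only for this example.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counter.txt";

# Increment a number stored in a file while holding an exclusive lock.
sub locked_increment {
    my ($path) = @_;

    # Open without truncating, so we never clobber data before we hold the lock.
    sysopen(my $fh, $path, O_RDWR | O_CREAT, 0644)
        or die "Cannot open $path: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $path: $!";

    my $value = <$fh>;
    $value = 0 unless defined $value;   # empty file on the first run
    chomp $value;
    $value++;

    # Rewrite the file from the start while still holding the lock.
    seek($fh, 0, SEEK_SET) or die "Cannot seek in $path: $!";
    truncate($fh, 0)       or die "Cannot truncate $path: $!";
    print {$fh} "$value\n";
    close($fh) or die "Cannot close $path: $!";   # closing releases the lock
    return $value;
}

my @results = map { locked_increment($file) } 1 .. 3;
print "@results\n";
```

Opening the file with `>` (which truncates before the lock is taken) or unlocking before the buffered write has been flushed are the classic mistakes here.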
Re: Perl's Hash vs BerkeleyDB vs MySQL
by ioannis (Abbot) on May 09, 2006 at 10:09 UTC
|
Previous replies focused on ACID properties; in this post
I note other kinds of differences:
Advantages of Perl Hash:
- Easy to store a structure as a value
- Easy to add more hashes (databases) later, whereas
in DB4 you must commit to them ahead of time.
- Easier to store multiple values for a key (arrayref)
- No need to worry about recovery and proper shutdown
Advantages of BerkeleyDB (not the same as DB_File):
- Easier to access values based on combinations of keys and values
- Can construct cursors (iterators)
- Can construct custom indexes (not yet supported in the DB4 Perl API)
- The first 4k of data are in memory, the rest on disk
- Can add callbacks to modify the standard access and retrieval methods.
Comparisons with SQL databases miss the whole point of
an embedded database. DB4 is not a multi-process system;
it is an embedded database, a C library meant to run in
the same address space as the main program (when not using RPC).
The general advantages and disadvantages of stand-alone models versus
the client-server model should be noted.
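To illustrate the hash-side points from the list above (a structure as a value, several values per key, another hash added later), here is a small sketch in plain Perl. The host names, addresses and roles are invented sample data.

```perl
use strict;
use warnings;

my %host;

# A whole structure as the value for one key.
$host{web1} = { ip => '10.0.0.1', roles => [ 'http', 'cache' ] };

# Multiple values per key: just push onto the array reference.
push @{ $host{web1}{roles} }, 'proxy';

# Autovivification creates the nested structure on first use.
push @{ $host{db1}{roles} }, 'mysql';

# Adding another "database" later is just declaring another hash.
my %by_region;
push @{ $by_region{eu} }, 'web1', 'db1';

my $web1_roles = join ',', @{ $host{web1}{roles} };
my $db1_roles  = join ',', @{ $host{db1}{roles} };
my $eu_hosts   = join ',', @{ $by_region{eu} };
print "web1: $web1_roles\ndb1: $db1_roles\neu: $eu_hosts\n";
```

None of this survives the end of the process, of course, which is the trade-off against the on-disk options.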
|
A lot more than 4K of the BerkeleyDB data might also be in memory, in the shared memory cache.
Re: Perl's Hash vs BerkeleyDB vs MySQL
by dragonchild (Archbishop) on May 09, 2006 at 14:21 UTC
|
DBM::Deep combines the ease of use of a hash with the persistence goodness of BDB. I suggest you check it out.
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Perl's Hash vs BerkeleyDB vs MySQL
by wazzuteke (Hermit) on May 09, 2006 at 14:28 UTC
|
Adding to a bunch of other comments here, I've enjoyed combining Perl structures with BerkeleyDB. For multi-level nested data structures, I've found MLDBM to be a good resource.
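As a rough sketch of why a helper like MLDBM is needed at all: a DBM file stores only flat strings, so a nested structure has to be serialized on the way in and deserialized on the way out. The example below does this by hand with the core Storable module, and uses an ordinary hash (`%flat`) as a stand-in for the DBM-tied hash. The record contents and the two helper subs are invented for illustration; MLDBM automates roughly this behind a tie interface.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Stand-in for a DBM-tied hash: every value must be a flat string.
my %flat;

# Serialize a nested structure into a single string value.
sub store_record {
    my ($key, $data) = @_;
    $flat{$key} = freeze($data);
    return;
}

# Turn the stored string back into a Perl structure (a fresh copy).
sub fetch_record {
    my ($key) = @_;
    return exists $flat{$key} ? thaw($flat{$key}) : undef;
}

store_record('alice', { langs => [ 'perl', 'c' ], karma => 42 });

# Changing a nested value means fetch, modify, store back;
# editing the fetched copy alone does not touch the stored string.
my $rec = fetch_record('alice');
push @{ $rec->{langs} }, 'sql';
store_record('alice', $rec);

my $again = fetch_record('alice');
my $langs = join ',', @{ $again->{langs} };
print "langs=$langs karma=$again->{karma}\n";
```

The fetch-modify-store dance is also how nested updates have to be done through MLDBM itself, since the tied hash hands back a copy rather than a live reference.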
I do want to try and play devil's advocate for a little bit, though. I always (usually... sometimes) try to think a little harder when I am using a hash, to 'see if I can figure out a better way' and implement the same idea using an array instead. Here's my reasoning on that (and mind you, there have been only some rare occasions where I have really been able to implement an algorithm better with an array than with a Perl-ish hash).
I try to remember that a Perl hash is not just an associative array; underneath, it is really an array of linked lists. Therefore, no matter what, it can't be as fast as accessing an array directly. Not only does the key have to be hashed, but there is also a chance that the lookup will have to iterate through n linked entries before it finds the right key. That's not to say this isn't blazingly fast, just not like the lightning you can pull out of an array index.
Like I said, there have been some seriously rare cases where I (or those around me) have been able to pull a smidgen of speed or efficiency out of an array implementation rather than a hash, but it is always something to keep in the back of your mind.
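One way to check the array-versus-hash claim on your own machine is a quick comparison with the core Benchmark module. This is only a sketch: the data size and the two summing subs are arbitrary choices, and the numbers will vary with Perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The same data stored both ways, keyed by small integers.
my $n     = 10_000;
my @array = map { $_ * 2 } 0 .. $n - 1;
my %hash  = map { $_ => $_ * 2 } 0 .. $n - 1;

# Look up every element by index.
sub sum_array {
    my $sum = 0;
    $sum += $array[$_] for 0 .. $n - 1;
    return $sum;
}

# Look up every element by key (each key is hashed first).
sub sum_hash {
    my $sum = 0;
    $sum += $hash{$_} for 0 .. $n - 1;
    return $sum;
}

# Sanity check: both versions must do the same work.
die "lookups disagree" unless sum_array() == sum_hash();

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { array => \&sum_array, hash => \&sum_hash });
```

This only covers the case where the keys are dense small integers, which is exactly when an array can replace a hash; for string keys there is no array equivalent to compare against.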
print map{chr}(45,45,104,97,124,124,116,97,45,45);
... and I posted this while I was at work => whitepages.com | INC
MySQL && BDB? (was Re: Perl's Hash vs BerkeleyDB vs MySQL)
by BerntB (Deacon) on May 09, 2006 at 13:38 UTC
|
Let me piggyback another question.
MySQL can use different storage engines, including BerkeleyDB (BDB). How much overhead is there if you go the MySQL way, compared to using BDB directly?
Any other practical (speed and scaling) differences? (It is easier to use the SQL database, I assume.)
Disclaimer: I am a PostgreSQL person, so I might be totally confused. :-)