Perl's Hash vs BerkeleyDB vs MySQL

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Re: Perl Hash vs BerkeleyDB vs MySQL by Corion (Patriarch) on May 09, 2006 at 06:44 UTC
The big advantages of BerkeleyDB are:

- Storage on disk instead of memory
- Managed concurrent access

So whenever I want a hash that persists on disk because my RAM isn't big enough to hold all of it, or when I want the data to persist between invocations of the program, I use a `tie`d hash that maps a Perl hash to a BerkeleyDB file. Another case is when I want multiple instances of a program to (infrequently) write data and the danger of conflicts is relatively small; then BDB is also convenient. In other cases, I use a plain Perl hash instead.
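A minimal sketch of the `tie`d-hash idea, for readers who have not used one. It assumes `DB_File` (the Berkeley DB interface bundled with most Perl builds) rather than the separate `BerkeleyDB` module; the file name and the `record_visit` helper are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical helper: bump a per-name counter stored in an on-disk hash.
sub record_visit {
    my ($dbfile, $name) = @_;

    # Map %visits onto the file; create the file if it does not exist yet.
    tie my %visits, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0666, $DB_HASH
        or die "Cannot open $dbfile: $!";

    my $count = ++$visits{$name};   # reads and writes go through the tie
    untie %visits;                  # flush to disk and close the file
    return $count;
}

# Assumed location: a throwaway directory so the sketch cleans up after itself.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/visits.db";

# Each call re-opens the file, standing in for a separate program run.
print "visit ", record_visit($file, 'monkfan'), "\n";   # prints "visit 1"
print "visit ", record_visit($file, 'monkfan'), "\n";   # prints "visit 2"
```

This only illustrates persistence between invocations. `DB_File` on its own does no locking, so the "managed concurrent access" advantage needs the locking facilities of the `BerkeleyDB` module and is not shown here.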
Re^2: Perl Hash vs BerkeleyDB vs MySQL by pajout (Curate) on May 09, 2006 at 07:39 UTC
I agree, but I also see some disadvantages:

- Recovery is needed when the process does not stop normally.
- The concurrency/locking model is different from SQL's, and you need to understand it precisely before designing the application.
- No shared memory between processes (which an SQL server can provide).
Re^3: Perl Hash vs BerkeleyDB vs MySQL by perrin (Chancellor) on May 09, 2006 at 12:51 UTC
What is this "other conception of concurrency" you're referring to? BerkeleyDB allows either database level or page level locks. Page level locks are trickier, since you have to run the deadlock daemon. Database level locks are trivial and require no special knowledge, and perform well.

I don't understand your shared memory comment. MySQL is implemented as multiple threads which share memory, but they don't share memory with your program. BerkeleyDB does use a shared memory cache, and it runs resident in your process, so you are accessing data directly from shared memory, unlike MySQL where you access it over a socket.
Re^4: Perl Hash vs BerkeleyDB vs MySQL by pajout (Curate) on May 09, 2006 at 15:03 UTC
Re^5: Perl Hash vs BerkeleyDB vs MySQL by perrin (Chancellor) on May 09, 2006 at 16:13 UTC
Some notes below your chosen depth have not been shown here
Re^2: Perl Hash vs BerkeleyDB vs MySQL by monkfan (Curate) on May 09, 2006 at 06:47 UTC
"Managed concurrent access"

What do you mean by that, Corion? This is the first time I have heard this term. Are you saying Perl can't achieve that as well?

Regards,
Edward
Re^3: Perl Hash vs BerkeleyDB vs MySQL by Corion (Patriarch) on May 09, 2006 at 06:51 UTC
By concurrent access I mean two or more programs writing to the file at virtually the same time. BerkeleyDB knows how to lock and unlock the database so that only one program modifies the database at a time and the file doesn't get corrupted. When BerkeleyDB manages the writing, I don't need to worry about this in my programs, and file locking is easy to get wrong. The only thing I have to worry about is when two programs modify the same value at the same time, but that's something I have to think about anyway.
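To illustrate what has to be handled by hand when nothing manages the locking, here is a rough sketch using only core Perl and `flock`. The `locked_increment` helper and the counter file are hypothetical; the point is that the exclusive lock must cover the whole read-modify-write cycle, which is the part that is easy to get wrong.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);
use File::Temp qw(tempdir);

# Hypothetical helper: add $delta to a number stored in $path while
# holding an exclusive lock from before the read until after the write.
sub locked_increment {
    my ($path, $delta) = @_;

    # Open read/write, creating the file if needed, without truncating it.
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)                      or die "Cannot lock $path: $!";

    my $line  = <$fh>;                       # undef when the file is empty
    my $count = defined $line ? $line : 0;
    chomp $count;
    $count += $delta;

    # Rewrite the file in place while the lock is still held.
    seek($fh, 0, SEEK_SET) or die "Cannot seek in $path: $!";
    truncate($fh, 0)       or die "Cannot truncate $path: $!";
    print {$fh} "$count\n";
    close($fh) or die "Cannot close $path: $!";   # flushes, then releases the lock

    return $count;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/counter.txt";
locked_increment($path, 1) for 1 .. 3;
print "counter is now ", locked_increment($path, 0), "\n";   # prints "counter is now 3"
```

Releasing the lock between the read and the write, or opening the file in a mode that truncates it before the lock is taken, would reintroduce the race.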
Re: Perl's Hash vs BerkeleyDB vs MySQL by ioannis (Abbot) on May 09, 2006 at 10:09 UTC
Previous replies focused on ACID properties; in this post I note other kinds of differences.

Advantages of a Perl hash:

- Easy to store a structure as a value
- Easy to add more hashes (databases) later, whereas in DB4 you have to commit to them ahead of time
- Easier to store multiple values for a key (arrayref)
- No need to worry about recovery and proper shutdown

Advantages of BerkeleyDB (not the same as DB_File):

- Easier to access values based on combinations of keys and values
- Can construct cursors (iterators)
- Can construct custom indexes (not yet supported in the DB4 Perl API)
- The first 4k of data are in memory, the rest on disk
- Can add callbacks to modify the standard access and retrieval methods

Comparisons with SQL databases miss the whole point of an embedded database. DB4 is not a multi-process system; it is an embedded database, a C library meant to run in the same address space as the main program (when not using RPC). The general advantages and disadvantages of the stand-alone model versus the client-server model should be noted.
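The "structure as value" and "multiple values for a key" points can be sketched in a few lines of core Perl. The `add_value` helper and the sample data are hypothetical. The second half rests on an assumption worth stating: a DBM-style store keeps values as byte strings, so a nested structure would have to be serialised first (here with `Storable`), which is the job MLDBM automates.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical helper: append $value to the list kept under $key.
sub add_value {
    my ($hash, $key, $value) = @_;
    push @{ $hash->{$key} }, $value;    # autovivifies the arrayref on first use
    return scalar @{ $hash->{$key} };
}

my %topics;
add_value(\%topics, 'storage', 'BerkeleyDB');
add_value(\%topics, 'storage', 'MySQL');
add_value(\%topics, 'memory',  'Perl hash');

print join(', ', @{ $topics{storage} }), "\n";   # prints "BerkeleyDB, MySQL"

# For a string-only store, the same list has to become a byte string
# on the way in and be rebuilt on the way out.
my $frozen = freeze($topics{storage});
my $copy   = thaw($frozen);
print scalar(@$copy), " values survived the round trip\n";
# prints "2 values survived the round trip"
```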
Re^2: Perl's Hash vs BerkeleyDB vs MySQL by herveus (Prior) on May 09, 2006 at 11:46 UTC
Howdy!

...of course, you can have both if you use SQLite, as it's an embedded database with SQL and all those goodies...

yours,
Michael
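A small sketch of the SQLite route, assuming the `DBI` and `DBD::SQLite` modules are installed. The table layout and the `open_store`, `put` and `get` helpers are hypothetical; they only mimic hash-like access on top of an embedded SQL database that runs inside the program's own process.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempdir);

# Hypothetical helper: open (or create) a one-table key/value store.
sub open_store {
    my ($file) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$file", "", "",
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)');
    return $dbh;
}

# Store or overwrite the value for a key.
sub put {
    my ($dbh, $key, $value) = @_;
    $dbh->do('INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)',
             undef, $key, $value);
    return;
}

# Fetch the value for a key, or undef if the key is absent.
sub get {
    my ($dbh, $key) = @_;
    my ($value) = $dbh->selectrow_array('SELECT v FROM kv WHERE k = ?',
                                        undef, $key);
    return $value;
}

my $dir = tempdir(CLEANUP => 1);
my $dbh = open_store("$dir/store.sqlite");
put($dbh, 'answer', 42);
print "answer = ", get($dbh, 'answer'), "\n";
$dbh->disconnect;
```

No server process is involved: the database is a single file, and plain SQL remains available for anything beyond key lookups.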
Re^2: Perl's Hash vs BerkeleyDB vs MySQL by perrin (Chancellor) on May 09, 2006 at 12:53 UTC
A lot more than 4K of the BerkeleyDB data might also be in memory, in the shared memory cache.
Re: Perl's Hash vs BerkeleyDB vs MySQL by dragonchild (Archbishop) on May 09, 2006 at 14:21 UTC
DBM::Deep combines the ease of use of a hash with the persistence goodness of BDB. I suggest you check it out.

My criteria for good software:

- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Perl's Hash vs BerkeleyDB vs MySQL by wazzuteke (Hermit) on May 09, 2006 at 14:28 UTC
Adding to a bunch of other comments here, I've enjoyed combining Perl structures with BerkeleyDB. For multi-level nested data structures, I've found MLDBM to be a good fit.

I do want to try and play devil's advocate for a little bit, though. I always (usually, sometimes) try to think a little harder whenever I am using a hash, to 'see if I can figure out a better way' and implement the same idea using an array instead. Here is my reasoning on that (and mind you, there have been only rare occasions where I have really been able to implement an algorithm better with an array than with a Perl-ish hash).

I try to remember that a Perl hash is not just an associative array; underneath, it is really an array of linked lists. Therefore, no matter what, it can't be as fast as accessing an array directly. Not only does the key have to be hashed, but there is also a chance that the lookup will have to iterate through n linked values before it finds the right key. Not to say this isn't blazingly fast, though not like the lightning you can pull out of an array index.

Like I said, there have been some seriously rare cases where I (or those around me) have been able to pull a smidgen of speed or efficiency out of an array implementation rather than a hash, but it is always something to keep in the back of your mind.

`print map{chr}(45,45,104,97,124,124,116,97,45,45);`
... and I posted this while I was at work => whitepages.com | INC
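One way to check the array-versus-hash claim on your own machine is the core `Benchmark` module. This is only a sketch: the data size and the `sum_array` and `sum_hash` helpers are made up for illustration, and the timings will vary with the Perl version and the hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed workload: the same 10,000 integers, once indexed by position
# and once keyed by the same numbers in a hash.
my $n     = 10_000;
my @array = (0 .. $n - 1);
my %hash  = map { $_ => $_ } 0 .. $n - 1;

# Sum every element by array index.
sub sum_array {
    my $sum = 0;
    $sum += $array[$_] for 0 .. $n - 1;
    return $sum;
}

# Sum every element by hash key (each lookup hashes the key first).
sub sum_hash {
    my $sum = 0;
    $sum += $hash{$_} for 0 .. $n - 1;
    return $sum;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    array => \&sum_array,
    hash  => \&sum_hash,
});
```

Integer keys make the array version possible at all; for string keys a hash is usually the only sensible option, so the comparison applies only where both structures fit the problem.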
MySQL && BDB? (was Re: Perl's Hash vs BerkeleyDB vs MySQL) by BerntB (Deacon) on May 09, 2006 at 13:38 UTC
Let me piggyback another question. MySQL can use different storage engines, including BerkeleyDB (BDB). How much overhead is there if you go the MySQL way, compared with using BDB directly? Are there any other practical (speed and scaling) differences? (It is easier to use the SQL database, I assume.)

Disclaimer: I am a PostgreSQL person, so I might be totally confused. :-)

