PerlMonks

Perl's Hash vs BerkeleyDB vs MySQL

by monkfan (Curate)
on May 09, 2006 at 06:38 UTC ([id://548147])

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

To my (limited) knowledge, much of the functionality of BerkeleyDB (and hence DB_File) is provided by, or can be achieved with, a plain Perl hash.

My question is, how do you know when to go for BerkeleyDB instead of using Perl's hash?
Furthermore, how do you decide when to use the non-relational BerkeleyDB instead of the relational MySQL?

Regards,
Edward

Re: Perl Hash vs BerkeleyDB vs MySQL
by Corion (Patriarch) on May 09, 2006 at 06:44 UTC

    The big advantages of BerkeleyDB are:

    • Storage on disk instead of memory
    • Managed concurrent access

    So whenever I want a hash that lives on disk because my RAM isn't big enough to hold all of it, or when I want the data to persist between invocations of the program, I use a tied hash that maps a Perl hash onto a BerkeleyDB file.

    Another case is when I want multiple instances of a program to write data (infrequently) and the risk of conflicts is relatively small; then BDB is also convenient.

    In other cases, I use a plain Perl hash instead.
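    A minimal sketch of the tied-hash approach, using DB_File (the core Perl interface to Berkeley DB, built when the Berkeley DB library is available) rather than the separate BerkeleyDB module. The file name visits.db is a hypothetical example. Each run increments a counter stored on disk, so the value survives between invocations:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;

# Hypothetical file name; any writable path works.
my $file = 'visits.db';

# Tie %visits to an on-disk Berkeley DB hash file. Reads and writes go
# to the file, so the data outlives this process and need not fit in RAM.
tie my %visits, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot open $file: $!";

$visits{count}++;    # FETCH from disk, then STORE the new value back
print "This program has run $visits{count} time(s)\n";

untie %visits;       # flush pending writes and close the file
```

    Run it twice and the count goes up: the contents live in the file, not in the process.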

      I agree, but I also see some disadvantages:
      • Recovery is needed when the process does not stop normally.
      • A concurrency/locking model different from SQL's; you need to understand it precisely before designing the application.
      • No shared memory between processes (as an SQL server can provide).
        What is this "other conception of concurrency" you're referring to? BerkeleyDB allows either database level or page level locks. Page level locks are trickier, since you have to run the deadlock daemon. Database level locks are trivial and require no special knowledge, and perform well.

        I don't understand your shared memory comment. MySQL is implemented as multiple threads which share memory, but they don't share memory with your program. BerkeleyDB does use a shared memory cache, and it runs resident in your process, so you are accessing data directly from shared memory, unlike MySQL where you access it over a socket.

      Managed concurrent access
      What do you mean by that, Corion? This is the first time I have heard the term.
      Are you saying Perl can't achieve that as well?

      Regards,
      Edward

        By concurrent access I mean two or more programs writing to the file virtually at the same time.

        BerkeleyDB knows how to lock and unlock the database so that only one program modifies the database at one time and the file doesn't get corrupted. When BerkeleyDB manages the writing, I don't need to worry about this in my programs - and file locking is easy to get wrong. The only thing I have to worry about is when two programs modify the same value at the same time, but that's something I have to think about anyway.
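        DB_File itself does no locking, so the manual approach (the kind that is easy to get wrong) looks roughly like this sketch: take an exclusive flock on a separate lock file before opening the database, do the read-modify-write, untie, and only then release the lock. Locking a separate file sidesteps the caching problem the DB_File documentation warns about when locking the database's own descriptor. The file names and the increment helper are hypothetical. The BerkeleyDB module's Concurrent Data Store mode can manage this kind of single-writer locking for you instead.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;

# Hypothetical file names for this sketch.
my $db_file   = 'counters.db';
my $lock_file = "$db_file.lock";

# Increment a counter safely when several processes may run at once.
sub increment {
    my ($key) = @_;

    # Exclusive lock on a separate file, taken before the database is opened.
    open my $lock, '>>', $lock_file or die "Cannot open $lock_file: $!";
    flock $lock, LOCK_EX or die "Cannot lock $lock_file: $!";

    tie my %db, 'DB_File', $db_file, O_RDWR | O_CREAT, 0666, $DB_HASH
        or die "Cannot open $db_file: $!";
    my $new = ($db{$key} // 0) + 1;    # read-modify-write under the lock
    $db{$key} = $new;
    untie %db;                         # flush to disk before releasing the lock

    close $lock;                       # closing the handle drops the flock
    return $new;
}

print increment('hits'), "\n";
```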

Re: Perl's Hash vs BerkeleyDB vs MySQL
by ioannis (Abbot) on May 09, 2006 at 10:09 UTC
    Previous replies focused on ACID properties, with this post I observe other kinds of differences:

    Advantages of Perl Hash:

    • Easy to store a structure as a value
    • Easy to add more hashes (databases) later, whereas in db4 you must commit to them ahead of time.
    • Easier to store multiple values for a key (arrayref)
    • No need to worry about recovery and proper shutdown
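    To illustrate the first three points: in memory, a hash value can be any reference, and a new hash can be created whenever it's needed. (A DB_File-tied hash, by contrast, stores only strings, so a stored reference would come back as something like ARRAY(0x...).) A small sketch with made-up keys:

```perl
use strict;
use warnings;

my %db;

# Several values for one key: store an array reference.
push @{ $db{fruit} }, 'apple', 'banana';

# A whole structure as a value.
$db{config} = { depth => 3, tags => [ 'a', 'b' ] };

# "Adding another database" is just creating another hash at run time.
my %index = map { $_ => 1 } @{ $db{fruit} };

print scalar @{ $db{fruit} }, " fruits; depth=$db{config}{depth}\n";   # 2 fruits; depth=3
```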

    Advantages of BerkeleyDB (not the same as DB_File):

    • Easier to access values based on a combination of keys and values
    • Can construct cursors (iterators)
    • Can construct custom indexes (not yet supported in the DB4 Perl api)
    • The first 4k of data are in memory, the rest on disk
    • Can add callbacks to modify the standard access and retrieval methods.

    Comparisons with SQL databases miss the whole point of an embedded database. DB4 is not a multi-process system; it is an embedded database, a C library meant to run in the same address space as the main program (when not using RPC). The general advantages and disadvantages of the stand-alone model versus the client-server model should be noted.

      Howdy!

      ...of course, you can have both if you use SQLite, as it's an embedded database *with* SQL and all those goodies...

      yours,
      Michael

      A lot more than 4K of the BerkeleyDB data might also be in memory, in the shared memory cache.
Re: Perl's Hash vs BerkeleyDB vs MySQL
by dragonchild (Archbishop) on May 09, 2006 at 14:21 UTC
    DBM::Deep combines the ease-of-use of a hash and the persistence goodnesses of BDB. I suggest you check it out.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Perl's Hash vs BerkeleyDB vs MySQL
by wazzuteke (Hermit) on May 09, 2006 at 14:28 UTC
    Adding to the other comments here, I've enjoyed combining Perl structures with BerkeleyDB. For storing multi-level nested data structures, I've found MLDBM a good choice.
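    The idea behind MLDBM is to serialize nested structures into the flat string values a DBM file can hold. Here is a hand-rolled sketch of the same idea with the core Storable module and DB_File; the file name and record layout are hypothetical, and MLDBM itself wraps this in a tie and lets you choose the serializer:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;
use Storable qw(nfreeze thaw);

my $file = 'people.db';    # hypothetical file name

tie my %db, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot open $file: $!";

# DB_File values are flat strings, so serialize on the way in...
$db{edward} = nfreeze({ name => 'Edward', langs => [ 'perl', 'c' ] });

# ...and deserialize on the way out.
my $rec = thaw($db{edward});
push @{ $rec->{langs} }, 'sql';    # modify the in-memory copy
$db{edward} = nfreeze($rec);       # then store the whole record back

print join(',', @{ thaw($db{edward})->{langs} }), "\n";   # perl,c,sql

untie %db;
```

    As with MLDBM, changing a nested element of a fetched copy does not update the file; the whole top-level value has to be stored back, as the sketch does.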

    I do want to play devil's advocate for a moment, though. I always (well, usually; sometimes) think a little harder when I'm using a hash, to see whether I can implement the same idea with an array instead. Here's my reasoning (and mind you, there have been only a few rare occasions where I've actually been able to implement an algorithm better with an array than with a Perl-ish hash).

    I try to remember that although a Perl hash behaves as an associative array, internally it is an array of buckets, each holding a linked list of entries. So a hash lookup can't be as fast as indexing an array directly: not only does the key have to be hashed, but there is also a chance the lookup must walk several entries in a bucket's chain before it finds the right key. That is still blazingly fast, just not the lightning you can pull out of an array index.

    Like I said, there have been only a few rare cases where I (or those around me) have been able to squeeze a smidgen of speed or efficiency out of an array implementation rather than a hash, but it's always something to keep in the back of your mind.
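    One way to check whether an array actually wins in a given case is to measure it with the core Benchmark module. This sketch assumes the keys are small, dense integers (the only case where an array can stand in for a hash directly); note that the hash version must also convert each numeric key to a string:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Dense integer IDs 0 .. $n-1: the same data as an array and as a hash.
my $n        = 10_000;
my @by_index = map { $_ * 2 } 0 .. $n - 1;
my %by_key   = map { $_ => $_ * 2 } 0 .. $n - 1;

# Both subs do identical work apart from the lookup itself.
# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    array => sub { my $s = 0; $s += $by_index[$_] for 0 .. $n - 1; $s },
    hash  => sub { my $s = 0; $s += $by_key{$_}   for 0 .. $n - 1; $s },
});
```

    The numbers vary with perl version and hardware, so it is worth measuring on your own data before restructuring code around arrays.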

    print map{chr}(45,45,104,97,124,124,116,97,45,45);
    ... and I posted this while I was at work => whitepages.com | INC
MySQL && BDB? (was Re: Perl's Hash vs BerkeleyDB vs MySQL)
by BerntB (Deacon) on May 09, 2006 at 13:38 UTC
    Let me piggyback another question.

    MySQL can use different storage engines, including BerkeleyDB (BDB). How much overhead does going through MySQL add, compared with using BDB directly?

    Any other practical (speed and scaling) differences? (It is easier to use the SQL database, I assume.)

    Disclaimer: I am a Postgresql person, so I might be totally confused. :-)

Node Type: perlquestion [id://548147]
Approved by Corion
Front-paged by Corion