I'm sure you meant to say "you should be locking a sentinel file".

I draw from the DB_File pod (I don't recall where *else* I've seen info on this, or whether it applies to AnyDBM_File, which I think it does), and I quote verbatim (as rendered by pod2html):

Locking: The Trouble with fd

Until version 1.72 of this module, the recommended technique for locking DB_File databases was to flock the filehandle returned from the ``fd'' function. Unfortunately this technique has been shown to be fundamentally flawed (Kudos to David Harris for tracking this down). Use it at your own peril!

The locking technique went like this.

    $db = tie(%db, 'DB_File', '/tmp/foo.db', O_CREAT|O_RDWR, 0644)
        || die "dbcreat /tmp/foo.db $!";
    $fd = $db->fd;
    open(DB_FH, "+<&=$fd") || die "dup $!";
    flock (DB_FH, LOCK_EX) || die "flock: $!";
    ...
    $db{"Tom"} = "Jerry" ;
    ...
    flock(DB_FH, LOCK_UN);
    undef $db;
    untie %db;
    close(DB_FH);

In simple terms, this is what happens:

1. Use ``tie'' to open the database.
2. Lock the database with fd & flock.
3. Read & Write to the database.
4. Unlock and close the database.

Here is the crux of the problem. A side-effect of opening the DB_File database in step 1 is that an initial block from the database will get read from disk and cached in memory.

To see why this is a problem, consider what can happen when two processes, say ``A'' and ``B'', both want to update the same DB_File database using the locking steps outlined above. Assume process ``A'' has already opened the database and has a write lock, but it hasn't actually updated the database yet (it has finished step 2, but not started step 3 yet). Now process ``B'' tries to open the same database - step 1 will succeed, but it will block on step 2 until process ``A'' releases the lock. The important thing to notice here is that at this point in time both processes will have cached identical initial blocks from the database.

Now process ``A'' updates the database and happens to change some of the data held in the initial buffer. Process ``A'' terminates, flushing all cached data to disk and releasing the database lock. At this point the database on disk will correctly reflect the changes made by process ``A''.

With the lock released, process ``B'' can now continue. It also updates the database and unfortunately it too modifies the data that was in its initial buffer. Once that data gets flushed to disk it will overwrite some/all of the changes process ``A'' made to the database.

The result of this scenario is at best a database that doesn't contain what you expect. At worst the database will corrupt.

The above won't happen every time competing processes update the same DB_File database, but it does illustrate why the technique should not be used.
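
To make the scenario concrete, here is a toy simulation of it. This is not from the DB_File pod and it does not touch DB_File at all: a plain hash stands in for the database's initial block on disk, and `open_db` and `flush_db` are hypothetical helpers that mimic "tie caches the block" and "untie flushes the block". It only shows why a stale cache loses process A's update.

```perl
use strict;
use warnings;

# The "disk": one shared copy of the database's initial block.
my %disk = (Tom => 'Jerry');

# Hypothetical helpers, for illustration only.
# open_db:  step 1, tie reads the block and caches it in memory.
# flush_db: step 4, untie writes the cached block back to disk.
sub open_db  { my %cache = %disk; return \%cache }
sub flush_db { my ($cache) = @_; %disk = %$cache; return }

my $cache_a = open_db();   # A: step 1, then step 2 (A gets the lock)
my $cache_b = open_db();   # B: step 1 succeeds, step 2 blocks,
                           #    but B's cache is already filled

$cache_a->{Sylvester} = 'Tweety';   # A: step 3
flush_db($cache_a);                 # A: step 4, lock released

$cache_b->{Itchy} = 'Scratchy';     # B: step 3, using its stale cache
flush_db($cache_b);                 # B: step 4 overwrites A's block

# A's record is gone: prints "Itchy=Scratchy, Tom=Jerry"
print join(', ', map { "$_=$disk{$_}" } sort keys %disk), "\n";
```

The lock did its job (A and B never wrote at the same time); the damage comes from B having read its copy before it held the lock.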

Safe ways to lock a database

Starting with version 2.x, Berkeley DB has internal support for locking. The companion module to this one, BerkeleyDB, provides an interface to this locking functionality. If you are serious about locking Berkeley DB databases, I strongly recommend using BerkeleyDB.

If using BerkeleyDB isn't an option, there are a number of modules available on CPAN that can be used to implement locking. Each one implements locking differently and has different goals in mind. It is therefore worth knowing the difference, so that you can pick the right one for your application. Here are the three locking wrappers:

Tie::DB_Lock: A DB_File wrapper which creates copies of the database file for read access, so that you have a kind of a multiversioning concurrent read system. However, updates are still serial. Use for databases where reads may be lengthy and consistency problems may occur.
Tie::DB_LockFile: A DB_File wrapper that has the ability to lock and unlock the database while it is being used. Avoids the tie-before-flock problem by simply re-tie-ing the database when you get or drop a lock. Because of the flexibility in dropping and re-acquiring the lock in the middle of a session, this can be massaged into a system that will work with long updates and/or reads if the application follows the hints in the POD documentation.
DB_File::Lock: An extremely lightweight DB_File wrapper that simply flocks a lockfile before tie-ing the database and drops the lock after the untie. Allows one to use the same lockfile for multiple databases to avoid deadlock problems, if desired. Use for databases where updates and reads are quick and simple flock locking semantics are enough.
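
For what it's worth, the sentinel-file idea is easy to do by hand with plain DB_File. What follows is a minimal sketch, not code from the pod or from any of the modules above: the helper name `with_locked_db` and the temp-directory paths are made up for illustration. The order is the whole point: flock the sentinel file first, tie second, untie before the lock is dropped.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);   # O_CREAT, O_RDWR, LOCK_EX
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical paths, for illustration only.
my $dir      = tempdir(CLEANUP => 1);
my $db_file  = "$dir/foo.db";
my $lockfile = "$dir/foo.db.lock";

# Hypothetical helper: run $code against the tied hash while holding
# an exclusive lock on the sentinel file. The database file itself
# is never flocked.
sub with_locked_db {
    my ($code) = @_;

    open(my $lock_fh, '>>', $lockfile) or die "open $lockfile: $!";
    flock($lock_fh, LOCK_EX)           or die "flock $lockfile: $!";

    # Tie only after the lock is held, so nothing gets cached early.
    my %db;
    tie(%db, 'DB_File', $db_file, O_CREAT|O_RDWR, 0644, $DB_HASH)
        or die "tie $db_file: $!";

    my $result = $code->(\%db);

    # Untie (flushing to disk) before the lock is released.
    untie %db;
    close($lock_fh) or die "close $lockfile: $!";  # close drops the lock
    return $result;
}

with_locked_db(sub { $_[0]{Tom} = 'Jerry' });
my $value = with_locked_db(sub { $_[0]{Tom} });
print "Tom => $value\n";   # prints "Tom => Jerry"
```

This only works if every process that touches the database goes through the same sentinel file; flock is advisory, so anything that ties the database without taking the lock first is not protected.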

______crazyinsomniac_____________________________
Of all the things I've lost, I miss my mind the most.
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

In reply to (crazyinsomniac) Re^2: Stand-Alone CGI Frame Chat by crazyinsomniac
in thread Stand-Alone CGI Frame Chat by {NULE}
