[OT]? Behavior of Berkeley DB cache when reading cached page that was changed on disk by another app

DrWhy has asked for the wisdom of the Perl Monks concerning the following question:

Okay so this is really a Berkeley DB question, but it's for a Perl application so I'm posting it here to see if anyone can help out.

I'm using Berkeley DB in a manner that is not approved of by the great Berkeley DB oracles-that-be. I'm using it from a shared disk in a distributed environment, with multiple processes on multiple machines reading and writing simultaneously. The saving grace, if there is one here, is that cases of two processes on different machines accessing the BDB at the same time are expected to be very infrequent, though still possible.

Going off a tip from the Oracle Berkeley DB FAQ online, I'm taking the following approach: Do external locking on the entire DB, handled by the application. I'm planning to use File::SharedNFSLock for this purpose.

I could just get a lock at the beginning of each app's run and that would be sufficient to get the locking I need, completely serializing access to the DB at a very high level. This would waste a lot of time, so I was looking at having each app take and release a lock several times, around logical units of DB work within the app. Each unit of activity would look like:

Obtain SharedNFSLock.
Do work.
Force DB sync. (to flush dirty pages in the cache out to disk)
Release SharedNFSLock.
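
A minimal Perl sketch of one such unit of work. The helper name `locked_unit` is made up for illustration. It assumes a lock object with `lock` and `unlock` methods (File::SharedNFSLock documents both, with `lock` returning false on timeout) and a DB handle with a `db_sync` method that returns 0 on success, as handles from the BerkeleyDB module do. It only covers flushing this process's writes; it does nothing about stale pages already sitting in the cache.

```perl
use strict;
use warnings;

# Run one unit of DB work under an external lock.
# $lock: any object with lock() and unlock() methods (assumed interface).
# $db:   any object with a db_sync() method returning 0 on success (assumed).
# $work: code reference that does the reads and writes; receives $db.
sub locked_unit {
    my ($lock, $db, $work) = @_;

    $lock->lock() or die "could not obtain lock\n";

    # Use eval so the lock is released even if the work dies.
    my $ok = eval {
        $work->($db);
        # Flush dirty cache pages to disk before anyone else gets the lock.
        my $status = $db->db_sync();
        die "db_sync failed: $status\n" if $status != 0;
        1;
    };
    my $err = $@;

    $lock->unlock();
    die $err unless $ok;
    return 1;
}
```

With the handle and lock created once at the start of the script, a unit would then look something like `locked_unit($lock, $db, sub { my ($db) = @_; $db->db_put($key, $value) });`.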

What I'm not doing here is closing and reopening the DB inside each unit of work -- the DB handle is opened at the beginning of each Perl script and closed only at the end. If I do this, though, I wonder what's going to happen with cached pages. Will the cache be able to detect when a page has been modified by another process on another computer and re-read that page when an operation involving it is called? If not, then processes will miss updates made by other processes while they are running. I haven't been able to find anywhere on the web a description of how the caching works in Berkeley DB (and I don't feel like reading the Berkeley DB source to figure it out for myself, since that seems like it could take me quite a long time to get to the information I need). So I'm wondering if anyone here knows enough about the internals of the BDB cache to be able to help out.

Thanks.

--DrWhy

"If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."


Replies are listed 'Best First'.
Re: [OT]? Behavior of Berkeley DB cache when reading cached page that was changed on disk by another app by anonymized user 468275 (Curate) on May 06, 2011 at 07:10 UTC
http://pybsddb.sourceforge.net/ref/rpc/intro.html and the following pages there explain how to use Berkeley DB in an RPC client/server mode.
One world, one people
Re^2: [OT]? Behavior of Berkeley DB cache when reading cached page that was changed on disk by another app by DrWhy (Chaplain) on May 06, 2011 at 15:05 UTC
I did run across this in my rambling around the innertubes, and if my project were bigger than it is, I'd consider this solution. It just seemed like a little more effort and infrastructure to maintain than I was looking for.
--DrWhy
"If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."
Re: [OT]? Behavior of Berkeley DB cache when reading cached page that was changed on disk by another app by Anonymous Monk on May 06, 2011 at 01:04 UTC
See http://download.oracle.com/docs/cd/E17076_02/html/gsg_txn/C/multithread-intro.html. The way I read it, what you wish to do is possible if you're using transactions.
Re^2: [OT]? Behavior of Berkeley DB cache when reading cached page that was changed on disk by another app by DrWhy (Chaplain) on May 06, 2011 at 02:16 UTC
No, all of Berkeley DB's built-in concurrency and transaction control is predicated on a DB 'Environment', which, among other things, contains shared memory segments. So all processes/threads that are under transaction control must be on the same computer. My application runs on multiple computers sharing an NFS mount, and under this architecture the BDB Environment structures will not work properly.

What I was looking for was information from anyone in the know about BDB's Memory Pools (the structures that implement the caching layer in BDB). What I wondered is whether these memory pools are smart enough to detect when a file has been modified by some agent outside of their control, so that any cached pages for that file can be invalidated. What I've been able to find on the Web so far is inconclusive, but it doesn't look like there's any mention of such smarts being a part of this technology.

Therefore, unless someone can tell me otherwise, I think I'm stuck with closing and reopening database connections every time I want to obtain a lock and do work, in order to ensure that my cache has not been made stale by work done by other processes since the last time I had a database lock.
--DrWhy
"If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."
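
For illustration, a rough Perl sketch of that close-and-reopen approach. The helper name `reopened_unit` and the `$open` callback are hypothetical. It assumes a lock object with `lock` and `unlock` methods, and a handle whose `db_close` returns 0 on success, as in the BerkeleyDB module. It also assumes the handle is opened without a shared Environment, so that its cache is private to the handle and discarded when the handle is closed.

```perl
use strict;
use warnings;

# Variant: open a fresh handle inside every locked unit, so that no cached
# page can survive from before the lock was obtained.
# $lock: object with lock() and unlock() methods (assumed interface).
# $open: code reference returning a newly opened DB handle, or undef on failure.
# $work: code reference that receives the handle.
sub reopened_unit {
    my ($lock, $open, $work) = @_;

    $lock->lock() or die "could not obtain lock\n";

    my $db;
    my $ok = eval {
        $db = $open->() or die "could not open database\n";
        $work->($db);
        1;
    };
    my $err = $@;

    # Close before unlocking; closing also flushes this handle's dirty pages.
    my $status = defined $db ? $db->db_close() : 0;
    $lock->unlock();

    die $err unless $ok;
    die "db_close failed: $status\n" if $status != 0;
    return 1;
}
```

Here `$open` might be something like `sub { BerkeleyDB::Hash->new(-Filename => $path) }`. The price is one open and one close per unit of work, which is the cost the original plan was trying to avoid.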
Re^3: [OT]? Behavior of Berkeley DB cache when reading cached page that was changed on disk by another app by Anonymous Monk on May 06, 2011 at 04:01 UTC
"Therefore, unless someone can tell me otherwise, I think I'm stuck with closing and reopening database connections every time I want to obtain a lock and do work, in order to ensure that my cache has not been made stale by work done by other processes since the last time I had a database lock."
I won't come out and say otherwise, even though I believe so, but I will say: devise a simple test, and test it.
Here is a paper of interest: Chunk: A Framework for Modular Distributed Shared Memory Systems

