PerlMonks  

Re: Reliable Work Queue Manager

by ph713 (Pilgrim)
on Oct 27, 2005 at 05:30 UTC


in reply to Reliable Work Queue Manager

I've written similar modules from scratch for projects before. For maximum reliability, simplicity, and portability, I tend to rely on atomic rename() calls (as provided on *nix) within a fixed work queue directory to provide the guarantees I need.

The general idea is: You have a queue directory whose location all the readers and writers know. The "enqueue()" method writes your work to a temporary file in that directory (say ".__w58yto4er.qfile", name generated by File::Temp, but with your .__ prefix and .qfile suffix). Once the file is successfully written to disk, you do an atomic rename() call to remove the ".__" from the name. Queue readers dequeue work by scanning the work queue directory for files that don't start with a dot. If a worker decides to take a job, the first thing it does is another atomic rename(), to ".--w58yto4er.qfile", before working on the request. If that rename fails (because another worker beat it to the job), it moves on silently.
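The enqueue and claim steps described above could be sketched roughly as follows. This is a Python sketch rather than Perl (os.rename wraps the same rename() syscall), and the function names enqueue and claim_next are made up for illustration. It assumes a POSIX filesystem where rename() within one directory is atomic, and it ignores the very unlikely case where a random temp name collides with an already-published job.

```python
import os
import tempfile

def enqueue(queue_dir, payload):
    """Write payload to a hidden temp file, then atomically publish it."""
    # Temp name looks like ".__<random>.qfile"; readers skip dot-files.
    fd, tmp_path = tempfile.mkstemp(prefix=".__", suffix=".qfile", dir=queue_dir)
    with os.fdopen(fd, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # data must be on disk before the job becomes visible
    final_name = os.path.basename(tmp_path)[len(".__"):]
    os.rename(tmp_path, os.path.join(queue_dir, final_name))  # atomic publish
    return final_name

def claim_next(queue_dir):
    """Try to claim one visible job; return the claimed path, or None."""
    for name in os.listdir(queue_dir):
        if name.startswith(".") or not name.endswith(".qfile"):
            continue  # hidden files are either unfinished or already claimed
        claimed = os.path.join(queue_dir, ".--" + name)
        try:
            os.rename(os.path.join(queue_dir, name), claimed)
        except FileNotFoundError:
            continue  # another worker renamed it first; move on silently
        return claimed
    return None
```

Only one worker's rename() can succeed for a given job, because the source name no longer exists for everyone else. What a worker does with the claimed file once the job is finished (delete it, archive it) is left open here.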

With just that basic technique you have a disk-persistent, unordered work queue that's atomically self-consistent and allows multiple readers and writers. At startup time after a crash, you can rename all the .-- files back to their normal names before firing up the workers, so that jobs interrupted by the crash get restarted. Be sure to use a (preferably data-)journalling filesystem and all that jazz.
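A sketch of that startup recovery pass, again in Python and with a made-up function name (recover_claimed). It assumes no workers are running yet, and it deliberately leaves unfinished ".__" temp files alone, since the post does not say what to do with them.

```python
import os

def recover_claimed(queue_dir):
    """Rename every '.--*.qfile' back to its visible name; return restored names."""
    restored = []
    for name in os.listdir(queue_dir):
        if name.startswith(".--") and name.endswith(".qfile"):
            visible = name[len(".--"):]
            os.rename(os.path.join(queue_dir, name),
                      os.path.join(queue_dir, visible))
            restored.append(visible)
    return restored
```

Because an interrupted job simply reappears in the queue, a job may end up being run more than once, so the work itself should tolerate being repeated.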

If you want the queue to be ordered, you can encode a job serial number in the queue file name in place of the File::Temp random characters. You'll just need a locked source of incrementing serial numbers, perhaps a special file inside or outside the queue directory which starts with the contents "1". The enqueue() method could first wait on an advisory flock() on the serial number file. Once it has the lock, it reads the "1", writes "2" to a new temporary file, atomically rename()s the new temporary file over the name of the serial file it just read, and then close()s and releases the flock on the now-dead "1" file. Now it can use "1" for its job number, and the next enqueuer in line will receive "2". If enqueue()-ers fail at odd points in the process, you may have holes in your serial number sequence, but never duplicates.

Just food for thought if you're thinking of rolling your own.

Edited to Add: I just re-read that, and the flock()/rename() serial method doesn't really work, because the flock() would be tied to the inode, not the name. The problem with just locking and overwriting the contents of the file is that a crash halfway through might leave a corrupted serial number. But then again, in your startup code after a crash I suppose you could reset the serial number to "1" for an empty queue directory, or to the next number after the highest job still sitting in the queue. That assumes the serial numbers have no meaning outside of the queue, so the sequence doesn't need to generate truly unique numbers for all time.
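The startup-time reset described in the edit could look something like this Python sketch. The file naming is an assumption made for illustration: published jobs are named "<digits>.qfile", claimed jobs ".--<digits>.qfile", and unpublished ".__" temp files are ignored. The function name next_serial is likewise hypothetical.

```python
import os
import re

# Matches a published job ("00000042.qfile") or a claimed one (".--00000042.qfile").
JOB_NAME = re.compile(r"^(?:\.--)?(\d+)\.qfile$")

def next_serial(queue_dir):
    """Return 1 for an empty queue, else one past the highest job number found."""
    highest = 0
    for name in os.listdir(queue_dir):
        match = JOB_NAME.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1
```

This only recomputes the starting point after a crash. Handing out numbers to concurrent enqueuers during normal operation still needs some form of locking, which this sketch does not attempt.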

Re^2: Reliable Work Queue Manager
by Adze (Acolyte) on Oct 28, 2005 at 14:50 UTC
    Some good advice above. I have used IPC::DirQueue with good results in the past.

    Read the specs of the maildir format at <http://cr.yp.to/proto/maildir.html> - quite a profound lesson to be learnt about the combined power of the filesystem and atomic syscalls when it comes to designing systems which are robust and avoid contention issues (e.g. flock over NFS).

    I would recommend against reinventing the wheel here.
