PerlMonks  

Will these functions for use with Storable properly lock my file?

by nysus (Parson)
on Aug 03, 2021 at 12:35 UTC ([id://11135570])

nysus has asked for the wisdom of the Perl Monks concerning the following question:

I've got a data file that will be read from and written to by several processes running at the same time. I want to be sure the data in the file is opened, manipulated, and saved by one process at a time so the processes don't stomp on each other's work. From what I've read, using a semaphore file is the way to go. I am using Storable to read/write the data. I came up with these two helper functions for store and retrieve, which I'm hoping will do the trick:

    use Storable;
    use Fcntl qw(:flock);

    sub _store {
        my $data = shift;
        my $data_file = 'data/plugin_portfolio';
        store $data, $data_file;
        close LOCK;
    }

    sub _retrieve {
        my $data_file = 'data/plugin_portfolio';
        return 0 if !-f $data_file;
        my $lock = $data_file . '.lck';
        open (LOCK, "> $lock") or die "Can't open lock file";
        flock(LOCK, LOCK_EX);
        retrieve $data_file;
    }

I believe the file handles are global so I don't think it's a problem having LOCK in two different subroutines. But I'm worried that there might be something I'm missing that will cause me to lose data. Or maybe there's a simpler way...

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;

Re: Will these functions for use with Storable properly lock my file?
by davido (Cardinal) on Aug 03, 2021 at 15:33 UTC

    Storable supports advisory locking out of the box:

        # Advisory locking
        use Storable qw(lock_store lock_nstore lock_retrieve);
        lock_store \%table, 'file';
        lock_nstore \%table, 'file';
        $hashref = lock_retrieve('file');

    This is from the Storable POD. Could you use these tools to handle locking for you?
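A minimal round trip with those functions, assuming a throwaway temp directory (the file name and hash contents are made up for illustration). Each call holds its lock only for the duration of that single call: lock_nstore takes an exclusive lock while writing, and lock_retrieve takes a shared lock while reading. That protects individual reads and writes, but not a read-modify-write sequence spread across two calls.

```perl
use strict;
use warnings;
use Storable qw(lock_nstore lock_retrieve);
use File::Temp qw(tempdir);

# Hypothetical data file in a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/plugin_portfolio";

my %table = (count => 0);

# The exclusive lock is held only while this one write runs.
lock_nstore(\%table, $file) or die "lock_nstore failed";

# The shared lock is held only while this one read runs.
my $hashref = lock_retrieve($file);

# Between the read above and the write below, another process could
# write the file, so this increment is NOT safe against lost updates.
$hashref->{count}++;
lock_nstore($hashref, $file) or die "lock_nstore failed";

print "count is now $hashref->{count}\n";
```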


    Dave

      I had initially tried using the locking functions provided with Storable. Maybe I wasn't using them properly but the processes stomped all over one another. That's when I did some searching and decided to try out a semaphore file instead.


Re: Will these functions for use with Storable properly lock my file?
by eyepopslikeamosquito (Archbishop) on Aug 03, 2021 at 13:43 UTC
Re: Will these functions for use with Storable properly lock my file?
by afoken (Chancellor) on Aug 03, 2021 at 17:13 UTC
        use Storable;
        use Fcntl qw(:flock);

        sub _store {
            my $data = shift;
            my $data_file = 'data/plugin_portfolio';
            store $data, $data_file;
            close LOCK;
        }

        sub _retrieve {
            my $data_file = 'data/plugin_portfolio';
            return 0 if !-f $data_file;
            my $lock = $data_file . '.lck';
            open (LOCK, "> $lock") or die "Can't open lock file";
            flock(LOCK, LOCK_EX);
            retrieve $data_file;
        }

    Not pretty.

    • Two-argument open was already mentioned in Re: Will these functions for use with Storable properly lock my file?. Use three-argument open.
    • Don't use bareword file handles. They are truly global and may collide with other (legacy) code using bareword file handles. Use lexical file handles.
    • _retrieve() leaks the bareword file handle LOCK; the lock file is kept locked until some other code explicitly closes that handle.
    • _store() closes a bareword file handle never opened there.
    • Are the previous two points intentional? Spooky action at a distance?

    Regarding the last point: I did not look up the Storable API, so maybe Storable explicitly requires this behaviour? Does Storable guarantee that _retrieve() and _store() are always called in this order, and never alone? If so, the code lacks a clear comment indicating that behaviour.

    I would expect Storable to call either _store() or _retrieve(), but not both. In that case, your code only takes the lock when you read, and then holds it until the process ends, you read again, or you write.

    If you have two instances running in parallel, and one chooses to read, then does a lot of other stuff, the other instance will wait for the lock even if the first instance finished reading long ago. At least in this situation you won't lose data. But you effectively have only one working process at any time.

    If you have two instances running in parallel, and one chooses to write (calling _store()) without a prior read, while the other chooses to read (calling _retrieve()), the writer will simply damage the data on disk while the reader assumes it is safe because it holds a lock. The writer doesn't even get an error, because the close call is not checked for errors. Instant loss of data. (And trust me on that, Murphy will make sure that your data is damaged at the most inconvenient moment, causing the maximum damage.)

    davido++ explains in Re: Will these functions for use with Storable properly lock my file? that Storable already has code for file locking. If that was not so, you should lock both reading and writing, each time using a lexical file handle that is implicitly unlocked and closed when you leave the reading and writing routines. That way, only one process can ever hold a lock for the data file, and it will hold the lock only as long as absolutely needed.
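A minimal sketch of that per-operation scheme, assuming a separate sentinel lock file next to the data file (paths and sub names are hypothetical). Each routine opens its own lexical lock handle; when the routine returns, the handle goes out of scope, Perl closes it, and the lock is released. Readers take a shared LOCK_SH lock and the writer an exclusive LOCK_EX lock. Like Storable's own lock_* functions, this protects each read and each write individually, not a read followed later by a write.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

my $dir       = tempdir(CLEANUP => 1);    # hypothetical location
my $data_file = "$dir/plugin_portfolio";
my $lock_file = "$data_file.lck";

# Open the sentinel with a lexical handle and lock it in the given mode.
# '+>>' creates the file if needed without truncating it, and is readable,
# so LOCK_SH also works where flock() is emulated with fcntl().
sub _lock {
    my ($mode) = @_;
    open(my $fh, '+>>', $lock_file) or die "Can't open $lock_file: $!";
    flock($fh, $mode) or die "Can't flock $lock_file: $!";
    return $fh;
}

sub read_data {
    my $lock = _lock(LOCK_SH);    # released when $lock goes out of scope
    return -f $data_file ? retrieve($data_file) : {};
}

sub write_data {
    my ($data) = @_;
    my $lock = _lock(LOCK_EX);    # released when $lock goes out of scope
    nstore($data, $data_file) or die "Can't store $data_file";
    return 1;
}

write_data({ answer => 42 });
my $copy = read_data();
print "answer = $copy->{answer}\n";
```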

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      _retrieve() leaks the bareword file handle LOCK; the lock file is kept locked until some other code explicitly closes that handle. _store() closes a bareword file handle never opened there.

      Yes, I realize this is a little precarious. It relies on my code always calling _retrieve() followed by _store() at some point. It also counts on the lock getting released when the process exits. I wasn't sure how else I could get around this.
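One possible way around that, sketched under assumptions: keep the read-then-write design, but have _retrieve() return its lexical lock handle alongside the data and have _store() take it back as an argument. The lock's lifetime is then visible at the call site, _store() refuses to run without it, and if the caller dies in between, the handle goes out of scope and the lock is released. The extra return value, the $lock_fh parameter, returning {} instead of 0, and the temp-directory paths are illustrative changes, not the original code.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical location; the question uses 'data/plugin_portfolio'.
my $dir       = tempdir(CLEANUP => 1);
my $data_file = "$dir/plugin_portfolio";
my $lock_file = "$data_file.lck";

# Lock, read, and return the lock handle together with the data.
sub _retrieve {
    # Three-argument open with a lexical handle instead of bareword LOCK.
    open(my $lock_fh, '>', $lock_file) or die "Can't open $lock_file: $!";
    flock($lock_fh, LOCK_EX) or die "Can't lock $lock_file: $!";
    # Assumption: an empty hash ref (not 0) when there is no data yet.
    my $data = -f $data_file ? retrieve($data_file) : {};
    return ($data, $lock_fh);
}

# Write while the caller still holds the lock, then release it.
sub _store {
    my ($data, $lock_fh) = @_;
    die "_store() called without a lock handle" unless $lock_fh;
    store($data, $data_file) or die "Can't store $data_file";
    close($lock_fh) or die "Can't close $lock_file: $!";    # releases the lock
    return 1;
}

# The lock's lifetime is now explicit at the call site.
my ($data, $lock_fh) = _retrieve();
$data->{runs}++;
_store($data, $lock_fh);
```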

      If you have two instances running in parallel, and one chooses to write (calling _store()) without a prior read, while the other chooses to read (calling _retrieve()), the writer will simply damage the data on disk while the reader assumes it is safe because it holds a lock. The writer doesn't even get an error, because the close call is not checked for errors. Instant loss of data. (And trust me on that, Murphy will make sure that your data is damaged at the most inconvenient moment, causing the maximum damage.)

      Yes. But in my case, the processes are always updating the existing data in the file.

      I tried Storable's lock functions first. They didn't work in my case (though it's entirely possible I wasn't using them right). It seemed to me that something like this was happening:

      1) Process A opens the data file to read, locks it, and releases the lock when done.
      2) Process B reads the data file, locks it, and releases the lock when done.
      3) Process C does the same.

      Now we have three different processes with the same data. Fine. But:

      1) Process A finishes and writes to the file.
      2) Process D reads the file.
      3) Process B finishes and writes the data file, overwriting A's work.
      4) Process E reads the file (saved by Process B).

      Now Processes D and E have totally different starting points.

      But again, maybe I was using the lock functions wrong. Not sure.
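The sequence above is the classic lost-update problem: each read and each write is locked on its own, but nothing stops another process from writing between one process's read and its later write. A common remedy is to hold a single exclusive lock on a sentinel file for the whole read-modify-write cycle, which is the "checked out" behaviour wanted here. A minimal sketch, assuming a hypothetical update_data helper that takes a code reference, plus a small fork-based demonstration:

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use Fcntl qw(:flock);
use File::Temp qw(tempdir);
use POSIX ();

my $dir       = tempdir(CLEANUP => 1);    # hypothetical location
my $data_file = "$dir/plugin_portfolio";
my $lock_file = "$data_file.lck";

# Hypothetical helper: hold ONE exclusive lock on the sentinel file for the
# whole read-modify-write cycle. The lock is released when $lock_fh goes
# out of scope, i.e. when the sub returns or $code dies.
sub update_data {
    my ($code) = @_;
    open(my $lock_fh, '>>', $lock_file) or die "Can't open $lock_file: $!";
    flock($lock_fh, LOCK_EX) or die "Can't lock $lock_file: $!";

    my $data = -f $data_file ? retrieve($data_file) : {};
    $code->($data);                               # modify the data in place
    nstore($data, $data_file) or die "Can't store $data_file";
    return $data;
}

# Five processes each increment a shared counter 20 times.
my @pids;
for (1 .. 5) {
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        update_data(sub { $_[0]{count}++ }) for 1 .. 20;
        POSIX::_exit(0);    # skip END blocks (File::Temp cleanup) in children
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

my $final = retrieve($data_file);
print "count = $final->{count}\n";    # 100: no update was lost
```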


Re: Will these functions for use with Storable properly lock my file?
by stevieb (Canon) on Aug 03, 2021 at 14:59 UTC
    But I'm worried that there might be something I'm missing that will cause me to lose data.

    What happens if a process finds it can't write to the file? Will you queue that data for later? If not, there's data loss right there.

    If losing data is the real concern, why not use a database that has no issues with simultaneous writes? How about an asynchronous system where one function queues up all of the data from all processes, then writes the file when it knows it's safe to do so?

    You haven't described the type of data being written, so I'm just guessing.
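One way to notice the "can't write" case instead of blocking forever is a non-blocking lock attempt (LOCK_EX | LOCK_NB) with a bounded number of retries, so the caller can queue or log the data when the lock stays busy. A sketch, with a hypothetical try_lock helper and arbitrary retry and delay values; the fork and pipe are only there to demonstrate a busy lock held by another process.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);
use POSIX ();
use Time::HiRes qw(sleep);

my $dir       = tempdir(CLEANUP => 1);    # hypothetical location
my $lock_file = "$dir/plugin_portfolio.lck";

# Try to get an exclusive lock without blocking, retrying a few times.
# Returns the locked handle, or undef if the lock stayed busy, so the
# caller can queue or log the data instead of silently dropping it.
sub try_lock {
    my ($tries, $delay) = @_;
    open(my $fh, '>>', $lock_file) or die "Can't open $lock_file: $!";
    for my $attempt (1 .. $tries) {
        return $fh if flock($fh, LOCK_EX | LOCK_NB);
        sleep $delay if $attempt < $tries;
    }
    return;
}

# Demonstration: a child process holds the lock while the parent tries.
pipe(my $reader, my $writer) or die "pipe: $!";
my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    close $reader;
    my $fh = try_lock(1, 0) or POSIX::_exit(1);
    syswrite($writer, "locked\n");    # tell the parent the lock is held
    sleep 1;                          # hold it for a moment
    POSIX::_exit(0);                  # exiting closes $fh and drops the lock
}
close $writer;
my $signal = <$reader>;               # wait until the child holds the lock
my $busy = try_lock(3, 0.1);          # ~0.2 s of retries: should give up
waitpid($pid, 0);
my $free = try_lock(3, 0.1);          # child has exited: should succeed

print defined $busy ? "unexpected: got the lock\n" : "busy: queue the data\n";
print defined $free ? "lock acquired\n" : "unexpected: still busy\n";
```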

      These are definitely things to think about. Fortunately, this data I'm working with is just a cache of information that's on the network that can be restored pretty easily. It's not a huge deal if the file gets corrupted. But I should definitely figure out a way to get alerted to a problem so I can reload the data from the network.

      No database on this server and it would be overkill for a non-critical cache.
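For a rebuildable cache, two cheap safeguards may be enough. Wrapping retrieve in eval turns a missing or damaged file into a warning plus a rebuild instead of a crash, and writing to a temporary file that is then renamed over the real one means readers see either the complete old file or the complete new one, since rename is atomic within a single POSIX filesystem. The write-then-rename technique was not proposed in the thread; it is swapped in here as an illustration, and rebuild_from_network is a hypothetical stand-in for whatever reloads the data. It complements locking rather than replacing it: concurrent updaters still need a lock to avoid lost updates.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir tempfile);
use File::Basename qw(dirname);

my $dir       = tempdir(CLEANUP => 1);    # hypothetical location
my $data_file = "$dir/plugin_portfolio";

# Hypothetical stand-in for re-fetching the cache from the network.
sub rebuild_from_network { return { source => 'network' } }

# Write to a temp file in the same directory, then rename it into place.
sub save_atomically {
    my ($data) = @_;
    my ($tmp_fh, $tmp_name) = tempfile(DIR => dirname($data_file));
    close $tmp_fh;
    nstore($data, $tmp_name) or die "Can't store $tmp_name";
    rename($tmp_name, $data_file) or die "Can't rename to $data_file: $!";
    return 1;
}

# Read the cache; if the file is missing or damaged, warn and rebuild it.
sub load_or_rebuild {
    my $data = eval { retrieve($data_file) };
    return $data if ref $data;
    warn "Cache unreadable, rebuilding: ", ($@ || "no data"), "\n";
    $data = rebuild_from_network();
    save_atomically($data);
    return $data;
}

# Simulate a corrupted cache file.
open(my $fh, '>', $data_file) or die "Can't write $data_file: $!";
print $fh "not a Storable file";
close $fh;

my $data = load_or_rebuild();
print "source = $data->{source}\n";
```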


      As I noodle this out in my head, I don't think a database would help in this case. The process has to lock the file for the entire duration of the time it has the data file "checked out," not just for the brief periods while performing write and read operations.


Re: Will these functions for use with Storable properly lock my file?
by bliako (Monsignor) on Aug 03, 2021 at 15:26 UTC

    Possibly, another solution could be to use a memory-mapped file, especially if your data easily fits in RAM. I am still confused about the exact way to do this. On the one hand, there is PerlIO's :mmap layer, used something like this, but I don't know how to lock it:

        my $fh;
        if( ! open($fh, '>>:mmap', 'file.txt') ){ die "error opening, $!" }
        select $fh; $| = 1;
        my $i;
        while($i++<10){
            print $fh "hello $$ at $i\n";
            sleep 1;
        }
        close $fh;

    There is also File::Map, which offers locking, but from its documentation it is not clear to me how to even append to the file, and the locking blocks. Very likely I am doing something wrong.

    If you manage to get this right, then it seems to me it will be faster. There are lots of other modules which use a shared-memory space to share data, like Cache::FastMmap.

    bw, bliako

Re: Will these functions for use with Storable properly lock my file?
by karlgoethebier (Abbot) on Aug 03, 2021 at 17:36 UTC

    See also

    «The Crux of the Biscuit is the Apostrophe»

Re: Will these functions for use with Storable properly lock my file?
by nysus (Parson) on Aug 03, 2021 at 13:12 UTC

    After a little testing, it seems to work well. Hard to say for sure with these race conditions. I did make a change to the _retrieve sub to lock the semaphore file even if the data file does not exist:

        sub _retrieve {
            my $data_file = 'data/plugin_portfolio';
            my $lock = $data_file . '.lck';
            open (LOCK, "> $lock") or die "Can't open lock file";
            flock(LOCK, LOCK_EX);
            return 0 if !-f $data_file;
            retrieve $data_file;
        }


Re: Will these functions for use with Storable properly lock my file?
by Anonymous Monk on Aug 03, 2021 at 13:42 UTC
    2 argument open? After 20 years?
