
That's a lot of questions... ;) And I was slow to respond, so I'm mostly reiterating what grep said. But let me back up a bit:

The file to be modified in my case is an important flat-file database (one record per line); web users can use a CGI script to either add data to the file or to edit their own records; I'm concerned about possible file corruption when two or more users are submitting new or revised data at about the same instant.

In that sort of scenario, there are a couple things to watch out for:

There's no locking. Bob pulls the data into his browser at 10:00, spends 15 minutes figuring out how to change it, then uploads his version. Meanwhile, Joe pulls the data at 10:05, spends 5 minutes working on his update, and uploads it. As of 10:15, Joe's updates are lost forever (or until he sees a problem and repeats his work).
There is locking, but Joe and Bob actually manage to beat the odds and both their updates hit the server within a few CPU cycles of each other (relatively speaking); Joe's thread opens the file for output, then Bob's opens it for output, then Joe tries to get the lock on the file, then... ouch! my brain!!

Obviously, the first scenario is the one you really should worry about. It's not just a matter of using flock on the file; in fact, the more I think about it, the more unsuitable flock seems to be for web-based stuff. If you solve the first problem, the second one is a moot point.

As the first reply points out, you need some sort of "check-out/check-in" mechanism to keep different users from stepping on each other's updates. A user needs to explicitly request write access to the data file, and when your cgi script services that request, it has to know whether someone else has already been given write access.

And that's where you need to resolve any possible race condition: any given thread either gets the access (thereby blocking others), or else fails to do so because it is currently granted to someone else. For this purpose, checking for the existence of some "access.lock" file and creating it if it does not exist is almost atomic enough -- something like:

  my $fh = undef;
  ( -e "access.lock" || open( $fh, ">", "access.lock" ));
  if ( not defined( $fh )) {
      # report that someone else is editing the file
  } else {
      # write client/session-id data to access.lock and close it
      # so you can verify when this client sends the update
  }

(The truly paranoid programmer will find a chink there, and will hopefully offer the correct way to seal it up tight.)
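For what it's worth, the chink is the gap between the -e test and the open: two requests can both see "no lock file" before either one creates it. One common way to close it, sketched here under some assumptions, is to let the OS do the test and the create in a single step with sysopen and O_CREAT|O_EXCL. The sub names and the $session_id argument are made up for illustration, and this assumes the lock file lives on a local filesystem:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use Errno qw(EEXIST);

# Try to take the check-out lock for $session_id (a hypothetical identifier
# supplied by the CGI script). Returns 1 on success, 0 if someone else
# already holds the lock; dies on any other error.
sub acquire_lock {
    my ($lock_path, $session_id) = @_;

    # O_CREAT|O_EXCL means "create the file, but fail if it already exists",
    # so the existence check and the creation cannot be separated.
    if (sysopen(my $fh, $lock_path, O_WRONLY | O_CREAT | O_EXCL, 0644)) {
        print {$fh} "$session_id\n";
        close($fh) or die "Cannot close $lock_path: $!";
        return 1;
    }
    return 0 if $! == EEXIST;    # lock file is already there
    die "Cannot create $lock_path: $!";
}

# Return the session id recorded in the lock file, or undef if there is
# no lock file (or it is empty).
sub lock_holder {
    my ($lock_path) = @_;
    open(my $fh, '<', $lock_path) or return undef;
    my $id = <$fh>;
    close($fh);
    chomp $id if defined $id;
    return $id;
}
```

When the client sends the update, the script would compare lock_holder("access.lock") against that client's session id before accepting the upload, and unlink the lock file once the update has been written.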

But web interactions being what they are, you also need a policy: some upper bound on how long a client may hold the access lock. If Bob does a check-out at 10:00am and tries to upload his update at 10:00pm, it might be prudent to tell him at that point that he waited too long to submit the update, and to ask him to try again using a fresh download (and to return it more quickly).

Or the policy could be more flexible: a client may keep the lock for at least N minutes, or until someone else requests the lock after the minimum N minutes have passed -- that is, another client can "steal" the lock if it's more than N minutes old.
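A rough sketch of the "more than N minutes old" test, assuming the lock file's modification time marks when the lock was granted. The 15-minute value and the sub name are placeholders, not a recommendation:

```perl
use strict;
use warnings;

# Hypothetical policy value: a lock may be stolen once it is this old.
my $MAX_LOCK_AGE = 15 * 60;    # seconds

# True if the lock file exists and was last modified more than
# $max_age seconds ago; false if it is fresh or does not exist.
sub lock_is_stale {
    my ($lock_path, $max_age) = @_;
    my @st = stat($lock_path) or return 0;    # no lock file, nothing to steal
    return (time() - $st[9]) > $max_age ? 1 : 0;    # $st[9] is the mtime
}
```

A request that finds the lock held would call lock_is_stale("access.lock", $MAX_LOCK_AGE) and, if true, remove the old lock and try to create its own. Note that the steal itself (remove, then create) is another small race window if two clients try it at the same moment, so the create step should still be an exclusive one.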

I know I could use a real database but I really want to figure out file locking using Perl. Seems like this issue must come up all the time in a multiuser environment, whether web or internal network.

It's good to make sure you understand file locking, even if it doesn't exactly apply to the current task. And yes, it's an old topic. Consider this old node, drawn from an even older article by Sean Burke, published in The Perl Journal back in 2001 (and sadly hard to find these days). Meanwhile, get started on using a real database for your current web app.
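As for flock itself, here is a minimal sketch of the usual ordering for a single rewrite of the file, which avoids the "open for output, then lock" trap described above. The sub name and the $edit callback are made up for illustration, and this only covers one request at a time; it does nothing about the check-out/check-in problem:

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Rewrite $path under an exclusive lock. $edit is a hypothetical callback
# that receives the current lines and returns the new lines.
# Returns the number of lines written.
sub update_file_locked {
    my ($path, $edit) = @_;

    # Open read/write WITHOUT truncating: opening with '>' would clobber
    # the file before we hold the lock.
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";

    my @lines = <$fh>;
    my @new   = $edit->(@lines);

    # Only now, with the lock held, is it safe to discard the old contents.
    seek($fh, 0, SEEK_SET) or die "Cannot seek in $path: $!";
    truncate($fh, 0)       or die "Cannot truncate $path: $!";
    print {$fh} @new;
    close($fh) or die "Cannot close $path: $!";    # closing releases the lock
    return scalar @new;
}
```

Keep in mind that flock is advisory: it only helps if every piece of code that touches the file takes the lock the same way.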

In reply to Re: Best practices for modifying a file in place: q's about opening files, file locking, and using the rename function by graff
in thread Best practices for modifying a file in place: q's about opening files, file locking, and using the rename function by davebaker
