http://qs321.pair.com?node_id=7058

File Locking

Introduction

This document will describe what file locking is, when you should use it, and how it is done in perl. To lock a file in perl, use the flock command (pronounced as a flock of sheep, not "eff lock"). For the impatient, here is a quick example:

open(MYFILE, ">>$myfile") || die;
flock(MYFILE, 2) || die;
print MYFILE "Cottleston Pie\n";
close(MYFILE);

(Okay, now that the impatient ones have left, let us look at things in a bit more detail.)

What is file locking, and why should you use it?

File locking is a way of ensuring the integrity of files. It allows many people (actually, processes) to share a file in a safe way, without stepping on each other's toes. Sometimes, file locking is not needed - if only one process is working on the file, then there is no need to worry about anybody else changing it. However, when two or more processes try to change a single file at the same time, conflicts can arise, and some sort of file locking is needed.

For example, let us say that you wish to create a simple text file (named "friends.txt") that has a list of all your friends, one per line. Now let's suppose you have written a very basic web page that allows your friends to add their name to your file through a very simple CGI script. Here is what you have come up with:

#!/usr/bin/perl
print "Content-type: text/html\n\n";
$myfile = "friends.txt";
$newfriend = $ENV{'QUERY_STRING'};
open(MYFILE, "$myfile") || die;
while(<MYFILE>) {
    if (m/^$newfriend$/) {
        print "You are already on the list!\n";
        exit;
    }
}
close(MYFILE);
push(@friends, $newfriend);
open(MYFILE, ">$myfile") || die;
print MYFILE @friends;
close(MYFILE);
print "You are now in my list, $newfriend!\n";
exit;

Not a very complicated script, but we do have a problem.
Check out this line:

open(MYFILE, ">$myfile") || die;

When perl opens the file for writing like this, it "erases" the file first, by basically setting the size to zero, in anticipation of you writing something.

By way of example, let us say that your file contains the following two names:

Clark
Bruce

Now let us imagine that two of your friends, Diana and Robin, are trying to add their names to your list at the same time. Diana gets there a split-second before Robin, so she is the first to open the file. She opens the file, reads in the two names already there (which are stored in the @friends array), and then closes the file. She adds her name to @friends, reopens the file for writing, puts the three names from @friends into the file, and closes it again. However, after she opens the file for writing, but before she writes anything to the file, Robin comes along and tries to read in the names. Since the file is empty at that exact moment, he reads in no names, and @friends is empty. He closes the file. Then he adds his name to the list, which now contains only his name, and reopens the file for writing. He then puts into it the single name from @friends, and closes the file again. At this point the file contains only Robin's name: Clark, Bruce, and Diana are lost forever.

Here is a timeline of what happens:

  1. Diana opens the file, reads in two names, and closes the file.
  2. Diana adds her name to the list.
  3. Diana reopens the file for writing, setting the length to zero.
  4. Robin reads the file right then, reads in NO names, and closes the file.
  5. Diana writes three names to the file and closes it.
  6. Robin adds his name to the (empty) list.
  7. Robin reopens the file for writing, setting the length to zero.
  8. Robin writes one name to the file and closes it.
  9. The file now contains only one name.

It may seem as though there is a very small chance of this happening, but the point is that there is a chance. Instead of this simple example, imagine a giant file with hundreds of people reading and writing to it at the same time. No matter the odds, nobody wants to have their file messed up.

All about flock

Here (finally!) is where file locking comes in. File locking is done at the system level, meaning that the actual details of applying the lock itself are not something you have to worry about.

File locking is done, in perl, with the flock command. The basic format for flock is:

flock FILEHANDLE, OPERATION

The OPERATION is actually a number, either 1, 2, 4, or 8. They are also commonly written in another form, as LOCK_SH, LOCK_EX, LOCK_NB, and LOCK_UN. Perl does not know what these mean, so you can use the numbers, or do something like this:

sub LOCK_SH { 1 } ## shared lock
sub LOCK_EX { 2 } ## exclusive lock
sub LOCK_NB { 4 } ## non-blocking
sub LOCK_UN { 8 } ## unlock

Each is described later. For now, let's just fix up our example script to include some file locking:

#!/usr/bin/perl
print "Content-type: text/html\n\n";
$myfile = "friends.txt";
$newfriend = $ENV{'QUERY_STRING'};
open(MYFILE, "$myfile") || die;
flock(MYFILE, 1);
while(<MYFILE>) {
    if (m/^$newfriend$/) {
        print "You are already on the list!\n";
        exit;
    }
}
close(MYFILE);
push(@friends, $newfriend);
open(MYFILE, "+< $myfile") || die;
flock(MYFILE, 2);
seek(MYFILE, 0, 0);
truncate(MYFILE, 0);
print MYFILE @friends;
close(MYFILE);
print "You are in my list, $newfriend!\n";
exit;

Notice that we have added two flock commands. The first one adds a shared lock, and the second one adds an exclusive lock. Looking back, we see that the number "1" represents "LOCK_SH", which stands for "lock, shared." Similarly, the number "2" corresponds to "LOCK_EX", or "lock, exclusive."

The difference between a shared lock and an exclusive lock is an important one. A shared lock is usually applied when you simply want to read the file, and it is okay if others read the file while you do. An exclusive lock is used when you want to make changes to the file. Only one exclusive lock can be on a file, so that only one process at a time can make changes. If your file is a large manila envelope full of papers, then a shared lock slaps a little "Hey! I'm reading this!" note on the front. An exclusive lock slaps on a note saying "Hey! I might make some changes to this, so look but don't touch until I'm done!"

Unlocking a file is not necessary, as long as you remember to close it. Closing the file automatically unlocks it as well - that is why we do not need any specific unlock commands in our example script.
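
As a rough illustration of that point, here is a minimal sketch that takes an exclusive lock on one handle and shows a second handle getting the lock only after the first is closed. The scratch file from File::Temp and the variable names are assumptions made for the demonstration. It uses the named constants from the Fcntl module rather than bare numbers, plus LOCK_NB so that a refused attempt returns instead of waiting. It also assumes a system where perl uses the native flock call; where flock is emulated with fcntl, two handles inside one process may not conflict.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);          # LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN
use File::Temp qw(tempfile);

# Scratch file so the sketch does not touch real data (assumed setup).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

open(my $first, '>>', $path) or die "Cannot open $path: $!";
flock($first, LOCK_EX)       or die "Cannot lock $path: $!";

open(my $second, '>>', $path) or die "Cannot open $path: $!";

# Non-blocking attempt while $first still holds the exclusive lock.
my $while_held = flock($second, LOCK_EX | LOCK_NB) ? 1 : 0;

close($first);                 # closing the handle releases its lock

# Same attempt again, now that the first handle is closed.
my $after_close = flock($second, LOCK_EX | LOCK_NB) ? 1 : 0;

print "while held: $while_held, after close: $after_close\n";
close($second);
```

An explicit flock($fh, LOCK_UN) is only needed when you want to give up the lock but keep the handle open.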

Let's look at our example script again, at the first flock line:

flock(MYFILE, 1);

This does more than it first appears. Not only does it set a lock, but it checks for other locks first. In the case of a shared lock, it checks to see if there is an exclusive lock on the file. If there is, it waits until the exclusive lock is gone, and only then will it add its shared lock. It does not care if there are other shared locks on it. What this basically does is to say "I want to read this file, but only if I'm sure that nobody is in the middle of making changes to it, and I want to let everyone know that I am reading it."
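
The rule described above (shared locks can coexist, but they keep an exclusive lock out) can be sketched like this. The helper try_lock and the scratch file are hypothetical names introduced for the demonstration, and LOCK_NB is added so a refused request returns at once instead of waiting. The sketch assumes native flock semantics, where each open handle holds its own lock even inside a single process.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# try_lock (hypothetical helper): open $path on a fresh handle and request
# $mode without waiting. Returns the handle on success, undef if refused.
sub try_lock {
    my ($path, $mode) = @_;
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    return flock($fh, $mode | LOCK_NB) ? $fh : undef;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

my $reader1 = try_lock($path, LOCK_SH);   # first shared lock
my $reader2 = try_lock($path, LOCK_SH);   # second shared lock on the same file
my $writer  = try_lock($path, LOCK_EX);   # exclusive request while readers hold locks

for my $result ( [ 'first reader',  $reader1 ],
                 [ 'second reader', $reader2 ],
                 [ 'writer',        $writer  ] ) {
    printf "%-14s %s\n", $result->[0], ( $result->[1] ? 'locked' : 'refused' );
}
```

Without LOCK_NB, the writer's request would simply wait until both readers had closed their handles.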

Now look at the second flock command:

flock(MYFILE, 2);

This one sets an exclusive lock, because we want to make changes to the file. To set an exclusive lock, some systems require that the file be opened with write access (a shared lock only needs read access). With an exclusive lock, the rule is "there can be only one." The flock command in this case will check to see whether there are *ANY* other locks on the file, shared or exclusive, and will wait until they are all removed. When they are, it locks the file. What this basically says is "Hands off! I might make some changes to this file, so nobody mess with it until I am done."

So, in our example above with Diana and Robin, the new script would clear up the problem. We also made some other small changes. This line:

open(MYFILE, "+< $myfile") || die;

tells perl to open the file in read/write mode. In other words, the file is NOT set to zero length, because we do not want to mess with the contents until after we have locked it. Once we have locked it, we need two other commands:

seek(MYFILE, 0, 0);
truncate(MYFILE, 0);

These bring us to the beginning of the file, and then set the length to zero. This is basically what happens when we open a file in write-only mode (i.e. "> $myfile"), but we could not do that here because we want to lock it before truncating it.

Here is another timeline, with file locking:

  1. Diana opens the file, locks it (shared), reads in two names, and closes the file (which removes the lock).
  2. Diana adds her name to the list.
  3. Diana reopens the file for read/write, locks it (exclusive), and sets the length to zero.
  4. Robin opens the file, tries to get a shared lock, but cannot, because the file is locked. So he is blocked until Diana is done.
  5. Diana writes three names to the file and closes it, which also removes her lock.
  6. Robin sees that the lock is gone, locks it himself (shared), and reads in the three names.
  7. Robin adds his name to the list.
  8. Robin reopens the file for read/write, locks it (exclusive), and sets the length to zero.
  9. Robin writes four names to the file and closes it, which also removes his lock.
  10. The file now contains four names!
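
The timeline above still releases the lock between the read and the rewrite. One possible variant, sketched here under assumptions and not taken from the script above, holds a single exclusive lock for the whole read, check, and rewrite. The helper name add_friend and the scratch file are hypothetical, and the sketch uses lexical filehandles and Fcntl constants instead of bare numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);    # LOCK_EX and SEEK_SET
use File::Temp qw(tempfile);

# add_friend (hypothetical helper): add $name to $file under one exclusive
# lock. Returns 1 if the name was added, 0 if it was already listed.
sub add_friend {
    my ($file, $name) = @_;

    # '+<' opens for read/write without truncating; the file must exist.
    open(my $fh, '+<', $file) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $file: $!";

    chomp(my @friends = <$fh>);
    if (grep { $_ eq $name } @friends) {
        close($fh);            # closing also releases the lock
        return 0;
    }

    push @friends, $name;
    seek($fh, 0, SEEK_SET) or die "Cannot seek in $file: $!";
    truncate($fh, 0)       or die "Cannot truncate $file: $!";
    print {$fh} map { "$_\n" } @friends;
    close($fh) or die "Cannot close $file: $!";
    return 1;
}

# Demonstration on a scratch file seeded with the two names from the example.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "Clark\nBruce\n";
close($tmp) or die "Cannot close $path: $!";

my $first  = add_friend($path, 'Diana');   # name is new, so it is added
my $second = add_friend($path, 'Diana');   # already listed, file left alone
print "first call: $first, second call: $second\n";
```

The trade-off is that every request now waits for an exclusive lock, even when it only needs to read.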

The other two values, LOCK_NB and LOCK_UN are not used as often. The LOCK_NB means "NON_BLOCKING" and tells the system not to wait for other locks to come off the file, but to return right away with an error if there is already another lock on the file. The LOCK_UN means "unlock", but, as mentioned above, is not usually needed as close does the job for you.
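
LOCK_NB is not used on its own; it is combined with LOCK_SH or LOCK_EX using a bitwise OR. The following is a minimal sketch of that, with an explicit LOCK_UN thrown in. The helper lock_with_retry, the retry count, the one-second pause, and the scratch file are all assumptions made for the demonstration, and the sketch assumes native flock semantics so that two handles in one process conflict.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# lock_with_retry (hypothetical helper): try an exclusive, non-blocking lock
# up to $tries times, pausing one second between attempts.
# Returns 1 once the lock is held, 0 if every attempt was refused.
sub lock_with_retry {
    my ($fh, $tries) = @_;
    for my $attempt (1 .. $tries) {
        return 1 if flock($fh, LOCK_EX | LOCK_NB);
        sleep 1 if $attempt < $tries;
    }
    return 0;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

open(my $holder, '>>', $path) or die "Cannot open $path: $!";
open(my $waiter, '>>', $path) or die "Cannot open $path: $!";

my $got_first = lock_with_retry($holder, 1);   # nobody else holds a lock
my $got_busy  = lock_with_retry($waiter, 2);   # refused on both attempts

# Explicit unlock: the lock goes away but $holder stays open.
flock($holder, LOCK_UN) or die "Cannot unlock $path: $!";
my $got_after = lock_with_retry($waiter, 1);   # lock is free now

print "first: $got_first, while busy: $got_busy, after unlock: $got_after\n";
```

A CGI script might use this pattern to print a "try again later" message instead of leaving the visitor waiting.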

flock vs. lockf

You may have also heard about lockf, flock's cousin. lockf can do everything that flock can, and then some. It can actually apply locks to *part* of a file, as well as applying advisory and mandatory locks. Flock only does advisory locks. In the manila envelope analogy from before, flock allows you to post notes on the folder, while lockf allows you to tag individual pages inside the folder. The fcntl command (which stands for "file control") is even more powerful than lockf, and is used to control all aspects of open files. Both of these are beyond the scope of this document: for file locking, use flock.

Other ways to lock files

There are other ways to lock files besides flock, lockf, and fcntl. Many operating systems have their own ways of locking files, but most of this will not concern the perl programmer. There are also ways to do file locking in perl (such as creating and removing a temporary file), but none are as good as flock.
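
The text does not show what the temporary-file approach looks like, so here is one common form, sketched under assumptions. The helper names acquire_lockfile and release_lockfile and the lock file path are hypothetical. The idea is that sysopen with O_CREAT and O_EXCL fails if the file already exists, so only one process can create it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# acquire_lockfile (hypothetical helper): create $lockfile only if it does
# not exist yet. Returns 1 if this process created it, 0 otherwise.
sub acquire_lockfile {
    my ($lockfile) = @_;
    sysopen(my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL) or return 0;
    print {$fh} "$$\n";        # record our process id for debugging
    close($fh);
    return 1;
}

# release_lockfile (hypothetical helper): remove the lock file again.
sub release_lockfile {
    my ($lockfile) = @_;
    return unlink($lockfile);
}

my $dir      = tempdir(CLEANUP => 1);
my $lockfile = "$dir/friends.txt.lock";   # assumed naming scheme

my $got1 = acquire_lockfile($lockfile);   # file is created
my $got2 = acquire_lockfile($lockfile);   # already exists, so refused
release_lockfile($lockfile);
my $got3 = acquire_lockfile($lockfile);   # free again after release
release_lockfile($lockfile);

print "first: $got1, second: $got2, after release: $got3\n";
```

One weakness of this scheme is that a process that dies before calling release_lockfile leaves the lock file behind, whereas a flock lock disappears when the process exits.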

Precautions

All of this assumes one thing - that everyone is playing by the same set of rules. In other words, there is nothing in locking a file with flock that prevents another process from ignoring all your locks. Flock provides an "advisory" locking method. This means another process can come along and open the file at will, ignoring any file locks. All the processes that access the file must use flock for it to work correctly.

Also, beware of command line editing. In the example above, let's say that "Lex" has added his name to your friends.txt file. Well, you don't consider Lex to be a friend, and you do not want his name in your file. So, you telnet in, call up emacs, and edit the friends.txt file directly. Watch out! What if Hank tries to add his name in while you have the file loaded? He could add his name, and then you would overwrite his changes when you save the file. (emacs will actually warn you when the contents of the disk have changed in this case. Another reason to use it!) Here are some simple ways to work around this problem, from best solution to worst:

Finally, file locking may not work across NFS or other file sharing systems. Some systems (e.g. NT) may not even allow advisory locking. Some systems do not have file locking at all (at least as far as anything that perl can use). When in doubt, check your system documentation. This is not an issue on most systems.

Replies are listed 'Best First'.
RE: File Locking
by KM (Priest) on May 22, 2000 at 07:17 UTC
    woops, went beyond what I could, I will put this in a few replies... chance of having corrupt data.

    One good strategy is to move the locking away from the data file altogether by using a semaphore file. Consider the following:

    use strict;
    use Fcntl qw(:flock);
    
    my $file = 'data.file';
    my $SEMAPHORE = "$file.lck";
    
    open(S, ">$SEMAPHORE") || die "Foo ($!)";
    flock(S, LOCK_EX);
    open(FH, "+<$file") || die "Foofoo ($!)";
    # Muck with file
    close FH;
    close S;
    

    This solves a few problems. First, this allows you to ensure you get a lock when you open a file only for reading. Some systems will not give an exclusive lock on files which were opened read-only (which is why you open the semaphore file for write). This gets around that issue. And, by moving the locking completely off the actual data file, you can use one semaphore file to lock multiple files, or directories. And, by removing the locking from the data file you reduce the risk of saying one morning "My hit counter reset itself!".

RE: File Locking
by KM (Priest) on May 22, 2000 at 07:15 UTC
    Your examples on file locking to avoid race conditions all have race conditions themselves. Let's go over some of what you wrote:

    First, some people do say 'eff-lock' :) Most things looked ok until the All about flock section. The concept of LOCK_* is somewhat explained, but these values are constants which should be retrieved from the system header file which defines them. You should not define them in the script or create subroutines to do this. As an aside, you showed:

    sub LOCK_SH { 1 } ## shared lock
    sub LOCK_EX { 2 } ## exclusive lock
    sub LOCK_NB { 4 } ## non-blocking
    sub LOCK_UN { 8 } ## unlock
    If you want to define a constant, you should include the prototype () like:

    sub LOCK_SH () { 1 } ## shared lock
    etc...

    That's one way to make a constant, but.. back to the show..

    One of the best ways to get these values is with the Fcntl module like:

    use Fcntl qw(:flock);

    This will import your system's constants for those values. The next example given contains a race condition. Actually, the way you are explaining to get around a race condition is really also a race condition. Not to worry, many people make this mistake and think that a simple flock() will make everything watertight. One of the problems is that the system itself may not finish writing to the file when the actual lock is released (remember, locks are only advisory, a file isn't actually frozen in some way), and another process reads the file before the write is complete. This may be a rare occasion, however. Another problem can occur when another process updates the file between the time the file is opened, and the lock is obtained. This is ano

RE: File Locking
by KM (Priest) on May 22, 2000 at 07:19 UTC
    To see an example of flock() in action, try running this script in a few consoles side by side.

    #!/usr/bin/perl -wT
    use Fcntl qw(:flock);
    my $file = 'test_lock.txt';
    my $SEMAPHORE = $file . '.lck';
    open(S, ">$SEMAPHORE") or die "$SEMAPHORE: $!";
    flock(S, LOCK_EX) or die "flock() failed for $SEMAPHORE: $!";
    open (FH, ">>$file") or die "Can't open $file: $!";
    print "About to write\n";
    print FH "I have written ($$)\n";
    print "Written\n";
    close FH;
    print "Going to sleep...\n";
    sleep 10;
    print "Woken up...\n";
    close S;
    

    I just wanted to go through this because I see a lot of mistakes that people make with locking files, and show another way of making it all more secure. To make it even more secure, use a database :)

    Cheers,
    KM

RE: File Locking
by ishamael (Beadle) on Apr 26, 2000 at 19:22 UTC
    it should be noted that sendmail (all versions, to my knowledge) ignores flock.
    just blatantly ignores it if you flock an mbox.
    however, it does recognize its own locking system, which consists of a mboxname.lock file in the same directory as the mbox.
    this does create problems with write access and such, and is just a horrible workaround in my opinion, however will work fine.

    charlie schmidt
    ishamael@themes.org
    www.diablonet.net/~ishamael/
      This is not really the fault of sendmail in modern versions. sendmail has no knowledge about the user's mailbox. This is the task of mail.local, which is the one responsible for dealing with mailboxes.

      mail.local does require some form of file locking to prevent races in multiple message delivery scenarios. It attempts to use a number of schemes (fcntl(), lockf, flock, no locks at all) depending on the particular OS in which it is being installed.

      Unfortunately, locking does not work reliably in all scenarios on all OSes, so multiple schemes must be relied on if attempting to be portable.

Re: File Locking
by sligi (Sexton) on Nov 16, 2001 at 02:11 UTC
    Hmm... Seems like you have empty @friends there. The loop should probably be something like:
    while(<MYFILE>) {
        if (m/^$newfriend$/) {
            print "You are already on the list!\n";
            exit;
        }
        push @friends, $_;
    }
    ...or one could of course concatenate to the file.

    --
    Jaska

Re: File Locking
by Anonymous Monk on Jul 03, 2002 at 06:48 UTC

    seek(MYFILE, 0, 0); truncate(MYFILE, 0);
    These bring us to the end of the file, and then sets the length to zero.

    The camelbook on seek tells me that seek(MYFILE, 0, 0); brings the file pointer to the beginning of a file.

      And what do you think truncate(MYFILE,0) does?

      How can you just ignore that part?

          truncate FILEHANDLE,LENGTH
          truncate EXPR,LENGTH
                  Truncates the file opened on FILEHANDLE, or named by EXPR, to
                  the specified length. Produces a fatal error if truncate isn't
                  implemented on your system. Returns true if successful, the
                  undefined value otherwise.
      


      MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
      ** The Third rule of perl club is a statement of fact: pod is sexy.

        His point is still valid. The seek command brings you to the beginning and the description says it brings you to the end. Don't be a jerk.
Re: File Locking
by Anonymous Monk on Dec 12, 2003 at 07:19 UTC
    I would really like to thank the person or people who wrote this "File Locking" tutorial, because the text has given me the reason for opening a file in read-write mode before calling the flock function. Now I feel more confident about using flock in my scripts.

    Thanks.

    Azim.
      This was well written and informative, but I think I still see a bug in this code. After closing the file and opening it for read/write access but before putting the exclusive lock on it, someone else could swoop in and read the file...which would be minus your current entry, as it hasn't been added yet. After adding your entry, the previous persons then add theirs, minus yours. So the file is corrupted. Two alternatives: Append to the file instead of clearing it out OR open it for read/write from the very beginning with exclusive control. Thoughts?
        If you're pretty sure the friend is not on the list, then locking the file exclusive for the entire operation is a good idea.

        However, locking the file in LOCK_EX (instead of LOCK_SH) will limit the concurrency of the operation to 1. So if you just want to check if they are (not) on the list, a shared lock will scale better. Check again after you get an exclusive lock, to avoid race conditions. (For even better scalability, use a database.)

        Thank you for your comment!!! I saw this bug also! but wasn't sure if it's a bug. read/write from the very beginning must be a good solution if it works, I haven't tried that out
Re: File Locking
by Anonymous Monk on Apr 06, 2003 at 04:28 UTC
    How do I make the 'flock' function work in ActivePerl on Windows?
      flock is available on windows, at least modern windows (NT, 2000,2003, XP). For older windows you'll have to use File::FlockDir
        The LOCK_NB mode of the flock function in ActivePerl does not work correctly on NT, 2000, or XP.