http://qs321.pair.com?node_id=7058

File Locking

Introduction

This document will describe what file locking is, when you should use it, and how it is done in perl. To lock a file in perl, use the flock command (pronounced as a flock of sheep, not "eff lock"). For the impatient, here is a quick example:

open(MYFILE, ">>$myfile") || die; flock(MYFILE, 2) || die; print MYFILE "Cottleston Pie\n"; close(MYFILE);
(Okay, now that the impatient ones have left, let us look at things in a bit more detail)

What is file locking, and why should you use it?

File locking is a way of ensuring the integrity of files. It allows many people (actually, processes) to share a file in a safe way, without stepping on each other's toes. Sometimes, file locking is not needed - if only one process is working on the file, then there is no need to worry about anybody else changing it. However, when a single file is trying to be changed by two or more processes, conflicts can arise, and some sort of file locking is needed.

For example, let us say that you wish to create a simple text file (named "friends.txt") that has a list of all your friends, one per line. Now let's supppose you have written a very basic web page that allows your friends to add their name to your file through a very simple cgi script. Here is what you have come up with:

#!/usr/bin/perl print "Content-type: text/html\n\n"; $myfile = "friends.txt"; $newfriend = $ENV{'QUERY_INFO'}; open(MYFILE, "$myfile") || die; while(<MYFILE>) { if (m/^$newfriend$/) { print "You are already on the list!\n"; exit; } } close(MYFILE); push(@friends, $newfriend); open(MYFILE, ">$myfile") || die; print MYFILE @friends; close(MYFILE); print "You are now in my list, $newfriend!\n" exit;

Not a very complicated script, but we do have a problem.
Check out this line:

open(MYFILE, ">$myfile") || die;

When perl opens the file for writing like this, it "erases" the file first, by basically setting the size to zero, in anticipation of you writing something.

By way of example, let us say that your file contains the following two names:

Now let us imagine that two of your friends, Diana and Robin, are trying to add their names to your list at the same time. Diana gets their a split-second before Robin, so she is the first to open the file. She opens the file, reads in the two names already there (which are stored in the @friends array), and then closes the file. She adds her name to @friends, reopens the file for writing, puts the three names from @friends into the file, and closes it again. However, after she opens the file for writing, but before she writes anything to the file, Robin comes along and tries to read in the names. Since the file is empty at that exact moment, he reads in no names, and @friends is empty. He closes the file. Then he adds his name to the list, which now contains only his name, and reopens the file for writing. He then puts into it the single name from @friends, and closes the file again. At this point the file contains only Robin's name: Clark, Bruce, and Diana are lost forever.

Here is a timeline of what happens:

  1. Diana opens the file, reads in two names, and closes the file.
  2. Diana adds her name to the list.
  3. Diana reopens the file for writing, setting the length to zero.
  4. Robin reads the file right then, reads in NO names, and closes the file.
  5. Diana writes three names to the file and closes it.
  6. Robin adds his name to the (empty) list.
  7. Robin reopens the file for writing, setting the length to zero.
  8. Robin writes one name to the file and closes it.
  9. The file now contains only one name.

It may seem as though there is a very small chance of this happening, but the point is that there is a chance. Instead of this simple example, imagine a giant file with hundreds of people reading and writing to it at the same time. No matter the odds, nobody wants to have their file messed up.

All about flock

Here (finally!) is where file locking comes in. File locking is done at the system level, meaning that the actual details of applying the lock itself is not something you have to worry about.

File locking is done, in perl, with the flock command. The basic format for flock is:

flock FILEHANDLE, OPERATION

The OPERATION is actually a number, either 1, 2, 4, or 8. They are also commonly written in another form, as LOCK_SH, LOCK_EX, LOCK_NB, and LOCK_UN. Perl does not know what these mean, so you can use the numbers, or do something like this:

sub LOCK_SH { 1 } ## shared lock sub LOCK_EX { 2 } ## exclusive lock sub LOCK_NB { 4 } ## non-blocking sub LOCK_UN { 8 } ## unlock
Each is described later, for now, let's just fix up our example script to include some file locking:
#!/usr/bin/perl print "Content-type: text/html\n\n"; $myfile = "friends.txt"; $newfriend = $ENV{'QUERY_INFO'}; open(MYFILE, "$myfile") || die; flock(MYFILE, 1); while(<MYFILE>) { if (m/^$newfriend$/) { print "You are already on the list!\n"; exit; } } close(MYFILE); push(@friends, $newfriend); open(MYFILE, "+< $myfile") || die; flock(MYFILE, 2); seek(MYFILE, 0, 0); truncate(MYFILE, 0); print MYFILE @friends; close(MYFILE); print "You are in my list, $newfriend!\n" exit;

Notice that we have added two flock commands. The first one adds a shared lock, and the second one adds an exclusive lock. Looking back, we see that the number "1" represents "LOCK_SH", which stands for "lock, shared." Similarly, the number "2" corresponds to "LOCK_EX", or "lock, exclusive."

The difference between a shared lock and an exclusive lock is an important one. A shared lock is usually applied when you simply want to read the file, and it is okay if others read the file while you do. An exclusive lock is used when you want to make changes to the file. Only one exclusive lock can be on a file, so that only one process at a time can make changes. If your file is a large manilla envelope full of papers, then a shared lock slaps a little "Hey! I'm reading this!" note on the front. An exclusive note slips a note saying "Hey! I'm might make some changes to this, so look but don't touch until I'm done!."

Unlocking a file is not necessary, as long as you remember to close it. Closing the file automatically unlocks it as well - that is why we do not need any specific unlock commands in our example script.

Let's look at our example script again, at the first flock line:

flock(MYFILE, 1);

This does more than it first appears. Not only does it set a lock, but it checks for other locks first. In the case of a shared lock, it checks to see if there is an exclusive lock on the file. If there is, it waits until the exclusive lock is gone, and only then will it add its shared lock. It does not care if there are other shared locks on it. What this basically does is to say "I want to read this file, but only if I'm sure that nobody is in the middle of making changes to it, and I want to let everyone know that I am reading it."

Now look at the second flock command:

flock(MYFILE, 2);

This one sets an exclusive lock, because we want to make changes to the file. To set an exclusive lock, you must have write access to the file (a shared lock only needs read access). With an exclusive lock, the rule is "there can be only one." The flock command in this case will check to see whether there are *ANY* other locks on the file, shared or exclusive, and will wait until they are all removed. When they are, it locks the file. What this basically says is "Hands off! I might make some changes to this file, so nobody mess with it until I am done"

So, in our example above with Diana and Robin, the new script would clear up the problem. We also made some other small changes. This line:

open(MYFILE, "+< $myfile") || die;
tells us to open the file in read/write mode. In other words, the file is NOT set to zero-lengh, because we do not want to mess with the contents until after we have locked it. Once we have locked it, we need two other commands:
seek(MYFILE, 0, 0); truncate(MYFILE, 0);
These bring us to the end of the file, and then sets the length to zero. This is basically what happens when we open a file in write only mode (i.e. "> $myfile") but we could not do that here because we want to lock it before truncating it.

Here is another timeline, with file locking:

  1. Diana opens the file, locks it (shared), reads in two names, and closes the file (which removes the lock).
  2. Diana adds her name to the list.
  3. Diana reopens the file for read/write, locks it (exclusive), and sets the length to zero.
  4. Robin opens the file, tries to get a shared lock, but cannot, becase the file is locked. So he is blocked until Diana is done.
  5. Diana writes three names to the file and closes it, which also removes her lock.
  6. Robin sees that the lock is gone, locks it himself (shared), and reads in the three names.
  7. Robin adds his name to the list.
  8. Robin reopens the file for read/write, locks it (exclusive), and sets the length to zero.
  9. Robin writes four names to the file and closes it, which also removes his lock.
  10. The file now contains four names!

The other two values, LOCK_NB and LOCK_UN are not used as often. The LOCK_NB means "NON_BLOCKING" and tells the system not to wait for other locks to come off the file, but to return right away with an error if there is already another lock on the file. The LOCK_UN means "unlock", but, as mentioned above, is not usually needed as close does the job for you.

flock vs. lockf

You may have also heard about lockf, flock's cousin. lockf can do everything that flock can, and then some. It can actually apply locks to *part* of a file, as well as applying advisory and mandatory locks. Flock only does advisory locks. In the manilla evelope analogy from before, flock allowed you to post notes on the folder, while lockf allows you to tag individual pages inside the folder. The fcntl command (which stands for "file control") is even more powerful than lockf, and is used to control all aspects of open files. Both of these are beyond the scope of this document: for file locking, use flock.

Other ways to lock files

There are other ways to lock files besides flock, lockf, and fcntl. Many operating systems have their own ways of locking files, but most of this will not concern the perl programmer. There are also ways to do file locking in perl (such as creating and removing a temporary file), but none are as good as flock.

Precautions

All of this assumes one thing - that everyone is playing by the same set of rules. In other words, there is nothing in locking a file with flock that prevents another process from ignoring all your locks. Flock provides an "advisory" locking method. This means another process can come along and open the file at will, ignoring any file locks. All the processes that access the file must use flock for it to work correctly.

Also, beware of command line editing. In the example above, let's say that "Lex" has added his name to your friends.txt file. Well, you don't consider Lex to be a friend, and you do not want his name in your file. So, you telnet it, call up emacs, and edit the friends.txt file directly. Watch out! What if Hank tries to add his name in while you have the file loaded? He could add his name, and then you would overwrite his changes when you save the file. (emacs will actually warn you when the contents of the disk have changed in this case. Another reason to use it!) Here are some simple ways to work around this problem, from best solution to worst:

Finally, file locking may not work across NFS or other file sharing systems. Some systems (e.g. NT) may not even allow advisory locking. Some systems do not have file locking at all (at least as far as anything that perl can use). When in doubt, check your system documentation. This is not an issue on most systems.