avoiding a race

by westy032001 (Novice)
on Sep 28, 2010 at 13:52 UTC [id://862415]

westy032001 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks

I'm after some advice on avoiding a race condition when reading and possibly modifying a file, or on an alternative way of doing it.

The scenario is this: I have 300 processes connecting to a database at the same time every 15 minutes. If one of them encounters a database error, I want it to check a file to see if this error has been encountered before; if it hasn't, it adds the error code with a timestamp, and if it has been encountered before and the timestamp is more than an hour old, it mails the admin. The idea is to avoid getting 300 mails every 15 minutes when something goes wrong.

As far as I can see I need to flock the file, then read it, modify it if necessary and then unlock, but this causes a bottleneck for my 300 processes. Is there a better way of doing things? Cheers.

I just wanted to thank everybody for taking the time and effort to read, think and post!! Thanks.

Replies are listed 'Best First'.
Re: avoiding a race (read locks)
by tye (Sage) on Sep 28, 2010 at 14:20 UTC

    You get a READ lock (LOCK_SH), read the file, make an initial determination as to whether you need to write to it. If that is "yes", then you release the read lock and request a write lock. When you get it, you read from the position in the file that was the previous end of the file and update your decision as to whether you need to write. If so, append your update. Then release the lock.

    Update: Note that under other circumstances, this scheme has the potential for the classic problem of readers starving writers. If there is never a break in read locks getting held, then the request for a write lock will just wait forever. Given the schedule you outlined, it seems likely that all of the readers will finish before the next batch of readers start up. However, if your batches start taking 15 minutes to finish, then you might never get e-mail because the writers never get their locks.

    You should check how Perl's flock() is implemented on your system. It may be that a pending request for a write lock will cause new requests for a read lock to block, preventing starvation.

    You should also time out and send the e-mail if you can't get the write lock after, say, 15 minutes.

    The next race is when you want to purge the growing accumulation of log lines. I'd probably just include the date and hour in the log file name. Then you only need to read this hour's and last hour's log files and you can delete log files for longer ago on whatever schedule you desire without worry.
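
    A rough Perl sketch of that scheme (the file name, the one-"code<TAB>epoch"-per-line log format, and the simplified "log and mail at most once per hour" rule are all illustrative; the log file is assumed to already exist):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Fcntl qw(:flock :seek);

        my $log   = '/var/tmp/db_errors.log';
        my $error = 'DB_CONNECT_FAILED';      # whatever code the DB handed back
        my $now   = time;

        open my $fh, '+<', $log or die "open $log: $!";

        # 1. Shared lock: read the file, note where it ended, make a first decision.
        flock $fh, LOCK_SH or die "LOCK_SH: $!";
        my ($seen, $pos) = (0, 0);
        while (<$fh>) {
            $pos = tell $fh;
            chomp;
            my ($code, $when) = split /\t/;
            next unless defined $when;
            $seen = 1 if $code eq $error and $now - $when < 3600;
        }
        flock $fh, LOCK_UN;

        exit 0 if $seen;                      # logged within the hour; nothing to do

        # 2. Release, take the exclusive lock, re-read only what was appended
        #    while we waited, and decide again before writing.
        flock $fh, LOCK_EX or die "LOCK_EX: $!";
        seek $fh, $pos, SEEK_SET;
        while (<$fh>) {
            chomp;
            my ($code, $when) = split /\t/;
            next unless defined $when;
            $seen = 1 if $code eq $error and $now - $when < 3600;
        }
        unless ($seen) {
            seek $fh, 0, SEEK_END;
            print {$fh} "$error\t$now\n";
            # mail_admin($error);             # hypothetical helper
        }
        close $fh;                            # flushes, then the lock goes away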

    - tye        

      Thanks for the reply.

      If I understand you correctly, isn't there still a potential for a race condition?

      If the database goes down and all 300 procs get a db error.

      process 123 opens the file and places a shared lock

      process 321 opens the file and places a shared lock

      process 123 decides it is going to modify the file, so it waits for 321 to unlock, then places an exclusive lock, modifies the file and closes

      process 321 decides it is going to modify the file, so it places an exclusive lock, modifies the file and closes.

      If both are changing the file as a result of the same error (i.e. the database is down), you will get the same error code recorded twice, and 2 emails sent to admins.

      thanks.

        See the following sentence in tye's scheme:

        When you get [the write lock], you read from the position in the file that was the previous end of the file and update your decision as to whether you need to write.

        So, in your example case, Process 321 would notice that the file changed since it last checked and that another process already sent the notification.

Re: avoiding a race
by lostjimmy (Chaplain) on Sep 28, 2010 at 14:18 UTC
    I was going to suggest something similar to what gman said, but then I thought if you were having database issues, one of those issues might be that the DB is down, so recording errors in the DB might not work out so well.

    I came up with two ideas: 1) Have all processes write their errors to log files named by process ID; then have a separate process create a report from those logs and email the admin. 2) Have a daemon listening on a socket (maybe a unix socket for simplicity) and have all processes write their errors to that socket. This process will make the decisions about which errors to report.

    Both options have the advantage of the reporting logic being in one place, and there shouldn't be any race conditions.
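
    A rough sketch of option 2 (the socket path is illustrative and mail_admin() is a hypothetical helper); the single reporter process owns all of the state, so no locking is needed anywhere:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Socket qw(SOCK_STREAM);
        use IO::Socket::UNIX;

        my $path = '/var/tmp/db_error_reporter.sock';
        unlink $path;

        my $server = IO::Socket::UNIX->new(
            Type   => SOCK_STREAM,
            Local  => $path,
            Listen => 10,
        ) or die "listen on $path: $!";

        my %last_mailed;                      # error string => epoch of last mail

        while (my $client = $server->accept) {
            my $code = <$client>;             # workers send one line per error
            close $client;
            next unless defined $code;
            chomp $code;
            next unless length $code;

            if (!$last_mailed{$code} or time - $last_mailed{$code} > 3600) {
                $last_mailed{$code} = time;
                # mail_admin($code);          # hypothetical helper
            }
        }

        # The 300 workers then just do something like:
        #   my $s = IO::Socket::UNIX->new(Peer => $path, Type => SOCK_STREAM)
        #       or exit;                      # reporter not running: give up quietly
        #   print {$s} "$error_code\n";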

Re: avoiding a race
by gman (Friar) on Sep 28, 2010 at 14:12 UTC

    This seems like a good job for some sort of database: MySQL, SQLite...

    If all 300 processes run at the same time you might still have to stagger them; follow this up with another script that produces a report that is emailed.
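
    For instance, a sketch using DBD::SQLite as a small local store (the path, table layout and once-per-hour rule are illustrative, and mail_admin() is a hypothetical helper; two processes can still race between the SELECT and the UPDATE, which at worst means one duplicate mail):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # SQLite does its own locking, so the 300 processes don't need flock.
        my $dbh = DBI->connect('dbi:SQLite:dbname=/var/tmp/db_errors.sqlite',
                               '', '', { RaiseError => 1, AutoCommit => 1 });

        $dbh->do(q{
            CREATE TABLE IF NOT EXISTS errors (
                code      TEXT PRIMARY KEY,
                last_seen INTEGER NOT NULL
            )
        });

        sub handle_error {
            my ($code) = @_;
            my $now = time;

            my ($last) = $dbh->selectrow_array(
                'SELECT last_seen FROM errors WHERE code = ?', undef, $code);

            if (!defined $last) {
                # First sighting: record it quietly.
                $dbh->do('INSERT OR IGNORE INTO errors (code, last_seen) VALUES (?, ?)',
                         undef, $code, $now);
            }
            elsif ($now - $last > 3600) {
                # Known error, last noted over an hour ago: refresh and notify.
                $dbh->do('UPDATE errors SET last_seen = ? WHERE code = ?',
                         undef, $now, $code);
                # mail_admin($code);
            }
            # else: seen within the hour, stay quiet
        }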

Re: avoiding a race
by ig (Vicar) on Sep 28, 2010 at 18:46 UTC
Re: avoiding a race
by BrowserUk (Patriarch) on Sep 28, 2010 at 16:10 UTC

    A simpler, non-locking mechanism would be:

    1. You receive error 123. You stat for a file named ERROR.123.
    2. If the file doesn't exist you create it (empty), and move on.
    3. If the file does exist, you check the time stamp.
      1. If it is older than 1 hour: you delete the file; send an email; then move on.
      2. If it is less than 1 hour old, you just move on.
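
    A rough Perl rendering of that (the path is illustrative and mail_admin() is a hypothetical helper):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $code = 123;                       # the error we just hit
        my $file = "/var/tmp/ERROR.$code";

        if (-e $file) {
            my $age = time - (stat _)[9];     # mtime from the stat cached by -e
            if ($age > 3600) {
                unlink $file;                 # older than an hour: reset the marker
                # mail_admin($code);
            }
            # under an hour old: someone already dealt with it, just move on
        }
        else {
            open my $fh, '>', $file or warn "create $file: $!";
            close $fh if $fh;                 # empty marker; its mtime is the timestamp
        }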

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      That is not a non-locking mechanism. It just hands off the locking to the kernel which locks the directory when it reads from it or writes to it. It has the advantage of the kernel locking implementation being very well tested.

      Of course, errors might not all have such nice, unique, numeric identifiers so the files might have to be named more like "ERROR.Error inserting record into table WIDGET, unique key violation on column UPC". And even that won't work if the comparison for "same error" isn't easily reduced to "string equality".

      But, most importantly, your solution (as described) has a race condition between stat and creating a file. You can probably fix that a couple of different ways.

      - tye        

        That is not a non-locking mechanism. It just hands off the locking to the kernel which locks the directory when it reads from it or writes to it. It has the advantage of the kernel locking implementation being very well tested.

        The kernel is going to do its locking whatever file operations you do. Re-using it is good.

        So, I guess you could call it a "no-extra, no-effort (or risk of getting it wrong)" locking mechanism.

        Of course, errors might not all have such nice, unique, numeric identifiers ...

        If you can't reduce the errors to something easily comparable in the filesystem, you'll have similar problems locating similar errors in the file itself. And globbing is capable of much more than just "string equality".

        But, most importantly, your solution (as described) has a race condition between stat and creating a file. You can probably fix that a couple of different ways.

        If I knew how to do open(CREATE_NEW(*)) in Perl, I would suggest that. If the open() fails, it must have 'just' been created, so there's nothing else to do, so you just move on anyway.

        But realistically, it's probably a "problem" not worth the effort of solving. The idea is to avoid 300 emails. Getting 2 or even 3 shouldn't be a problem.

        Update: The "race condition" (whether this process creates the new file, or some other process does it for you a few milliseconds before you do) doesn't trigger extra emails.

        Nor does it delay their being sent at the appropriate time. The time window is probably less than the resolution of the file system timestamps. So: NO race condition!

        Very simple. Very effective. Perfection is the enemy of "good enough".

        (*)Ie. Create a new file; fail if it already exists.
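
        For what it's worth, a minimal sketch of that "create new or fail" open in Perl, using sysopen with Fcntl's O_CREAT|O_EXCL (atomic on local filesystems; very old NFS implementations are less trustworthy here):

            use strict;
            use warnings;
            use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

            my $code   = 123;                 # illustrative
            my $marker = "ERROR.$code";

            # Only one of the 300 processes can win the O_EXCL create.
            if (sysopen(my $fh, $marker, O_WRONLY | O_CREAT | O_EXCL)) {
                close $fh;                    # empty marker created; mtime is the timestamp
            }
            else {
                # EEXIST: some other process 'just' created it, so move on anyway
            }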


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: avoiding a race
by kennethk (Abbot) on Sep 28, 2010 at 14:15 UTC
    Assuming you generally don't expect errors, why don't you put your flock into the error handling code? That way, under normal circumstances there is no bottleneck. There necessarily must be a bottleneck somewhere if you want to check 300 error messages against each other.
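
    In other words, something like this sketch (the DSN and record_and_maybe_mail() are hypothetical placeholders); the flock'd file is only ever touched in the failure branch, so the happy path never serialises on the lock:

        use strict;
        use warnings;
        use DBI;

        my ($dsn, $user, $pass) = ('dbi:Pg:dbname=widgets', 'app', 'secret');   # illustrative

        my $dbh = eval {
            DBI->connect($dsn, $user, $pass, { RaiseError => 1, PrintError => 0 });
        };
        unless ($dbh) {
            my $err = $DBI::errstr || $@;
            record_and_maybe_mail($err);      # hypothetical: the flock/read/append logic
            exit 1;
        }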
