avoiding a race

by westy032001 (Novice)
on Sep 28, 2010 at 13:52 UTC [id://862415]

westy032001 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks

I'm after some advice on avoiding a race condition when reading and possibly modifying a file, or on an alternative way of doing it.

The scenario is this: I have 300 processes connecting to a database at the same time every 15 minutes. If one of them encounters a database error, I want it to check a file to see if this error has been encountered before; if it hasn't, it adds the error code with a timestamp, and if it has been encountered before and the timestamp is more than an hour old, it mails the admin. The idea is to avoid getting 300 mails every 15 minutes when something goes wrong.

As far as I can see I need to flock the file, then read it, modify it if necessary and then unlock, but this causes a bottleneck for my 300 processes. Is there a better way of doing things? Cheers.

I just wanted to thank everybody for taking the time and effort to read, think and post!! Thanks.

Replies are listed 'Best First'.
Re: avoiding a race (read locks)
by tye (Sage) on Sep 28, 2010 at 14:20 UTC

    You get a READ lock (LOCK_SH), read the file, make an initial determination as to whether you need to write to it. If that is "yes", then you release the read lock and request a write lock. When you get it, you read from the position in the file that was the previous end of the file and update your decision as to whether you need to write. If so, append your update. Then release the lock.

    Update: Note that under other circumstances, this scheme has the potential for the classic problem of readers starving writers. If there is never a break in read locks getting held, then the request for a write lock will just wait forever. Given the schedule you outlined, it seems likely that all of the readers will finish before the next batch of readers start up. However, if your batches start taking 15 minutes to finish, then you might never get e-mail because the writers never get their locks.

    You should check how Perl's flock() is implemented on your system. It may be that a pending request for a write lock will cause new requests for a read lock to block, preventing starvation.

    You should also time out and send the e-mail if you can't get the write lock after, say, 15 minutes.

    The next race is when you want to purge the growing accumulation of log lines. I'd probably just include the date and hour in the log file name. Then you only need to read this hour's and last hour's log files and you can delete log files for longer ago on whatever schedule you desire without worry.
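
    A rough Perl sketch of that scheme (the file name, the one-"code<TAB>epoch"-per-line log format, and the simplified "log and mail at most once per hour" rule are all illustrative; the log file is assumed to already exist):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Fcntl qw(:flock :seek);

        my $log   = '/var/tmp/db_errors.log';
        my $error = 'DB_CONNECT_FAILED';      # whatever code the DB handed back
        my $now   = time;

        open my $fh, '+<', $log or die "open $log: $!";

        # 1. Shared lock: read the file, note where it ended, make a first decision.
        flock $fh, LOCK_SH or die "LOCK_SH: $!";
        my ($seen, $pos) = (0, 0);
        while (<$fh>) {
            $pos = tell $fh;
            chomp;
            my ($code, $when) = split /\t/;
            next unless defined $when;
            $seen = 1 if $code eq $error and $now - $when < 3600;
        }
        flock $fh, LOCK_UN;

        exit 0 if $seen;                      # logged within the hour; nothing to do

        # 2. Release, take the exclusive lock, re-read only what was appended
        #    while we waited, and decide again before writing.
        flock $fh, LOCK_EX or die "LOCK_EX: $!";
        seek $fh, $pos, SEEK_SET;
        while (<$fh>) {
            chomp;
            my ($code, $when) = split /\t/;
            next unless defined $when;
            $seen = 1 if $code eq $error and $now - $when < 3600;
        }
        unless ($seen) {
            seek $fh, 0, SEEK_END;
            print {$fh} "$error\t$now\n";
            # mail_admin($error);             # hypothetical helper
        }
        close $fh;                            # flushes, then the lock goes away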

    - tye        

      Thanks for the reply.

      If I understand you correctly, isn't there still a potential for a race condition?

      If the database goes down and all 300 procs get a db error.

      process 123 opens the file and places a shared lock

      process 321 opens the file and places a shared lock

      process 123 decides it is going to modify the file, so it waits for 321 to unlock, then places an exclusive lock, modifies the file and closes

      process 321 decides it is going to modify the file, so it places an exclusive lock, modifies the file and closes.

      If both are changing the file as a result of the same error (i.e. the database is down), you will get the same error code recorded twice, and 2 emails sent to admins.

      thanks.

        See the following sentence in tye's scheme:

        When you get [the write lock], you read from the position in the file that was the previous end of the file and update your decision as to whether you need to write.

        So, in your example case, Process 321 would notice that the file changed since it last checked and that another process already sent the notification.

Re: avoiding a race
by lostjimmy (Chaplain) on Sep 28, 2010 at 14:18 UTC
    I was going to suggest something similar to what gman said, but then I thought if you were having database issues, one of those issues might be that the DB is down, so recording errors in the DB might not work out so well.

    I came up with two ideas: 1) Have all processes write their errors to log files named by process ID; then have a separate process create a report from those logs and email the admin. 2) Have a daemon listening on a socket (maybe a unix socket for simplicity) and have all processes write their errors to that socket. This process will make the decisions about which errors to report.

    Both options have the advantage of the reporting logic being in one place, and there shouldn't be any race conditions.
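
    A rough sketch of option 2 (the socket path is illustrative and mail_admin() is a hypothetical helper); the single reporter process owns all of the state, so no locking is needed anywhere:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Socket qw(SOCK_STREAM);
        use IO::Socket::UNIX;

        my $path = '/var/tmp/db_error_reporter.sock';
        unlink $path;

        my $server = IO::Socket::UNIX->new(
            Type   => SOCK_STREAM,
            Local  => $path,
            Listen => 10,
        ) or die "listen on $path: $!";

        my %last_mailed;                      # error string => epoch of last mail

        while (my $client = $server->accept) {
            my $code = <$client>;             # workers send one line per error
            close $client;
            next unless defined $code;
            chomp $code;
            next unless length $code;

            if (!$last_mailed{$code} or time - $last_mailed{$code} > 3600) {
                $last_mailed{$code} = time;
                # mail_admin($code);          # hypothetical helper
            }
        }

        # The 300 workers then just do something like:
        #   my $s = IO::Socket::UNIX->new(Peer => $path, Type => SOCK_STREAM)
        #       or exit;                      # reporter not running: give up quietly
        #   print {$s} "$error_code\n";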

Re: avoiding a race
by gman (Friar) on Sep 28, 2010 at 14:12 UTC

    This seems like a good job for some sort of database: MySQL, SQLite...

    If all 300 processes run at the same time you might still have to stagger them; follow this up with another script that produces a report that is emailed.
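
    For instance, a sketch using DBD::SQLite as a small local store (the path, table layout and once-per-hour rule are illustrative, and mail_admin() is a hypothetical helper; two processes can still race between the SELECT and the UPDATE, which at worst means one duplicate mail):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # SQLite does its own locking, so the 300 processes don't need flock.
        my $dbh = DBI->connect('dbi:SQLite:dbname=/var/tmp/db_errors.sqlite',
                               '', '', { RaiseError => 1, AutoCommit => 1 });

        $dbh->do(q{
            CREATE TABLE IF NOT EXISTS errors (
                code      TEXT PRIMARY KEY,
                last_seen INTEGER NOT NULL
            )
        });

        sub handle_error {
            my ($code) = @_;
            my $now = time;

            my ($last) = $dbh->selectrow_array(
                'SELECT last_seen FROM errors WHERE code = ?', undef, $code);

            if (!defined $last) {
                # First sighting: record it quietly.
                $dbh->do('INSERT OR IGNORE INTO errors (code, last_seen) VALUES (?, ?)',
                         undef, $code, $now);
            }
            elsif ($now - $last > 3600) {
                # Known error, last noted over an hour ago: refresh and notify.
                $dbh->do('UPDATE errors SET last_seen = ? WHERE code = ?',
                         undef, $now, $code);
                # mail_admin($code);
            }
            # else: seen within the hour, stay quiet
        }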

Re: avoiding a race
by ig (Vicar) on Sep 28, 2010 at 18:46 UTC
Re: avoiding a race
by BrowserUk (Patriarch) on Sep 28, 2010 at 16:10 UTC

    A simpler, non-locking mechanism would be:

    1. You receive error 123. You stat for a file named ERROR.123.
    2. If the file doesn't exist you create it (empty), and move on.
    3. If the file does exist, you check the time stamp.
      1. If it is older than 1 hour: you delete the file; send an email; then move on.
      2. If it is less than 1 hour old, you just move on.
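
    A rough Perl rendering of that (the path is illustrative and mail_admin() is a hypothetical helper):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $code = 123;                       # the error we just hit
        my $file = "/var/tmp/ERROR.$code";

        if (-e $file) {
            my $age = time - (stat _)[9];     # mtime from the stat cached by -e
            if ($age > 3600) {
                unlink $file;                 # older than an hour: reset the marker
                # mail_admin($code);
            }
            # under an hour old: someone already dealt with it, just move on
        }
        else {
            open my $fh, '>', $file or warn "create $file: $!";
            close $fh if $fh;                 # empty marker; its mtime is the timestamp
        }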

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      That is not a non-locking mechanism. It just hands off the locking to the kernel which locks the directory when it reads from it or writes to it. It has the advantage of the kernel locking implementation being very well tested.

      Of course, errors might not all have such nice, unique, numeric identifiers so the files might have to be named more like "ERROR.Error inserting record into table WIDGET, unique key violation on column UPC". And even that won't work if the comparison for "same error" isn't easily reduced to "string equality".

      But, most importantly, your solution (as described) has a race condition between stat and creating a file. You can probably fix that a couple of different ways.

      - tye        

        That is not a non-locking mechanism. It just hands off the locking to the kernel which locks the directory when it reads from it or writes to it. It has the advantage of the kernel locking implementation being very well tested.

        The kernel is going to do its locking whatever file operations you do. Re-using it is good.

        So, I guess you could call it a "no-extra, no-effort (or risk of getting it wrong)" locking mechanism.

        Of course, errors might not all have such nice, unique, numeric identifiers ...

        If you can't reduce the errors to something easily comparable in the filesystem, you'll have similar problems locating similar errors in the file itself. And globbing is capable of much more than just "string equality".

        But, most importantly, your solution (as described) has a race condition between stat and creating a file. You can probably fix that a couple of different ways.

        If I knew how to do open(CREATE_NEW(*)) in Perl, I would suggest that. If the open() fails, it must have 'just' been created, so there's nothing else to do, so you just move on anyway.

        But realistically, it's probably a "problem" not worth the effort of solving. The idea is to avoid 300 emails. Getting 2 or even 3 shouldn't be a problem.

        Update: The "race condition" (whether this process creates the new file, or some other process does it for you a few milliseconds before you do) doesn't trigger extra emails.

        Nor does it delay their being sent at the appropriate time. The time window is probably less than the resolution of the file system timestamps. So: NO race condition!

        Very simple. Very effective. Perfection is the enemy of "good enough".

        (*)Ie. Create a new file; fail if it already exists.
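
        For what it's worth, a minimal sketch of that "create new or fail" open in Perl, using sysopen with Fcntl's O_CREAT|O_EXCL (atomic on local filesystems; very old NFS implementations are less trustworthy here):

            use strict;
            use warnings;
            use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

            my $code   = 123;                 # illustrative
            my $marker = "ERROR.$code";

            # Only one of the 300 processes can win the O_EXCL create.
            if (sysopen(my $fh, $marker, O_WRONLY | O_CREAT | O_EXCL)) {
                close $fh;                    # empty marker created; mtime is the timestamp
            }
            else {
                # EEXIST: some other process 'just' created it, so move on anyway
            }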


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: avoiding a race
by kennethk (Abbot) on Sep 28, 2010 at 14:15 UTC
    Assuming you generally don't expect errors, why don't you put your flock into the error handling code? That way, under normal circumstances there is no bottleneck. There necessarily must be a bottleneck somewhere if you want to check 300 error messages against each other.
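
    In other words, something like this sketch (the DSN and record_and_maybe_mail() are hypothetical placeholders); the flock'd file is only ever touched in the failure branch, so the happy path never serialises on the lock:

        use strict;
        use warnings;
        use DBI;

        my ($dsn, $user, $pass) = ('dbi:Pg:dbname=widgets', 'app', 'secret');   # illustrative

        my $dbh = eval {
            DBI->connect($dsn, $user, $pass, { RaiseError => 1, PrintError => 0 });
        };
        unless ($dbh) {
            my $err = $DBI::errstr || $@;
            record_and_maybe_mail($err);      # hypothetical: the flock/read/append logic
            exit 1;
        }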
