Testing Race Conditions

ropey has asked for the wisdom of the Perl Monks concerning the following question:

Re: Testing Race Conditions by GrandFather (Saint) on Jul 03, 2007 at 11:29 UTC
Inside a single test, open two db handles, p and q. Initiate a transaction on p. Initiate a transaction on q. Commit the q transaction and check that it succeeds. Commit the p transaction and check that it fails. Trying to reproduce the actual race condition is likely to be very difficult, but checking that you have fixed the behavior that results from the race is deterministic and should be easy. If this doesn't solve your issue, you need to tell us a little more about how the race condition arises, how likely it is to happen, and how you propose to fix it.
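A minimal sketch of that deterministic test, assuming optimistic concurrency (a commit is rejected if another transaction committed after this one began). `VersionedStore`, `begin_txn` and `commit_txn` are hypothetical stand-ins for your database layer; the two transaction objects play the roles of handles p and q. With a real DBMS, whether the later commit fails or instead blocks depends on its isolation level and locking, so check what your database actually does.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical stand-in for a database with optimistic concurrency:
# every successful commit bumps a version number, and a commit is
# rejected if the version moved since its transaction began.
package VersionedStore {
    sub new { my ($class) = @_; return bless { value => 'initial', version => 0 }, $class }

    # Begin a transaction by remembering the version it started from.
    sub begin_txn {
        my ($self) = @_;
        return { started_at => $self->{version} };
    }

    # Commit succeeds only if nobody else committed in the meantime.
    sub commit_txn {
        my ($self, $txn, $new_value) = @_;
        return 0 if $self->{version} != $txn->{started_at};
        $self->{value} = $new_value;
        $self->{version}++;
        return 1;
    }
}

my $db = VersionedStore->new;

# Interleave the two transactions by hand: no timing, no threads.
my $p = $db->begin_txn;    # "handle p" starts first...
my $q = $db->begin_txn;    # ...then "handle q"

my $q_ok = $db->commit_txn($q, 'written by q');
my $p_ok = $db->commit_txn($p, 'written by p');

ok(  $q_ok, 'q commits first and succeeds' );
ok( !$p_ok, 'p commits second and is rejected' );
is( $db->{value}, 'written by q', 'the rejected write left no trace' );
```

Because the interleaving is spelled out line by line, the test gives the same answer on every run.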
Re: Testing Race Conditions by BrowserUk (Patriarch) on Jul 03, 2007 at 15:05 UTC
How big is the 'window of opportunity' for the race condition? I.e., what are the approximate minimum and maximum times between the start-transaction/end-transaction brackets? Your server logs should be able to supply that information.

When you do your testing, where is the server? In the same box? On the same subnet? On the other side of the world, connected via two cups and a bit of string? :)

If the minimum time between the start and end of a transaction is (say) more than 1 second, and you run your tests against a server in the same box or even on the same subnet, then arranging for two requests to arrive within that 1-second window using system, fork or threads shouldn't be hugely difficult. The thread- and fork-safety of your test tools is another open question; I've seen several reports here of forking test suites failing under Win32 fork emulation, for example.

Using threads, it might look something like this:

```perl
use threads;
use Time::HiRes qw[ time ];
use Test::Whatever;
# ...

## Pick a time in the future
my $go = time() + 0.1;

async {
    ## Make sure client 2 goes off last (and so should fail)
    $go += 0.005;
    sleep 0 while time() < $go;
    ok( doXML_RPCrequest( 'some request' ) != SUCCESS );
    sleep 0;
}->detach;

sleep 0 while time() < $go;
ok( doXML_RPCrequest( 'some request' ) == SUCCESS );
# ... rest of tests
```

On my single-processor system using threads, I can arrange for two TCP requests to arrive at the server in the desired order (success before failure), within ~10 milliseconds of each other, ~997 times out of 1000. Reliability reaches 100% (over several runs of 1000 tests) if I widen the acceptable window to 20 milliseconds. So reliably hitting a window of (say) more than 1/10th of a second wouldn't be too difficult if your tests and server run within the same box. If your test box has multiple CPUs, finer timing may be possible. Obviously, the bigger the window, the easier the task.
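Reliability figures like those come from repeating the timed launch many times and checking the order. A minimal sketch of that measurement, swapping threads for `fork` (so it assumes a Unix-like system with a real fork, not Win32 emulation) and recording a timestamp as a stand-in for actually sending the request. `fire_at` and `one_trial` are hypothetical names; the 20 ms offset mirrors the wider window mentioned above. It measures when each client fired, not when the server received the request.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use POSIX ();

# Spin until the agreed moment, then return the time we actually "fired".
# In a real test this is where the request would be sent.
sub fire_at {
    my ($when) = @_;
    1 while time() < $when;
    return time();
}

# One trial: client 1 (parent) fires at $go, client 2 (child) at $go + $offset.
# Returns client 2's fire time minus client 1's; positive means "in order".
sub one_trial {
    my ($offset) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $go  = time() + 0.05;    # a moment shortly in the future
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {            # child: client 2, meant to go off last
        close $reader;
        printf {$writer} "%.6f\n", fire_at($go + $offset);
        close $writer;
        POSIX::_exit(0);        # skip END blocks and destructors in the child
    }
    close $writer;
    my $t1 = fire_at($go);      # parent: client 1, meant to go off first
    chomp(my $t2 = <$reader>);
    close $reader;
    waitpid($pid, 0);
    return $t2 - $t1;
}

my @gaps;
push @gaps, one_trial(0.020) for 1 .. 20;
my $in_order = grep { $_ > 0 } @gaps;
my ($widest) = sort { $b <=> $a } @gaps;
printf "%d/%d trials in the intended order; widest gap %.1f ms\n",
    $in_order, scalar @gaps, 1000 * $widest;
```

Both processes compute nothing after the shared `$go` except the spin, so the measured gap mostly reflects scheduling jitter; on a busy single-CPU box expect occasional out-of-order trials.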
An additional possibility is to go the white-box route and artificially widen the window of opportunity by building a test mode into your server. If a particular form of request is received, arrange for a short sleep (1 or 2 seconds should be ample) to occur within the transaction brackets. That would make hitting the window much easier, at the cost of modifying the code under test in a subtle way. Using this method, you could even test the two-cups-and-a-bit-of-string scenario from above by widening the window to some ludicrous amount.

Whether the risk that someone might accidentally 'remove' or otherwise bypass the enabling of DB transactions at some future point is worth the extra complexity of testing for it, only your judgement can decide.
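A minimal sketch of the widened-window idea, assuming a toy "server" whose transaction is a read-modify-write of a counter file. `increment`, `run_two_requests` and `$TEST_DELAY` are hypothetical; `flock` stands in for the real DB transaction. With a 0.5 second test-mode delay inside the brackets, two forked requests overlap almost certainly, so the unprotected version loses an update while the locked version does not. Assumes a Unix-like system with a real `fork`.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use POSIX ();
use Time::HiRes ();

# Hypothetical test-mode delay: widens the gap between read and write so
# two concurrent requests are all but certain to overlap.
my $TEST_DELAY = 0.5;

# One "request": read a counter, pause, write counter + 1.
# With $use_lock the read-modify-write is serialised by flock, standing in
# for a real DB transaction.
sub increment {
    my ($file, $use_lock) = @_;
    open my $fh, '+<', $file or die "open $file: $!";
    if ($use_lock) { flock($fh, LOCK_EX) or die "flock: $!" }
    my $n = <$fh>;
    chomp $n;
    Time::HiRes::sleep($TEST_DELAY);    # the artificially widened window
    seek($fh, 0, 0) or die "seek: $!";
    truncate($fh, 0) or die "truncate: $!";
    print {$fh} $n + 1, "\n";
    close $fh or die "close: $!";       # flushes, then releases any lock
}

# Fire two requests concurrently and return the final counter value.
sub run_two_requests {
    my ($use_lock) = @_;
    my ($tmp, $file) = tempfile();
    print {$tmp} "0\n";
    close $tmp;
    my @pids;
    for (1 .. 2) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) { increment($file, $use_lock); POSIX::_exit(0) }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;
    open my $in, '<', $file or die "open $file: $!";
    chomp(my $final = <$in>);
    close $in;
    unlink $file;
    return $final;
}

my $without_lock = run_two_requests(0);   # expect a lost update
my $with_lock    = run_two_requests(1);   # expect both updates to land
print "without lock: $without_lock\nwith lock: $with_lock\n";
```

In a real server the delay would be switched on only by the special test request, so normal traffic never pays for it.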
Re^2: Testing Race Conditions by ropey (Hermit) on Jul 04, 2007 at 11:59 UTC
Excellent advice, thanks to all who answered. Testing itself is on the same server; in most cases I bypass any RPC communication and make the RPC calls directly. The window of opportunity is < 1 sec, and I like the idea of using threads to test this. To those asking why bother applying a test for this: from my experience, every time I have a bug report I like to add a new test before it's fixed, see it fail, fix it, and see it pass. That leads to a greater question of when to test and when not to test, but thanks all. ++
Re: Testing Race Conditions by jbert (Priest) on Jul 03, 2007 at 12:14 UTC
Another approach is to try a load test and then pick over the system to make sure all actions completed correctly. For example, on an email server you could try sending 10,000 messages with 10 senders in parallel, wait a while, and then ensure that all 10,000 messages were successfully delivered (and not corrupted). This isn't a reliable test against race conditions, but tests like this can help expose them. You can also get a feel for the efficacy of the test by running it against your unfixed code and seeing how reliably it reproduces the problem. This isn't intended as a replacement for the more deterministic testing suggested above, but it can be a useful tool to help shake out bugs in code that allows parallel processing. It would probably not be part of your unit tests, but rather of your system tests (those run prior to release).
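A scaled-down sketch of that load test, assuming the "system" is a shared spool file that ten forked senders append to; the spool file and `sender` are hypothetical stand-ins for a real mail server. Each message carries its sender and sequence number, so afterwards the check can pick over the spool and report anything missing, duplicated or corrupted. 10 × 100 messages keeps it quick; raise `$PER_SENDER` for a real run. Assumes a Unix-like system with a real `fork`.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use POSIX ();

my $SENDERS    = 10;
my $PER_SENDER = 100;    # 10 x 100 = 1,000 here; scale up for a real run

my (undef, $spool) = tempfile();    # hypothetical "mailbox" being tested

# One sender: deliver its messages, each tagged so it can be verified later.
sub sender {
    my ($id) = @_;
    for my $seq (1 .. $PER_SENDER) {
        open my $fh, '>>', $spool or die "open $spool: $!";
        flock($fh, LOCK_EX) or die "flock: $!";
        print {$fh} "$id:$seq:payload-$id-$seq\n";
        close $fh or die "close: $!";
    }
}

my @pids;
for my $id (1 .. $SENDERS) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) { sender($id); POSIX::_exit(0) }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

# Afterwards, pick over the results: every message exactly once, intact.
my (%seen, @corrupt);
open my $in, '<', $spool or die "open $spool: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($id, $seq, $body) = split /:/, $line, 3;
    if (defined $body && $body eq "payload-$id-$seq") { $seen{"$id:$seq"}++ }
    else                                              { push @corrupt, $line }
}
close $in;
unlink $spool;

my @missing;
for my $id (1 .. $SENDERS) {
    for my $seq (1 .. $PER_SENDER) {
        push @missing, "$id:$seq" unless $seen{"$id:$seq"};
    }
}
my @dupes = grep { $seen{$_} > 1 } keys %seen;

printf "%d delivered, %d missing, %d duplicated, %d corrupt\n",
    scalar(keys %seen), scalar @missing, scalar @dupes, scalar @corrupt;
```

Running the same harness against the unfixed code, as suggested above, shows how often it actually catches the problem.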
Re: Testing Race Conditions by aufflick (Deacon) on Jul 04, 2007 at 05:40 UTC
The above advice is all excellent. I would suggest that you don't need to incorporate this testing into your unit testing, for two reasons:
1. you want unit testing to be so painless that developers run it very regularly (i.e. multiple times a day); and
2. unit tests should be deterministic. This sort of test, unless you can guarantee to cheat the database time window, will sometimes pass even if the issue is there.

The sort of load testing that would turn up all these sorts of errors should really take some time (and perhaps some preparation if you wanted to, e.g., run multiple clients). So you might want to add a unit test to confirm that, say, auto-commit is turned off for the relevant db handle (if that's how you implemented it), but proving that there's no time-based race condition is a job for stress testing, which you should perform regularly, but not as often as unit testing.
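A minimal sketch of such an auto-commit unit test. `MyApp::DB::connect_db` and its `connector` option are hypothetical; the injected connector lets the test inspect the attributes that would be passed to `DBI->connect` without touching a database, so it stays fast and deterministic. Against a live handle, the equivalent check would read DBI's `AutoCommit` attribute, e.g. `ok( !$dbh->{AutoCommit} )`. The SQLite DSN is only an example.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical application module. The connector is injectable so a unit
# test can inspect the connect arguments without needing a live database.
package MyApp::DB {
    sub connect_db {
        my (%opt) = @_;
        my $connector = $opt{connector}
            || sub { require DBI; return DBI->connect(@_) };
        return $connector->(
            'dbi:SQLite:dbname=app.db', '', '',
            { RaiseError => 1, AutoCommit => 0 },    # explicit transactions
        );
    }
}

# Fake connector: ignore DSN/user/password and hand back the attribute hash.
my $attrs = MyApp::DB::connect_db(connector => sub { return $_[3] });

ok(  exists $attrs->{AutoCommit}, 'AutoCommit is set explicitly' );
ok( !$attrs->{AutoCommit},        'AutoCommit is off for this handle' );
```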


	PerlMonks