Non-deterministic Testing

by moot (Chaplain)
on Jan 28, 2005 at 16:07 UTC ( [id://426018] )

moot has asked for the wisdom of the Perl Monks concerning the following question:

I have an algorithm that produces non-deterministic results due to in-built fuzziness, i.e. it is intended to return unpredictable results about 20% of the time.

My query is how best to build a test case for this. Obviously I could just throw 100000 tests against the code and verify that e.g. 78-82% of the results were expected, but I'm wondering if there's a more efficient way to do it?

I fully expect that one day the results will fall 0.1% outside the threshold and thus the test will fail, so I'd also like to guard against that if possible - that is, the testing method used should perhaps take an average over three or five runs.
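For concreteness, the brute-force version described above might look roughly like the sketch below. Here was_unpredictable() is only a hypothetical stand-in for the real algorithm, and the run count, trial count, and tolerance are illustrative:

    use strict;
    use warnings;
    use Test::More tests => 1;

    # Hypothetical stand-in for the code under test: returns true when the
    # algorithm produced an "unpredictable" result for this call.
    sub was_unpredictable { rand() < 0.20 }

    my $runs      = 5;         # average over several runs to smooth out flukes
    my $trials    = 100_000;   # trials per run
    my $target    = 0.20;      # intended fuzz rate
    my $tolerance = 0.02;      # i.e. accept 18%..22%

    my $total_rate = 0;
    for my $run ( 1 .. $runs ) {
        my $hits = 0;
        for ( 1 .. $trials ) {
            $hits++ if was_unpredictable();
        }
        $total_rate += $hits / $trials;
    }
    my $avg_rate = $total_rate / $runs;

    ok( abs( $avg_rate - $target ) <= $tolerance,
        sprintf( "average fuzz rate %.4f is within %.2f of %.2f",
                 $avg_rate, $tolerance, $target ) );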

Replies are listed 'Best First'.
Re: Non-deterministic Testing
by davido (Cardinal) on Jan 28, 2005 at 16:34 UTC

    Jot your algorithm down in the simplest layman's terms possible, and have a statistician look at it. It sounds like you're saying that you want a degree of randomness, but need tight control over the standard deviation of the frequency of non-expected results. In other words, if you plot the frequency of non-expected results as a bell curve, you want a really steep curve with very short tails. This sort of thing is what statisticians understand best.

    Without seeing the algorithm it'll be hard for us to suggest how to achieve a steep curve with short tails. And I'm not a statistician myself, so my helpfulness may be limited in that regard, though someone else may have the required understanding.

    This also sounds a lot like the production / defect problem. Companies that use statistical analysis in their manufacturing will set a tolerance for defects, and a confidence interval which may be one or two standard deviations from their tolerance level. Once their test samples start falling outside of the predetermined standard deviation from tolerance, they have a high degree of confidence that not just the samples, but also the entire production run, is edging toward unacceptable levels of defects. This is analyzed so that they will know when it's time to recalibrate machinery, rethink the quality of the raw materials being used, etc. Because this is a pretty advanced discipline, you may find a lot of good info by searching for production statistical analysis.


    Dave

Re: Non-deterministic Testing
by osunderdog (Deacon) on Jan 28, 2005 at 16:21 UTC

    It seems like if you know that it will produce unpredictable results around 20% of the time, you could determine the number of tests necessary to verify that boundary.

    I guess I would suggest trying to encapsulate the fuzzy bit into a function that can be swapped out with a less fuzzy (more predictable) function during testing. Then, test the core fuzzy bit separately using sample sizes that are statistically significant.

    Geeze, I sound like one of my old professors.
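    A minimal sketch of that swap-out approach, assuming (hypothetically) that the fuzzy decision lives behind a package-level coderef that the tests can localize:

        # Production code: the fuzzy bit is isolated behind a swappable coderef.
        # MyAlgorithm and its subs are hypothetical names.
        package MyAlgorithm;
        use strict;
        use warnings;

        our $FUZZ = sub { rand() < 0.20 };   # the fuzzy bit

        sub compute {
            my ($input) = @_;
            return $FUZZ->() ? _surprising_result($input) : _expected_result($input);
        }

        sub _expected_result   { "expected" }
        sub _surprising_result { "surprising" }

        # In the test file: pin the fuzzy bit down so the rest is deterministic.
        package main;
        use Test::More tests => 2;

        {
            local $MyAlgorithm::FUZZ = sub { 0 };   # never fuzzy
            is( MyAlgorithm::compute("x"), "expected", "deterministic path" );
        }
        {
            local $MyAlgorithm::FUZZ = sub { 1 };   # always fuzzy
            is( MyAlgorithm::compute("x"), "surprising", "fuzzy path" );
        }

    The real fuzzy coderef can then be exercised on its own with a statistically significant number of calls, as other replies describe.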


    "Look, Shiny Things!" is not a better business strategy than compatibility and reuse.

Re: Non-deterministic Testing
by itub (Priest) on Jan 28, 2005 at 16:30 UTC
    I would first try to test the deterministic components as much as possible in isolation. Then, for the stochastic part, if you really run enough tests and/or your tolerance is large enough, you can be reasonably sure that the tests will "virtually never" fail. For example, if you expect 80,000 out of 100,000, I would estimate a standard deviation of about 120 (assuming independent random events with p=0.8). That means that you will very rarely see deviations greater than 400, and virtually never greater than 1000 (which is half of the sample tolerance you gave).
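    For reference, the arithmetic behind that estimate, as a tiny script (it comes out near 126, in the same ballpark):

        use strict;
        use warnings;

        my $n = 100_000;                            # number of trials
        my $p = 0.8;                                # expected success rate
        my $sigma = sqrt( $n * $p * ( 1 - $p ) );   # binomial standard deviation
        printf "expected successes: %d, sigma: %.1f\n", $n * $p, $sigma;
        # prints: expected successes: 80000, sigma: 126.5
        # 400 is a bit over 3 sigma; 1000 is nearly 8 sigma, hence "virtually never"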
Re: Non-deterministic Testing (srand)
by tye (Sage) on Jan 28, 2005 at 16:35 UTC

    Force the starting seed (srand) during the unit tests.

    - tye        

      Caveat: this will make the tests reproducible only as long as you keep using the same platform (or perhaps even the same Perl version and compilation options). For example, the following program
      perl -le "srand(0); print rand"
      gives me the following results:
      0.00115966796875   v5.8.4 built for MSWin32-x86-multi-thread
      0.17082803610629   v5.8.0 built for i686-linux
      
        And of course we can fix this with the following code:

            BEGIN {
                my @list = 1 .. 100;   # or whatever
                *CORE::GLOBAL::rand = sub { return shift @list };   # or something
            }

        and now your rand will return whatever you want it to.
      But then you aren't testing the randomness of your function anymore. The OP wants to test whether about 20% of the tests return unpredictable results. By setting the seed of srand, or supplying the return values of rand as others suggest, you're no longer testing what you should be testing.
Re: Non-deterministic Testing
by xdg (Monsignor) on Jan 28, 2005 at 23:03 UTC

    In short -- your inputs may be non-deterministic (or effectively so) but your program is, by definition, deterministic. Isolate the input stream and replace it with a known stream and test that.

    If you're using the built-in rand to introduce random behavior, I humbly suggest my own Test::MockRandom. With it you can intercept the built-in rand function to return values that you preset in advance, and can test for correct behavior across the range of potential results.

    If the source of randomness is truly out of your control (e.g. network packet timings), then my suggestion is to break up your code so that the external dependency is isolated and wrapped in a class. Then use Test::MockObject to simulate the dependency with known results and test across the full range of likely inputs.
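    For instance, a rough sketch with Test::MockObject, assuming (hypothetically) that the algorithm takes a timing-source object and calls a next_delay method on it:

        use strict;
        use warnings;
        use Test::More tests => 1;
        use Test::MockObject;

        # Hypothetical code under test: classifies delays reported by a
        # timing source.  In production the source would wrap the network.
        sub classify_delays {
            my ( $source, $count ) = @_;
            return map { $source->next_delay > 0.8 ? "slow" : "fast" } 1 .. $count;
        }

        # In the test, the source is a mock that returns a known series.
        my $timer = Test::MockObject->new;
        $timer->set_series( next_delay => 0.1, 0.5, 0.9 );

        is_deeply( [ classify_delays( $timer, 3 ) ],
                   [qw( fast fast slow )],
                   "known inputs give known outputs" );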

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Non-deterministic Testing
by tmoertel (Chaplain) on Jan 29, 2005 at 05:43 UTC
    Even if you cannot determine whether each result returned by your algorithm is correct, you probably can determine whether the distribution of results is correct. (This is what I had to do to test the random-data generators for Test::LectroTest.)

    First, analyze your algorithm to determine the expected distribution of the results. Then run your algorithm in a large number of random trials to generate a set of observed results. Finally, use statistical tests to determine whether the results conform to the expected distribution (for the level of confidence you desire).

    There is a trade-off between the degree of confidence you require and the "power" of statistical tests to detect small deviations from the norm. The more confidence you want, the less power the tests will have. The way to increase the power is to run more trials. Unless you have a really bizarre distribution, however, a few seconds of compute time will probably provide more than enough data to provide ample confidence and power. (But you should still do the math to verify this for your situation.)

    That is the idea. Now, to implement it, you will need to (1) determine the distribution of your expected results, (2) write a test driver to run the trials and collect the output results, and (3) write code to test the collected output against the expected distribution.

    Steps (1) and (3) require some familiarity with statistics. (If you need a refresher, take a look at the resources on the Statistics on the Web page maintained by Clay Helberg.) If you are not writing unit tests to be automated, you can perform step (3) by hand in a statistics program like R. Otherwise, you will need to code up some statistical tests or borrow them (e.g., Statistics::ChiSquare) and call them from your unit tests.
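    To make steps (2) and (3) concrete, here is a small hand-rolled sketch of a chi-square goodness-of-fit check against an 80/20 split (run_algorithm() is a hypothetical stand-in for the code under test):

        use strict;
        use warnings;
        use Test::More tests => 1;

        # Step (2): run the trials and tally the outcomes.
        sub run_algorithm { rand() < 0.80 ? "expected" : "fuzzy" }   # stand-in

        my $trials   = 100_000;
        my %observed = ( expected => 0, fuzzy => 0 );
        $observed{ run_algorithm() }++ for 1 .. $trials;

        # Step (3): chi-square statistic against the expected 80/20 split.
        my %expected = ( expected => 0.80 * $trials, fuzzy => 0.20 * $trials );
        my $chi2     = 0;
        $chi2 += ( $observed{$_} - $expected{$_} )**2 / $expected{$_} for keys %expected;

        # Two categories => 1 degree of freedom; the 95% critical value is 3.841.
        cmp_ok( $chi2, '<', 3.841, "observed split consistent with 80/20 (chi2=$chi2)" );

    At the 95% level a correct implementation will still trip this check about one run in twenty; if that is too flaky for an automated suite, use a stricter critical value (6.635 for 99%, 10.828 for 99.9%, still with one degree of freedom) at the cost of some power to detect real deviations.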

    I hope that this helps.

    Cheers,
    Tom

      I'm glad someone mentioned R. It is a very powerful language. Dare I say R is the Yin to Perl's Yang.

      It's also equally hard to master.


      "Look, Shiny Things!" is not a better business strategy than compatibility and reuse.

Re: Non-deterministic Testing
by lachoy (Parson) on Jan 28, 2005 at 21:44 UTC

    Rather than coding them up yourself, you might try generating lots of tests with something like Test::LectroTest and then checking whether the results fall in your expected range.

    Chris
    M-x auto-bs-mode

Re: Non-deterministic Testing
by fergal (Chaplain) on Jan 28, 2005 at 19:25 UTC
    Can you pick some good tests ("good" meaning they involve edge cases) and then just run each 100 times and see if the percentages come out OK? This seems more reliable than randomly generating 100 tests and running each of them.
