Re: Testing methodology

Replies are listed 'Best First'.
Re^2: Testing methodology by BrowserUk (Patriarch) on Mar 05, 2012 at 02:23 UTC
Replying to the update only at this time: It would be handy to have ... length() -- The module already has method `n()`. max_length() -- You supplied this information to me at creation time. It never changes. If you need it, remember it. is_full() -- Show me a use-case? One that doesn't involve you polling this method to decide when to push a new value. That polling will require the queue to be locked while the value is calculated, and unlocked prior to returning the value to you. That polling will slow down every other producer and consumer. And the value returned will be out of date by the time you get it. Therefore there could be no guarantee that if it returns not-full, and you immediate `nq()`, that it won't block. The information is therefore useless to you. Conversely, if you just go ahead and `nq()`, and it needs to block, it will, and will consume no cpu until the OS wakes it when room is available. (Via cond_signal). I doubt you will ever find a realistic use case that will persuade me to add this. peek() -- Again, you could try to convince me with a use-case, but you are unlikely to succeed. By the time `pq()` returned the next value to you, some other thread may have `dq()`'s it. Then what? peek_all -- There is no possible use-case for this. There already is the private method: `_state()` which effectively does this. It returns the entire internal structure as a string. Its intended use is a debugging aid. Indeed, I added it to allow me to track down a timing issue. But even then, for it to be useful, I had to serialise all state transitions, to make it a usable diagnostic. And doing that, by necessity, slowed the throughput to a crawl. Not only would the above make the module more testable, they'd also make it more useful. Sorry, but I disagree completely with both halves of that statement. A queue has one purpose in life. Take things in at one end and let them out at the other as efficiently as possible. Compromising the function to make testing easier is not going to happen. Adding could-dos without use-cases, for their own sake, is not going to happen. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re^3: Testing methodology by tobyink (Canon) on Mar 05, 2012 at 08:17 UTC
Re `max_length`. Let's say I'm writing a module Amusement::Park which has a list of Amusement::Rides. Each Amusement::Ride has an associated Q. (This is a popular amusement park.) I'm writing the Amusement::Park but have no control over the Amusement::Rides, so don't initialise the Q objects. Good amusement park management says I need to point customers at rides with the least busy queues. To do this I need to calculate length ÷ max_length. A use case for `is_full` is that say I generate events of multiple levels of importance: say errors, warnings and debugging messages. If the queue is full, I might choose to silently drop debugging messages rather than being blocked. Or there might be several queues I could potentially add an item to, and I wish to choose one that is not full.. `peek`/`peek_all` - there may be several producers but only one consumer, so no danger of peeking a value but somebody else dequeueing it.	[reply] [d/l] [select]
Re^4: Testing methodology by BrowserUk (Patriarch) on Mar 05, 2012 at 11:17 UTC
Okay. I'll (briefly) join you in analogy land. I'm writing the Amusement::Park but have no control over the Amusement::Rides, so don't initialise the Q objects. Then query the information from Amusement::Rides. If Amusement::Rides creates the queues, then good OO dictates that you shouldn't be going direct to the Qs anyway. If he chooses to expose that information to you, that's his call. Amusement::Park ... max_length If you need to try and balance your queues, then the calculation `n/maxN` is completely useless to you. Different rides take different times to run, and accommodate different numbers of riders. Therefore percentage fullness is an completely useless measure of flow. And if you try to use it to control flow, you are going to end up with the well-known supermarket checkout/motorway traffic holdup scenario of whichever queue you switch to, the others will always suddenly flow faster. Getting back to the real world. Even if your guests wouldn't rebel at being made to queue for the lame-ass World Showcase when they came to go to Spaceship Earth, using multiple queues is completely the wrong way -- and a rookie mistake -- to manage distributing resources to multiple consumers. The correct, and only workable way is a single queue. The analogy here -- which may or may not make sense wherever you are in the world -- is the bank or post-office (even some enlightened supermarkets now), queuing system. A single queue and "Go to window(checkout) 8"... And finally, I bring you back to the fact that the moment you get your hands on this information, it is out of date. Between the return statement in the method, and you performing your division, you can loose your time-slice on the processor. And by the time you get your next one, the queue(s) you are instrumenting may have emptied (and filled and emptied again if you are not the only producer). Its is a simple fact of life with trying to instrument concurrency, that as soon as you have made a measurement, it is out of date. And the worst thing you can do is to try and synchronise threads in order to try and regulate them. It is the very act of trying to turn stochastic systems into deterministic systems that creates all the "evils" of threading -- dead-locking; live-locking; priority inversion et al. And the 'cure' for almost all of them is queues. But to work properly, they have to be free-running, bounded and self-regulating. It is that very self-regulation that permits the guarantees -- that producers and consumers will make steady forward progress and, eventually, finish -- that allow the programmer to ignore synchronisation and locking and fairness, because they -- in conjunction with the OS scheduler -- will take care of it for him. And the real beauty of the free-running queue is that regardless of whether you have single consumer and multiple-producers; or vice versa; or one of each; and regardless of whether the jobs are of equal size (cpu or elapsed time); or widely disparate sizes; they will be processed in a fair and timely manor. About the only mistake -- beyond using multiple queues for the same work items -- is to try and regulate them. Other than tailoring their bounds -- the size of the queue. This is usually done very pragmatically by trying a few different sizes on small test sets to find what works best. It can be done more scientifically, but the instrumentation required is not fullness. It is: 'time spent blocking when full' versus 'time spent blocking when empty'. (Producer bound or consumer bound.) It is also possible to use these statistics to adjust the size automatically, but it requires the statistics be measured and evaluated internally (ie. under locks). The downside is that gathering the statistics imposes a considerable penalty A use case for is_full is ... errors, warnings and debugging .... If full, drop debugging .... My personal reaction is, if you can safely drop them, why are you producing them. That's not a question. The point is, don't do what you do not have to do. And once you've limited to doing only what you have to do, the option to drop goes away. So then, you are either able to run sufficient consumers to ensure that the producer doesn't block too often; or you need a bigger box; or you need to redesign your system. The CompSci solution to your scenario is a priority queue. These are a completely different beast to free-running queues and must be implemented in an entirely different way. Perhaps the simplest implementation is to use multiple, free-running queues, one for each level of priority. Producers queue to the appropriate queue. Consumers try the highest priority queue first and drop down until they get work. But in reality they do not work well at all. The trouble is that as soon as a consumer has discovered that the high priority queue is empty, it can become non-empty. But the consumer has dropped down to the lowest priority level and is working on an item that is going to take ages, and so the high priority queue starts to go unserviced and fills up. One approach to correcting this is to have the consumers poll the high-priority queue for a short while before dropping down, but it fixes nothing because Sod's Law tells you that the nanosecond it stops polling, is when the high priority job arrives. The archetypical example of how priority queues fail is IBM's long defunct SNA. It had priority levels for transmission packets. The trouble is, that if you allow the priority to be user elective, everybody marks their stuff with the highest priority. There is no way to establish actual relative priority -- even if you could have a constant round-table discussion between all parties, they'd never reach an agreement. So, you resort to heuristics. Small means high cos it take no time; big means low because it lots of time; at busy times big just never gets through, so then you get people (me in real life) writing programs that break the huge files I need to send up into zillions of iddy-biddy chunks so they go through fast and can be reassembled at the other end. The only half-good solution is to use dedicated queues for each priority and dedicated consumers for each of those queues, and manage them through OS thread priority settings. It won't stop low-priority traffic from building up, but with any modern OS with dynamic (thread) priorities, low-priority queues that are starved of cpu slowly raise their priority until they get a time-slice before dropping back thus ensuring some forward progress for all threads (queues). Any other mechanism fails to ensure that forward progress will be made by all priority levels. Or there might be several queues I could potentially add an item to, and I wish to choose one that is not full.. Another rookie mistake I'm afraid. It is always a mistake to have multiple paths (queues) by which a work item may make its way to its consumer(s). If those multiple queues lead to the same (pool of) consumer(s), then the general effect of having multiples is of ensuring that one or more queues will fill and be ignored. How will the consumers decide which queue to draw from? Even if they pick randomly, they'll eventually pick the empty one and block when the others are all full. If they do it by fullness, all the consumers will pick the fullest queue at the same time, and the last ones in will find it emptied by those that got in first and block, despite that other queues are non empty. And the harder you try to avoid these deadlocks, the worse they will get. (Sod's Law again.) And if, by some mechanism, you succeed in getting both your producers and consumers to treat the multiple queues between them utterly impartially and fairly and evenly, then all you have done is create the exact effect of having a single longer queue, except FIFO is no longer guaranteed. You've broken the basic tenant of a queue. peek/peek_all - there may be several producers but only one consumer, so no danger of peeking a value but somebody else dequeueing it. Okay. What is that one consumer going to do if it peeks the queue and doesn't like what it sees there? Ignore it until it goes away? Oh. That's right. It is the only consumer, so it won't go away until it consumes it. So, now you need the peek_all() so that it can cherry pick the items it wants. Except that how can it get its preferred item(s) out of the queue, without removing the top item? So now you need a random access dq. Hm. That's sounding a lot like an array! If a consumer can tell from inspecting a dq'd item that it doesn't want to process it immediately, then it has the option of storing it somewhere in its unshared memory -- where locking isn't required for further access; where its not occupying a shared memory slot; where its decisions will have no impact upon other threads; and where it doesn't mean screwing with the simplicity and efficiency of the queue structure. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l]
Re^2: Testing methodology by BrowserUk (Patriarch) on Mar 06, 2012 at 03:23 UTC
Firstly, as I already said. Thank you for stepping up. I note that the 'big guns' have ducked & covered, presumably keeping their power dry. I've spent the best part of yesterday responding to your test suite section by section, and I just discarded most of it because it would be seen as picking on you, rather than targeting the tools I am so critical off. # Test that Q.pm actually compiles. `BEGIN { use_ok 'Q' };` [download] What is the purpose of this test? What happens if the test is included and the Q module is not loadable? The Test::* modules trap the fatal error from Perl, and so the test suite continues to run, failing every test. Not useful. What happens if we do a simple `use Module;` instead. We get two lines of output instead of 6. The lines aren't preceded by comment cards so my editor does not ignore them. The test stops immediately rather than running on testing stuff that cannot possibly pass; or is in error if it does. Useful. What is actually being tested here? That perl can load a module? If it couldn't the Test::* tools wouldn't load. Not useful. That the tarball unpacked correctly? Perl would tell me that just as reliably. No more useful than Perl. That the module installed correctly? No. Because when the test suite is run, the module isn't installed. It is still in the blib structure. Not useful. And why does it force me to put it in a `BEGIN{}`? Because without it, I'd have to use parens on all my method calls otherwise they may be taken as filehandles or barewords. Worse than non-useful. Detrimental. Extra work because of changed behaviour. # new_ok `my $q = new_ok Q => [5];` [download] This apparently tests whether the return object is of the same class as the name supplied for the class. Why? That prevents me from doing: `package Thing; use if $^O eq 'MSWin32' ? 'Thing::Win32' : 'Thing::nix'; sub new { $^O eq 'MSWin32' ) ? &Thing::Win32->new() : &Thing::nix->new +(); }` [download] Detrimental. Extra work; limits options. # Test that the API as documented still exists. `can_ok $q => 'nq'; can_ok $q => 'dq'; can_ok $q => 'n';` [download] What do get if we use this and it fails? `not ok 1 - async::Q->can('pq') # Failed test 'async::Q->can('pq')' # at -e line 1. # async::Q->can('pq') failed` [download] Four lines, three of which just repeat the same thing in different words. And the tests continue despite that any that use that method will fail. No benefit. verbose. Repetitive. And if we let Perl detect it? `Can't locate object method "pq" via package "async::Q" at -e line 1.` [download] One line, no comment card. No repetition. Pointless extra work for no benefit. The rest elided. Again, thank you for being a willing subject. Now's your chance for revenge :) Take it! Here is my module complete with its test suite: #! perl -slw use strict; package async::Q; use async::Util; use threads; use threads::shared; use constant { NEXT_WRITE => -2, N => -1, }; sub new { # twarn "new: @_\n"; my( $class, $Qsize ) = @_; $Qsize //= 3; my @Q :shared; $#Q = $Qsize; @Q[ NEXT_WRITE, N ] = ( 0, 0 ); return bless \@Q, $class; } sub nq { # twarn "nq: @_\n"; my $self = shift; lock @$self; for( @_ ) { cond_wait @$self until $self->[ N ] < ( @$self-2 ); $self->[ $self->[ NEXT_WRITE ]++ ] = $_; ++$self->[ N ]; $self->[ NEXT_WRITE ] %= ( @$self - 2 ); cond_signal @$self; } } sub dq { # twarn "dq: @_\n"; my $self = shift; lock @$self; cond_wait @$self until $self->[ N ] > 0; my $p = $self->[ NEXT_WRITE ] - $self->[ N ]--; $p += @$self -2 if $p < 0; my $out = $self->[ $p ]; cond_signal @$self; return $out; } sub n { # twarn "n: @_\n"; my $self = shift; lock @$self; return $self->[ N ]; } sub _state { # twarn "_state: @_\n"; no warnings; my $self = shift; lock @$self; return join '\|', @{ $self }; } return 1 if caller; package main; use strict; use warnings; use threads ( stack_size => 4096 ); use threads::shared; use async::Util; use Time::HiRes qw[ time sleep ]; our $SIZE //= 10; our $N //= 1e5; our $T //= 4; ++$T; $T &= ~1; my $Q1_n = new async::Q( $SIZE ); my $Qn_n = new async::Q( $SIZE ); my $Qn_1 = new async::Q( $SIZE ); my @t1 = map async( sub{ $Qn_n->nq( $_ ) while defined( $_ = $Q1_n->dq + ); } ), 1 .. $T/2; my @t2 = map async( sub{ $Qn_1->nq( $_ ) while defined( $_ = $Qn_n->dq + ); } ), 1 .. $T/2; my $bits :shared = chr(0); $bits x= $N/ 8 + 1; my $t = async{ while( defined( $_ = $Qn_1->dq ) ) { die "value duplicated" if vec( $bits, $_, 1 ); vec( $bits, $_, 1 ) = 1; } }; my $start = time; $Q1_n->nq( $_ ) for 1 .. $N; $Q1_n->nq( (undef) x ($T/2) ); $_->join for @t1; $Qn_n->nq( (undef) x ($T/2) ); $_->join for @t2; $Qn_1->nq( undef ); $_->join for $t; my $stop = time; my $b = unpack '%32b*', $bits; die "NOK: $b : \n" . $Q1_n->_state, $/, $Qn_n->_state, $/, $Qn_1->_sta +te unless $b == $N; printf "$N items by $T threads via three Qs size $SIZE in %.6f seconds +\n", $stop - $start; __END__ C:\test>perl async\Q.pm -N=1e4 -T=2 -SIZE=40 1e4 items by 2 threads via three Qs size 40 in 5.768000 seconds C:\test>perl async\Q.pm -N=1e4 -T=20 -SIZE=40 1e4 items by 20 threads via three Qs size 40 in 7.550000 seconds C:\test>perl async\Q.pm -N=1e4 -T=200 -SIZE=400 1e4 items by 200 threads via three Qs size 400 in 8.310000 seconds [download] You'll notice that in addition to performing a default test, it can be configured through command line parameters to vary the key parameters of the test. The actual test consists of setting up 3 queues. One thread feeding data via the first queue to a pool of threads (1 to many). That pool dequeues the input and passes on to a second pool of threads via the second queue (many to many). And finally those threads pass the data back to the main thread via the third queue (many to 1). The data for a run consists of a simple list of integers. Once they make it back to the main thread, they are checked off against a bitmap tally to ensure that nothing is dequeued twice, nor omitted. All in one file; no extraneous modules; no extraneous output; completely compatible with any other test tools available, because it is nothing more than a simple perl script. Feel free to rip it to shreds. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re^3: Testing methodology by tobyink (Canon) on Mar 06, 2012 at 10:17 UTC
What happens if the test is included and the Q module is not loadable? Depends why loading fails. If Q.pm parses OK and executes OK but returns false, then the use_ok test will fail but the other twenty tests will still pass. None of your tests cover the situation where Q.pm returns false because you never attempt to "use Q" or "require Q". Nothing compels you to put a BEGIN { ... } block around it, but as a matter of style (in both test suites and regular code) I tend to make sure all modules load at compile time unless I'm purposefully deferring the load for a specific reason. This apparently tests whether the return object is of the same class as the name supplied for the class. Why? No it doesn't. It checks that the returned object is of the same class as the name supplied, or a descendent class in an inheritance heirarchy. This still allows you to return Q::Win32 or Q::Nix objects depending on the current OS, provided that they both inherit from Q. To have a class method called "new" in Q, which returns something other than an object that "isa" Q would be bizarre and likely to confuse users. Bizarreness is worth testing against. Tests don't just have to catch bugs - they can catch bad ideas. But it can catch bug anyway. Imagine Q/Win32.pm contains: `my @ISA = ('Q');` Ooops! That should be `our @ISA`. This test catches that bug. can_ok Notice none of my further tests touch the "n" method? Well, at least its existence is tested for. If for some reason during a refactor it got renamed, this test would fail, and remind me to update the documentation. I don't think any of your tests check the "n" method either. If you accidentally removed it during a refactor, end users might get a nasty surprise. A can_ok test is essentially a pledge that you're not going to remove a method, or not without some deliberate decision process. Use of a formalised testing framework can act as a contract - not necessarily in the legal sense - between the developer and the end users. It's a statement of intention: this is how my software is expected to work; if you're relying on behaviour that's not tested here, then you're on dangerous ground; if I deviate from this behaviour in future versions, it will only have been after careful consideration, and hopefully documentation of the change. ☆ ☆ ☆ Overall most of your complaints around Test::More seem to revolve around three core concerns: Verbosity of output; That is continues after a failure has been detected rather than bailing out; and It apparently "forcing you to jump through hoops". Verbosity of output has never been as issue for me. The "prove" command (bundled with Perl since 5.8.x) gives you control over the granularity of result reporting: one line per test, one line per file, or just a summary for the whole test suite. Yes, you get more lines when a test fails, but as a general rule most of your tests should not be failing, and when they do, you typically want to be made aware of it as loudly as possible. The fact that test running continues after a failure I regard as a useful feature. Some test files are computationally expensive to run. If lots of calculations occur, then a minor test of limited importance fails, I still want to be able to see the results of the tests following it, so if there are any more failures I can fix them all before re-running the expensive test file. If a particular test is so vital that you think the test file should bailout when it fails, it's not especially difficult to add `or BAIL_OUT($reason)` to the end of the test. `my $q = new_ok Q => [5] or BAIL_OUT("too awful");` [download] Test::Most offers the facility to make all tests bail out on failure, but I've never really used Test::Most. One man's "forced to jump through hoops" is another man's "saved from writing repetitive code". `new_ok` saves me from writing: `my $q = Q->new(5); unless (blessed $q and $q->isa('Q')) { warn "new did not return an object which isa Q"; # and note that the line number reported by "warn" here # is actually two lines after the real error occurred. }` [download] Ultimately if I did ever feel like a particular set of tests wasn't a natural fit for Test::More, there would be nothing to stop me sticking a few non-TAP scripts into my distro's "t" directory, provided I didn't name them with a ".t" at the end. They can live in the same directory structure as my other tests; they just won't get run by "prove" or "make test", and won't be reported on by CPAN testers. It doesn't have be an either/or situation.	[reply] [d/l] [select]
Re^4: Testing methodology by BrowserUk (Patriarch) on Mar 06, 2012 at 12:03 UTC
Okay, you're a cool-aid drinker. S'cool. I'm not. I don't like the taste. The only thing I'll respond to is: I don't think any of your tests check the "n" method either. True, for a reason: I cannot think of a good use for it. As such it may well go away if I don't find a use for it between now and releasing it. If I ever do. The only use I make of Thread::Queue`::pending()`, relates to preventing the Q from attaining an unbounded size. My queue addresses that internally, so that use goes away. Another possible use would be to prevent code from calling `dq()` when it would block. But as discussed elsewhere, that use is a bust because the information can be out-of-date by the time I get it. If there was a use-case for a dq_nb() or similar, then it would have to be implemented internally -- with the test under locks. If I find myself wanting that facility then I'll add it (under some name) and probably drop `n()`. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re^3: Testing methodology (TAP++) by tye (Sage) on Mar 06, 2012 at 05:26 UTC
Yeah, I eventually realized that there was almost nothing in Test::More that I was actually using or even found to be a wise thing to use. isa_ok()? I just can't imagine that finding a real mistake. I can certainly see it complaining about an implementation detail that I might change. I also don't want to use is_deeply(). I think a test suite should complain about what it cares about. If there is extra stuff that it doesn't care about, then it shouldn't complain. And I find Test::Simple to just be a stupid idea. But I do use and value Test (though I use my own wrapper around it for a few minor reasons and to provide decent Lives() and Dies() implementations -- better than the several modules that purport to provide such that I've seen). I certainly make frequent use of skip()ing. The fact that Test doesn't use Test::Builder is a minor side benefit that becomes a major benefit every so often when I feel the need to look at the source code. Test::More's skip() is so bizarrely defined that I can't actually use it correctly without reading the code that implements it and the act of trying to find said code is so terribly aggravating since Test::Builder is involved, so I'm happy to have realized that I never have to go through that again. There are tons of tools built on top of TAP (and other testing schemes such as used by some of our Ruby-based tests). It is actually useful in the larger context for each individual test to get numbered so we can often correlate different failure scenarios and to make concise reports easy. And we have more than one test file per code file in many cases. This is especially useful when there are interesting set-up steps required for some tests. Testing leaf modules is the easiest case and usually doesn't really stress one's testing chops. Many of my test files abstract a few patterns of test and then run lots of simple tests that are specified with a small amount of data. So, for example, I might have a few dozen lines where each line specifies an expected return value, a method name, and an argument list (and maybe a test description). Also, having the test code in the same file as the code being tested would complicate coverage measurement, easily distinguishing commits that are fixing code from commits that are fixing tests, searching for real uses of a specific feature while ignoring tests that make use of it, ... But, no, I'm not interested in "stepping up" to your challenge. Many of my reasons would just come across as a personal attack so I'll not go into them. But most of what I'm talking about I can't demonstrate well by pasting a bit of code. I have no interest in trying such a feat. - tye	[reply]
Re^4: Testing methodology (UPDATED!) by BrowserUk (Patriarch) on Mar 06, 2012 at 13:24 UTC
There are tons of tools built on top of TAP. I just don't get what people get from TAP. As a (module/application) user, I don't give a monkeys what passed or failed. Either it passed or it didn't. Nor do I (as a Win32 user) give a flying fig for whether you skipped a thousand tests because I'm not on nix. As a (module/application) programmer, if 90% passed is acceptable, then 10% of the tests are useless. If I wrapped ok() around my 'has this value been dequeued before' test, I'd be producing 100,000 (or a 1,000,000 or 100,000,000) OKs. Even if the user has configured a tool to suppress or summarise that useless information, it still means 100,000 (...) calls to a function to produce useless output; and 100,000 ( ... ) IOs to the screen or pipe, and 100,000 ( ... ) checks in the harness to throw away what I don't want to start with. My testing therefore takes 10 times as long for no benefit. Why do you care about the performance of tests, I can hear some somebodies asking -- especially as I dissed their time/cpu usage statistics. But the problem is, IO goes through the kernel and is (often) serialised. And that completely screws with the statical legitimacy of my testing strategy. I have at least half a dozen different implementations of a bounded Q. Some pure perl like this one. Some (in XS) that bypass Perl's Win32 emulation of nix cond_* calls and use (Win32) kernel locking and synching constructs direct. Some (in C/assembler) that bypass even those and implement locking using cpu primitives. Many of them are, or have been at some points, incorrectly coded and will deadlock or live lock. But in almost every case when that happens, if I introduce a few `printf()`'s into the key routines, they perform perfectly. Until I remove them again or (for example) redirect that trace output to NULL. And then they lock again. The reason is that the multi-threaded C-runtime performs it own internal locking to prevent it from corrupting its own internal structures. And those locks can and do prevent the timing conditions that cause the hangs. So, for me at least, not only do I not see any benefit in what TAP does, the output it requires can completely corrupt my testing. It is actually useful in the larger context for each individual test to get numbered so we can often correlate different failure scenarios and to make concise reports easy. As the developer receiving an error report, the first thing I'm going to want to do is convert the 'test number' to the file/linenumber. Why bother producing test numbers in the first place? Just give the user file&line and have him give that back to me. The only plausible benefit would be if the test number were somehow unique. That is, if the number of the test didn't change when new tests were added or old ones were removed. Then I might be able to respond to reports from old versions. But that isn't the case. And we have more than one test file per code file in many cases. This is especially useful when there are interesting set-up steps required for some tests. Hm. Unit tests, test the unit. System, integration and regression tests are different and live in a different place. I'm having a hard time envisaging the requirement for "interesting set-ups" for unit testing. Many of my test files abstract a few patterns of test and then run lots of simple tests that are specified with a small amount of data. Isn't that exactly what my 'has this value been dequeued before' test is doing? (I re-read the para many times and I'm still unsure what you mean?) `my $bits :shared = chr(0); $bits x= $N/ 8 + 1; my $t = async{ while( defined( $_ = $Qn_1->dq ) ) { die "value duplicated" if vec( $bits, $_, 1 ); vec( $bits, $_, 1 ) = 1; } };` [download] I see no benefit at all in counting those as individual tests. Much less in allowing the test suite to continue so that the one failure gets lost in the flood of 99,999: `D'ok 1 - got 1 from queue D'ok 2 - got 2 from queue D'ok 3 - got 3 from queue D'ok 4 - got 4 from queue D'ok 5 - got 5 from queue D'ok 6 - got 6 from queue D'ok 7 - got 7 from queue D'ok 8 - got 8 from queue D'ok 9 - got 9 from queue ... D'ok 99996 - got 99996 from queue D'ok 99997 - got 99997 from queue D'ok 99998 - got 99998 from queue D'ok 99999 - got 99999 from queue D'ok 100000 - got 100000 from queue` [download] (D'oh! Preachin' agin. Sorry! :) Also, having the test code in the same file as the code being tested would complicate coverage measurement, Maybe legit on a large collaborative project. But I still maintain that if I need a tool to verify my coverage, the module is too damn big. Update: split this quote out from the previous one; and responded separately easily distinguishing commits that are fixing code from commits that are fixing tests, searching for real uses of a specific feature while ignoring tests that make use of it, ... And I do not see the distinction here either. Test code is code. You have to write it, test it and maintain it. The bug fix that fixed the incorrectly coded test that was reporting spurious errors, is just as legitimate and important as the one that fixed the code under test that was reporting legitimate errors. Treating them in some way (actually, anyway) different is a nonsense. And this, (dare I say it?), is my biggest problem with TDD:"The Franchise". It actively encourages and rewards the writing of reams and reams of non-production code. And in most cases, does not factor that code into the costs and value of the production product. Try going to your National Project Coordinator, (due in Parliament the following week to explain to the Prime Minister why the project is late and over budget), that the reason everything worked during in-house testing and went belly-up on the first day during the high-profile, closely monitored, €18 million pilot study, was because all the in-house tests had been run with the debug-logging enabled, and that so completely distorted the timing requirements that nobody believed you that in critical areas, the overzealous use of over-engineered OO techniques meant that there was no way that it could keep up with full-production scale loading. The logging was effectively serialising inbound state changes, so nothing broke. But, no, I'm not interested in "stepping up" to your challenge. From what you've said about (at least some) of the test tools I'm critiquing, you would not have been the right 'big gun' for my purpose anyway. Many of my reasons would just come across as a personal attack so I'll not go into them. That is a shame. (For me!) I don't feel that I respond 'hurt' to critiques of my code. I may argue with conclusions and interpretations; but (I like to think), because of my disagreement with your technical assessment of that code. But when you start pseudo-psychoanalysing me on the basis of my code -- or words -- and start attributing their deficiencies (as you see them) to some personality trait indicative of some inherited mental condition, rather than as typos, misunderstandings or dog forbid, mistakes I will take umbrage and will respond in kind. This is where we have always clashed. But most of what I'm talking about I can't demonstrate well by pasting a bit of code. I have no interest in trying such a feat. And, as is so often the case, the most interesting part of your response leaves me with a million questions and wanting more... With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re^5: Testing methodology (TAP++) by tye (Sage) on Mar 06, 2012 at 15:32 UTC
Re^6: Testing methodology (TAP++) by BrowserUk (Patriarch) on Mar 06, 2012 at 17:09 UTC


Perl: the Markov chain saw
	PerlMonks