There are tons of tools built on top of TAP.
I just don't get what people get from TAP.
As a (module/application) user, I don't give a monkey's what passed or failed. Either it passed or it didn't. Nor do I (as a Win32 user) give a flying fig for whether you skipped a thousand tests because I'm not on *nix.
As a (module/application) programmer, if 90% passed is acceptable, then 10% of the tests are useless.
If I wrapped ok() around my 'has this value been dequeued before' test, I'd be producing 100,000 (or 1,000,000, or 100,000,000) OKs.
Even if the user has configured a tool to suppress or summarise that useless information, it still means 100,000 (...) calls to a function to produce useless output; 100,000 (...) IOs to the screen or pipe; and 100,000 (...) checks in the harness to throw away output I didn't want in the first place. My testing therefore takes 10 times as long for no benefit.
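A minimal sketch of what that ok()-wrapped version would look like -- Thread::Queue standing in for my bounded queue, and everything here illustrative rather than my actual test code:

use strict; use warnings;
use threads; use threads::shared;
use Thread::Queue;
use Test::More;

my $N = 100_000;
my $Q = Thread::Queue->new( 1 .. $N, undef );   # undef terminates the loop
my $bits :shared = chr( 0 ) x ( $N / 8 + 1 );   # one bit per expected value

my $t = async {
    while( defined( my $v = $Q->dequeue ) ) {
        # One ok() call, one line of TAP, one IO -- per value dequeued.
        ok( !vec( $bits, $v, 1 ), "got $v from queue" );
        vec( $bits, $v, 1 ) = 1;
    }
};
$t->join;
done_testing();

That's 100,000 lines of output to produce, buffer, and then have the harness throw away.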
'Why do you care about the performance of tests?', I can hear some somebodies asking -- especially as I dissed their time/CPU usage statistics. But the problem is that IO goes through the kernel and is (often) serialised. And that completely screws with the statistical legitimacy of my testing strategy.
I have at least half a dozen different implementations of a bounded Q. Some pure Perl, like this one. Some (in XS) that bypass Perl's Win32 emulation of the *nix cond_* calls and use (Win32) kernel locking and synchronisation constructs directly. Some (in C/assembler) that bypass even those and implement locking using CPU primitives.
Many of them are, or have been at some point, incorrectly coded and will deadlock or livelock. But in almost every case when that happens, if I introduce a few printf()s into the key routines, they perform perfectly. Until I remove them again, or (for example) redirect that trace output to NUL. And then they lock again.
The reason is that the multi-threaded C runtime performs its own internal locking to prevent it from corrupting its own internal structures. And those locks can and do prevent the timing conditions that cause the hangs.
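To see the shape of the effect without any queue code at all, here's a contrived sketch (arbitrary numbers, nothing from my actual implementations): two threads increment a shared counter without locking, and updates get lost; add IO inside the loop and the runtime's internal locking tends to serialise things enough to hide the race.

use strict; use warnings;
use threads; use threads::shared;

my $count :shared = 0;

my @t = map { async {
    for ( 1 .. 100_000 ) {
        $count++;          # read-modify-write on a shared var: not atomic
        # warn "tick\n";   # <-- uncomment, and the lost updates tend to vanish
    }
} } 1 .. 2;
$_->join for @t;

print "expected 200000, got $count\n";   # usually comes up short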
So, for me at least, not only do I not see any benefit in what TAP does, the output it requires can completely corrupt my testing.
It is actually useful in the larger context for each individual test to get numbered so we can often correlate different failure scenarios and to make concise reports easy.
As the developer receiving an error report, the first thing I'm going to want to do is convert the 'test number' to a file and line number. So why bother producing test numbers in the first place? Just give the user the file & line, and have him give that back to me.
The only plausible benefit would be if the test number were somehow unique. That is, if the number of the test didn't change when new tests were added or old ones were removed. Then I might be able to respond to reports from old versions. But that isn't the case.
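If I were writing the reporting myself, it might look something like this sketch (check() is a made-up stand-in, not anybody's actual API):

use strict; use warnings;

sub check {
    my( $cond, $desc ) = @_;
    return 1 if $cond;
    my( undef, $file, $line ) = caller;
    die "FAILED: $desc at $file line $line\n";   # file & line; no test number
}

check( 2 + 2 == 4, 'arithmetic' );

The user pastes "FAILED: ... at t/queue.t line 123" straight into the report, and I know exactly where to look, in any version.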
And we have more than one test file per code file in many cases. This is especially useful when there are interesting set-up steps required for some tests.
Hm. Unit tests test the unit. System, integration and regression tests are different and live in a different place.
I'm having a hard time envisaging the requirement for "interesting set-ups" for unit testing.
Many of my test files abstract a few patterns of test and then run lots of simple tests that are specified with a small amount of data.
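(If I've understood the pattern being described, it's something like this sketch -- sum() being a made-up function under test:)

use strict; use warnings;
use Test::More;

sub sum { my $t = 0; $t += $_ for @_; $t }

my @cases = (               # [ inputs, expected ]
    [ [],           0  ],
    [ [ 1, 2 ],     3  ],
    [ [ 1 .. 10 ],  55 ],
);
is( sum( @{ $_->[0] } ), $_->[1], "sum( @{ $_->[0] } )" ) for @cases;
done_testing();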
Isn't that exactly what my 'has this value been dequeued before' test is doing? (I re-read the para many times and I'm still unsure what you mean?)
use threads;
use threads::shared;

# $N and $Qn_1 (the queue under test) come from the surrounding script.
my $bits :shared = chr( 0 ) x ( $N / 8 + 1 );   # one bit per expected value

my $t = async {
    # dq() returns undef once the queue is closed.
    while( defined( $_ = $Qn_1->dq ) ) {
        die "value duplicated" if vec( $bits, $_, 1 );
        vec( $bits, $_, 1 ) = 1;
    }
};
I see no benefit at all in counting those as individual tests. Much less in allowing the test suite to continue, so that the one failure gets lost in the flood of 99,999:

ok 1 - got 1 from queue
ok 2 - got 2 from queue
ok 3 - got 3 from queue
ok 4 - got 4 from queue
ok 5 - got 5 from queue
ok 6 - got 6 from queue
ok 7 - got 7 from queue
ok 8 - got 8 from queue
ok 9 - got 9 from queue
...
ok 99996 - got 99996 from queue
ok 99997 - got 99997 from queue
ok 99998 - got 99998 from queue
ok 99999 - got 99999 from queue
ok 100000 - got 100000 from queue
(D'oh! Preachin' agin. Sorry! :)
Also, having the test code in the same file as the code being tested would complicate coverage measurement,
Maybe legit on a large collaborative project. But I still maintain that if I need a tool to verify my coverage, the module is too damn big.
Update: split this quote out from the previous one and responded to it separately.
easily distinguishing commits that are fixing code from commits that are fixing tests, searching for real uses of a specific feature while ignoring tests that make use of it, ...
And I do not see the distinction here either. Test code is code. You have to write it, test it and maintain it. The bug fix that fixed the incorrectly coded test that was reporting spurious errors is just as legitimate and important as the one that fixed the code under test that was reporting legitimate errors. Treating them in some way (actually, in any way) differently is a nonsense.
And this, (dare I say it?), is my biggest problem with TDD: "The Franchise". It actively encourages and rewards the writing of reams and reams of non-production code. And in most cases, it does not factor that code into the costs and value of the production product.
Try explaining to your National Project Coordinator (due in Parliament the following week to explain to the Prime Minister why the project is late and over budget) that the reason everything worked during in-house testing, then went belly-up on the first day of the high-profile, closely monitored, €18 million pilot study, was that all the in-house tests had been run with debug-logging enabled. That logging so completely distorted the timing that nobody believed your warnings that, in critical areas, the overzealous use of over-engineered OO techniques meant there was no way the system could keep up with full production-scale loading. The logging was effectively serialising the inbound state changes, so nothing broke.
But, no, I'm not interested in "stepping up" to your challenge.
From what you've said about (at least some of) the test tools I'm critiquing, you would not have been the right 'big gun' for my purpose anyway.
Many of my reasons would just come across as a personal attack so I'll not go into them.
That is a shame. (For me!)
I don't feel that I respond 'hurt' to critiques of my code. I may argue with conclusions and interpretations; but (I like to think) only because of my disagreement with your technical assessment of that code.
But when you start pseudo-psychoanalysing me on the basis of my code -- or words -- and start attributing their deficiencies (as you see them) to some personality trait indicative of some inherited mental condition, rather than to typos, misunderstandings or, dog forbid, mistakes, I will take umbrage and will respond in kind.
This is where we have always clashed.
But most of what I'm talking about I can't demonstrate well by pasting a bit of code. I have no interest in trying such a feat.
And, as is so often the case, the most interesting part of your response leaves me with a million questions and wanting more...
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".