Re^5: Testing methodology (TAP++)

by tye (Sage)
on Mar 06, 2012 at 15:32 UTC [id://958096]


in reply to Re^4: Testing methodology (UPDATED!)
in thread Testing methodology

If I wrapped ok() around my 'has this value been dequeued before' test, I'd be producing 100,000 (or 1,000,000 or 100,000,000) OKs.

What a stupid idea. And not one that I saw anybody suggest.

Hm. Unit tests test the unit. System, integration and regression tests are different and live in a different place.

I'm having a hard time envisaging the requirement for "interesting set-ups" for unit testing.

Well, I guess you haven't done much interesting unit testing? As I said, testing leaf modules is relatively trivial. Unit testing non-leaf modules can get tricky, and there can be interesting set-up required to mock out the things that the non-leaf module depends on, so that you test the unit and not the whole system it employs.
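
For instance, a minimal sketch of that kind of set-up (the My::Report and My::Store names, and their methods, are invented purely for illustration):

    use strict;
    use warnings;
    use Test::More tests => 1;

    # Illustration only: a tiny non-leaf module (My::Report) and the
    # dependency it calls (My::Store); both names are invented.
    { package My::Store;  sub fetch { die "talks to a real database" } }
    { package My::Report;
      sub new { bless {}, shift }
      sub summary_for {
          my ( $self, $id ) = @_;
          my $rec = My::Store::fetch($id);
          return "$rec->{name} ($rec->{id})";
      }
    }

    # The interesting set-up: replace the dependency so the test
    # exercises My::Report alone, not the whole system it employs.
    no warnings 'redefine';
    local *My::Store::fetch = sub { return { id => 42, name => 'stub' } };

    is( My::Report->new->summary_for(42), 'stub (42)',
        'summary built from the mocked store, no database needed' );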

But even a relatively trivial leaf module like File::FindLib required several interesting set-ups that would be a huge pain to do from a single file. So I'm not particularly moved by your failure of imagination on that point.

Many of my test files abstract a few patterns of test and then run lots of simple tests that are specified with a small amount of data.
Isn't that exactly what my 'has this value been dequeued before' test is doing? (I re-read the para many times and I'm still unsure what you mean?)

Perhaps you should have moved on to the next sentence? "So, for example, I might have a few dozen lines where each line specifies an expected return value, a method name, and an argument list (and maybe a test description)." That is so very much not "call the same method 10,000 times expecting the same result each time, reporting 'ok' separately for each call". I don't see how one can confuse the two so I won't waste time trying to restate that.
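
A minimal sketch of that pattern (the My::Calc class and the expected values are invented for illustration):

    use strict;
    use warnings;
    use Test::More;

    # Illustration only: a tiny class so the table has something to call.
    { package My::Calc;
      sub new { bless {}, shift }
      sub add { my $self = shift; my $t = 0; $t += $_ for @_; return $t }
      sub max { my $self = shift; my ($m) = sort { $b <=> $a } @_; return $m }
    }

    my $obj = My::Calc->new;

    # Each line: expected return value, method name, argument list, description.
    my @cases = (
        [ 6, 'add', [ 1, 2, 3 ], 'add sums its arguments' ],
        [ 0, 'add', [],          'add of nothing is zero' ],
        [ 9, 'max', [ 4, 9, 2 ], 'max finds the largest'  ],
    );

    for my $case (@cases) {
        my ( $want, $method, $args, $desc ) = @$case;
        is( $obj->$method(@$args), $want, $desc );
    }

    done_testing();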

But then, I don't really consider what you keep talking about as a unit test. It is a functional test that has no reproducibility and relies on external interruptions in hopes of randomly inducing a problem. Yes, if I had to write a thread queue module, it is a test I would run but it would not be in the main "unit tests".

The unit tests would cover the parts of the unit that can be tested in a controlled manner. Does trying to dequeue from an empty queue block? Does trying to enqueue to a full queue block? If I enqueue two items, do they dequeue in the expected order? They'd catch those common off-by-one errors, for example.
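
A minimal sketch of those controlled cases, using the stock Thread::Queue for illustration (the full-queue case would additionally need a bounded queue, such as the limit attribute in newer Thread::Queue releases):

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use Test::More tests => 3;

    my $q = Thread::Queue->new;

    # Empty queue: the non-blocking dequeue should report "nothing there"
    # rather than hanging the test (a controlled stand-in for "does it block?").
    is( $q->dequeue_nb, undef, 'dequeue_nb on an empty queue returns undef' );

    # Two items in, two items out, in the same order: catches off-by-one
    # and ordering bugs deterministically.
    $q->enqueue( 'first', 'second' );
    is( $q->dequeue, 'first',  'first item dequeued first' );
    is( $q->dequeue, 'second', 'second item dequeued second' );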

Part of the point of the unit tests is that they get run nightly and whenever a commit is pushed and before a revision gets passed to QA. A test that requires a CD be played so we get some real hardware interrupts isn't usually part of that mix.

And, yes, I have written thread queues and written tests for them. And, in trying to test for the interesting failures particular to such code, I wrote tests similar to what you wrote, tests that more resemble a load test than a unit test. But, in my experience, the load-like tests were pretty useless at finding bugs, even when running lots of other processes to try to add random interruptions to the mix. Running multiple types of load mixes with no failures would not mean that we wouldn't run into bugs in Production. And a bug being introduced was usually more usefully pointed out by the reproducible tests than by the "when I run the load test it fails".

Unit testing sucks at helping with the interesting failures of things like thread queues. But that also goes back to why I don't use threads much any more. I prefer to use other means that have the benefit of being easier to test reliably.

But I still maintain that if I need a tool to verify my coverage, the module is too damn big.

I don't technically need a coverage tool to tell me which parts of one module aren't covered. But it is very convenient. And, yes, it saves a ton of work when dealing with hundreds of modules that a dozen developers are changing every day. Coverage just provides useful reminders about specific lines of code or specific subroutines that got completely missed by the test suite (sometimes developers get rushed, as hard as that is to imagine) and nothing more.

I rarely have enough time on my hands that I consider reading through hundreds of modules and hundreds of unit tests trying to notice which parts of the former got missed by the latter. And when I'm working on a tiny leaf module, I still figure out "which part did I not test at all yet" by running a command that takes maybe a few seconds to tell me rather than taking a minute or few to swap in every tiny feature of the module and every tiny step tested and perform a set difference in my head.
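
For Perl, that command is usually Devel::Cover's cover; one typical invocation (adjust the prove flags to taste):

    $ cover -delete
    $ HARNESS_PERL_SWITCHES=-MDevel::Cover prove -r t/
    $ cover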

Particular bad ideas with coverage include: thinking that 100% (or 99%) coverage really means that you've got good test coverage; shooting for 100% (or 99%) coverage as a goal in itself rather than as a tool for pointing out specific missed spots that either shouldn't be tested or should be carefully considered for how they should be tested; adding a stupid test because it causes a particular line to get 'covered'.

In response to the update:

And I do not see the distinction here either. Test code is code. You have to write it, test it and maintain it.

No, I don't write tests for my test code. I run my test code. That has the side effect of testing that the test code actually runs. If you call that testing the test code, then you must have some confusing conversations.

And I don't have to maintain code if it isn't being used any longer. But all of my code is used by at least one test file. So, if I don't distinguish, then I can't tell when a feature is no longer used (other than by being tested) and so can just be dropped.

And no change to test code is going to cause a failure in Production nor surprise a customer. So "who cares about which changes, and in what ways" is very different between changes to real code and changes to test code.

- tye        

Re^6: Testing methodology (TAP++)
by BrowserUk (Patriarch) on Mar 06, 2012 at 17:09 UTC
    Well, I guess you haven't done much interesting unit testing?

    Of course. That explains it. (Yes, I can be just as sarcastic and dismissive as you. You know that. Why go there? Knew it was too good to last.)

    But then, I don't really consider what you keep talking about as a unit test. It is a functional test ...

    I tried to find definitions of 'unit testing' & 'functional verification testing' that I thought we might both agree on. As it is, I couldn't find any from a single source that I could agree with. And cherry-picking two from different sources to make a point would be pointless.

    So, I'll state my contention in my terms and let you disagree with it in yours.

    Your style of unit testing -- in my terms: laborious, verbose and disjointed -- will not discover anything that my style of unit testing -- in your terms, perhaps: functional verification -- will fail to highlight.

    But my style of UT will discover every failure that your style might. And much, much more. Therefore, your style of UT is incomplete without some of my style of UT.

    Therefore, your style of UT is redundant. A cost for no benefit. Make-work.

    You will argue (indeed, have argued) that your unit tests help you track down trivial programming errors -- your cited example being off-by-one errors. My contention is that, with the right configuration, my style of UT allows me to track them down just as effectively. E.g.:

        C:\test>perl async\Q.pm -N=10 -T=2 -SIZE=10
        1 2 3 4 5 6 7 8 9 10
        10 items by 2 threads via three Qs size 10 in 0.030158 seconds

    I added a single print to the dq loop. (Actually, put back: it was there to start with and was removed once proven.)

    And I configured the test for 2 threads. Which means that each of the two "pools" gets one thread. Thus, the ordering from Q1_n via Qn_n and Qn_1 is deterministic.

    So, I started with the simple case, and only increased the workload once the basic functionality was working. I removed the print to kill the (now redundant) noise.
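
    To illustrate the same idea generically (this is not the posted Q.pm; the two-queue layout and names below are invented): reduce the configuration to a single worker and the end-to-end ordering becomes deterministic, so it can be checked exactly:

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use Test::More tests => 1;

        # One worker between an input queue and an output queue: with a
        # single thread the path through the pipeline is deterministic.
        my $in  = Thread::Queue->new;
        my $out = Thread::Queue->new;

        my $worker = threads->create( sub {
            while ( defined( my $item = $in->dequeue ) ) {
                $out->enqueue($item);
            }
            $out->enqueue(undef);    # pass the shutdown marker along
        } );

        $in->enqueue( 1 .. 10, undef );    # undef tells the worker to stop

        my @got;
        while ( defined( my $item = $out->dequeue ) ) {
            push @got, $item;
        }
        $worker->join;

        is_deeply( \@got, [ 1 .. 10 ],
            'one worker preserves FIFO order end to end' );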

    One set of tests to write (and maintain!) that serves both purposes. Cheaper and more cost-effective.

    And here is the kicker.

    The code I posted contains a (quite serious) bug -- put back especially for the purpose.

    And the challenge -- which you won't take -- is that no amount of your style of UT will ever detect it!

    My style of UT makes it trivial to find. (And no, it is not a subtle timing issue or freak lock up or anything else that you can blame on "threading").

    Just a plain ol' coding bug.

    Betcha can't? (Know you won't! :)


