Re: Testing methodology

by tobyink (Canon)
on Mar 04, 2012 at 17:34 UTC


in reply to Testing methodology

You don't provide any documentation for the package, but a rigorous test suite would ideally start by testing that the package's API actually exists: i.e. that the constructor constructs an object blessed into the right package, and that each public method is callable.

Then you test the basic functionality of the package. In this case it's a queue, so FIFO: you'd put some stuff into it, and test that it comes out in the correct order.

A feature of the package is that it has a limited size, so you'd want to test that this limit is enforced: i.e. try enqueueing more items than the size limit, and check that it blocks. I haven't done any threaded programming in Perl for years, but in my limited experience I'm not sure it's possible to do that test with complete reliability, as the test itself may introduce race conditions. Keeping the enqueued items simple (e.g. integers), and sleeping for a second before testing that the queue is blocked, seems to be sufficient protection against race conditions.

Here's a test script using no non-core modules (apart from Q.pm, of course):

use threads;   # needed for threads->create below
use Test::More tests => 21;

# Test that Q.pm actually compiles.
BEGIN { use_ok 'Q' };

# Test that the API as documented still exists.
my $q = new_ok Q => [5];
can_ok $q => 'nq';
can_ok $q => 'dq';
can_ok $q => 'n';

# Thread to add some numbers to the queue.
my $enqueue = threads->create(sub {
    $q->nq($_) for 90..99;
});

# Vulnerable to race conditions :-(
sleep 1;
ok !$enqueue->is_joinable, '$q->nq is waiting';

# This breaks encapsulation by peeking at the internals of $q.
# But we want to figure out if $q is waiting at '95'.
ok !(grep { $_==95 } @$q), '95 is not on $q yet';

# Numbers come out of the thread in the correct order:
is $q->dq, $_, "got $_ from queue" for 90..99;

# Queue should now be empty, so not waiting for anything.
sleep 1;
ok $enqueue->is_joinable, '$q->nq is no longer waiting';
$enqueue->join;

# Test that "dq" blocks too.
my $dequeue = threads->create(sub {
    # Add up the numbers we get from the queue.
    my $sum;
    $sum += $q->dq for 1..10;
    return $sum;
});

# We've not added any numbers to the queue yet, so the queue
# should be waiting.
sleep 1;
ok !$dequeue->is_joinable, '$q->dq is waiting';

# Push some numbers into the queue. These sum to 55.
$q->nq($_) for 1..10;

# Queue should have received all the numbers.
sleep 1;
ok $dequeue->is_joinable, '$q->dq is no longer waiting';
my $sum = $dequeue->join;
is $sum, 55, 'result of calculation performed in $dequeue is correct';

Update: comments on the implementation are welcome...

It would be handy to have a few additional methods:

  • length - the current number of items in the queue.
  • max_length - the maximum number of items allowed in the queue.
  • is_full - sub is_full { $_[0]->length == $_[0]->max_length }
  • peek - return the item at the head of the queue, but without dequeueing.
  • peek_all - return the entire queue as a list, without dequeueing.

Many of the above are trivial to implement by inspecting @$q; however, implementing them outside the package itself breaks encapsulation. If people using your module start relying on the internal details of how Q.pm works (that it uses an arrayref, and keeps its stats in the last two array elements, etc.), that leaves you less freedom to refactor Q.pm in the future if you discover a more efficient way of doing it.
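For concreteness, here is how a few of these might look implemented inside Q.pm itself. This is only a sketch, written against the internal layout of the implementation posted further down this thread (items in @$self, with the next-write index and the item count in the last two slots); the names are the suggestions above, not methods Q.pm actually has:

sub max_length {
    my $self = shift;
    lock @$self;
    return @$self - 2;                    # last two slots are bookkeeping
}

sub is_full {
    my $self = shift;
    lock @$self;
    return $self->[ -1 ] >= @$self - 2;   # [-1] holds the item count
}

sub peek {
    my $self = shift;
    lock @$self;
    cond_wait @$self until $self->[ -1 ] > 0;   # same wait as dq()
    my $p = $self->[ -2 ] - $self->[ -1 ];      # head = next_write - count
    $p += @$self - 2 if $p < 0;
    return $self->[ $p ];                 # read without removing
}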

Not only would the above make the module more testable, they'd also make it more useful. As I said, I don't know an awful lot about Perl threading, but I know a bit about parsing, which also tends to operate on a FIFO queue. Parsers for pretty much any non-trivial language have a peek_token method or two hidden away somewhere, for using tokens further up the stream to disambiguate the current token.
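To illustrate the parsing point, a hypothetical fragment (next_token, peek_token and parse_call are invented names, not any particular module's API):

sub parse_primary {
    my $self = shift;
    my $name = $self->next_token;
    # "foo(" starts a function call; a bare "foo" is a variable.
    # Only peeking lets us decide without consuming the "(".
    return $self->parse_call( $name ) if $self->peek_token eq '(';
    return { var => $name };
}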

Replies are listed 'Best First'.
Re^2: Testing methodology
by BrowserUk (Patriarch) on Mar 05, 2012 at 02:23 UTC

    Replying to the update only at this time:

    It would be handy to have ...
    • length() -- The module already has method n().

    • max_length() -- You supplied this information to me at creation time.

      It never changes. If you need it, remember it.

    • is_full() -- Show me a use-case?

      One that doesn't involve you polling this method to decide when to push a new value.

      That polling will require the queue to be locked while the value is calculated, and unlocked prior to returning the value to you. That polling will slow down every other producer and consumer. And the value returned will be out of date by the time you get it.

      Therefore there can be no guarantee that if it returns not-full and you immediately nq(), the call won't block (see the sketch after this list). The information is therefore useless to you.

      Conversely, if you just go ahead and nq(), and it needs to block, it will, and will consume no cpu until the OS wakes it when room is available. (Via cond_signal).

      I doubt you will ever find a realistic use case that will persuade me to add this.

    • peek() -- Again, you could try to convince me with a use-case, but you are unlikely to succeed.

      By the time peek() returned the next value to you, some other thread may have dq()'d it. Then what?

    • peek_all -- There is no possible use-case for this.

      There already is the private method _state(), which effectively does this: it returns the entire internal structure as a string.

      Its intended use is as a debugging aid. Indeed, I added it to allow me to track down a timing issue. But even then, for it to be useful, I had to serialise all state transitions to make it a usable diagnostic. And doing that, by necessity, slowed the throughput to a crawl.
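    To spell out the check-then-act race (a sketch; is_full() here is the hypothetical method under discussion):

    unless( $q->is_full ) {
        # another producer can fill the queue right here...
        $q->nq( $item );    # ...so this can still block
    }

    # Whereas just calling nq() costs nothing extra: it blocks in
    # cond_wait, consuming no CPU, until a consumer signals free room.
    $q->nq( $item );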

    Not only would the above make the module more testable, they'd also make it more useful.

    Sorry, but I disagree completely with both halves of that statement.

    A queue has one purpose in life. Take things in at one end and let them out at the other as efficiently as possible.

    Compromising the function to make testing easier is not going to happen. Adding could-dos without use-cases, for their own sake, is not going to happen.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Re max_length. Let's say I'm writing a module Amusement::Park which has a list of Amusement::Rides. Each Amusement::Ride has an associated Q. (This is a popular amusement park.) I'm writing the Amusement::Park but have no control over the Amusement::Rides, so don't initialise the Q objects. Good amusement park management says I need to point customers at rides with the least busy queues. To do this I need to calculate length ÷ max_length.

      A use case for is_full: say I generate events of multiple levels of importance, e.g. errors, warnings and debugging messages. If the queue is full, I might choose to silently drop debugging messages rather than be blocked. Or there might be several queues I could potentially add an item to, and I wish to choose one that is not full.
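      Something like this is what I have in mind (a sketch; is_full is the proposed method, and $level/$msg are placeholders):

      # Shed only what is expendable, rather than blocking the producer:
      if( $level eq 'debug' and $q->is_full ) {
          # drop the debugging message silently
      }
      else {
          $q->nq( $msg );   # errors and warnings always go on the queue
      }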

      peek/peek_all - there may be several producers but only one consumer, so no danger of peeking a value but somebody else dequeueing it.

        Okay. I'll (briefly) join you in analogy land.

        I'm writing the Amusement::Park but have no control over the Amusement::Rides, so don't initialise the Q objects.

        Then query the information from Amusement::Rides.

        If Amusement::Rides creates the queues, then good OO dictates that you shouldn't be going direct to the Qs anyway. If he chooses to expose that information to you, that's his call.

        Amusement::Park ... max_length

        If you need to try to balance your queues, then the calculation n/maxN is completely useless to you. Different rides take different times to run, and accommodate different numbers of riders. Therefore percentage fullness is a completely useless measure of flow.

        And if you try to use it to control flow, you are going to end up with the well-known supermarket checkout/motorway traffic holdup scenario of whichever queue you switch to, the others will always suddenly flow faster.

        Getting back to the real world. Even if your guests wouldn't rebel at being made to queue for the lame-ass World Showcase when they came to go to Spaceship Earth, using multiple queues is completely the wrong way -- and a rookie mistake -- to manage distributing resources to multiple consumers.

        The correct, and only workable, way is a single queue. The analogy here -- which may or may not make sense wherever you are in the world -- is the bank or post-office queuing system (even some enlightened supermarkets use it now): a single queue and "Go to window (checkout) 8"...
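        In code, the shape is one shared queue with a worker thread per window (a sketch; Q stands in for the bounded queue under discussion, and serve() is a hypothetical per-window routine):

        my $line = Q->new( 100 );
        my @windows = map {
            threads->create( sub{
                # every window draws from the same single queue
                while( defined( my $customer = $line->dq ) ) {
                    serve( $customer );
                }
            } );
        } 1 .. 8;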

        And finally, I bring you back to the fact that the moment you get your hands on this information, it is out of date. Between the return statement in the method and you performing your division, you can lose your time-slice on the processor. And by the time you get your next one, the queue(s) you are instrumenting may have emptied (and filled and emptied again, if you are not the only producer).

        It is a simple fact of life, when trying to instrument concurrency, that as soon as you have made a measurement, it is out of date. And the worst thing you can do is try to synchronise threads in order to regulate them. It is the very act of trying to turn stochastic systems into deterministic systems that creates all the "evils" of threading: dead-locking, live-locking, priority inversion et al.

        And the 'cure' for almost all of them is queues. But to work properly, they have to be free-running, bounded and self-regulating. It is that very self-regulation that permits the guarantees -- that producers and consumers will make steady forward progress and, eventually, finish -- that allow the programmer to ignore synchronisation and locking and fairness, because they -- in conjunction with the OS scheduler -- will take care of it for him.

        And the real beauty of the free-running queue is that regardless of whether you have a single consumer and multiple producers, or vice versa, or one of each; and regardless of whether the jobs are of equal size (cpu or elapsed time) or widely disparate sizes; they will be processed in a fair and timely manner.

        About the only mistake -- beyond using multiple queues for the same work items -- is to try and regulate them.

        Other than tailoring their bounds -- the size of the queue. This is usually done very pragmatically, by trying a few different sizes on small test sets to find what works best. It can be done more scientifically, but the instrumentation required is not fullness; it is 'time spent blocking when full' versus 'time spent blocking when empty' (producer bound or consumer bound). It is also possible to use these statistics to adjust the size automatically, but that requires the statistics to be measured and evaluated internally (ie. under locks). The downside is that gathering the statistics imposes a considerable penalty.
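        A sketch of gathering that first statistic inside nq(), under the existing lock (the shared counter is invented for illustration; dq() would mirror it with a blocked-when-empty counter):

        use Time::HiRes qw[ time ];

        my $blocked_when_full :shared = 0;  # total seconds producers spent waiting

        sub nq {
            my $self = shift;
            lock @$self;
            for( @_ ) {
                my $t0 = time;
                cond_wait @$self until $self->[ N ] < ( @$self - 2 );
                $blocked_when_full += time - $t0;   # accumulated under the lock
                ...                                 # store the item as before
            }
        }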

        A use case for is_full is ... errors, warnings and debugging .... If full, drop debugging ....

        My personal reaction is, if you can safely drop them, why are you producing them. That's not a question. The point is: don't do what you do not have to do. And once you've limited yourself to doing only what you have to do, the option to drop goes away. So then, you are either able to run sufficient consumers to ensure that the producer doesn't block too often; or you need a bigger box; or you need to redesign your system.

        The CompSci solution to your scenario is a priority queue. These are a completely different beast to free-running queues and must be implemented in an entirely different way. Perhaps the simplest implementation is to use multiple, free-running queues, one for each level of priority. Producers queue to the appropriate queue. Consumers try the highest priority queue first and drop down until they get work.
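        As a sketch (dq_nb() is a hypothetical non-blocking dequeue; nothing like it exists in the Q posted below, for reasons discussed elsewhere in this thread):

        my @q = map Q->new( $SIZE ), 0 .. 2;   # 0 = highest priority

        sub consume_next {
            for my $q ( @q ) {
                my $item = $q->dq_nb;          # probe; don't block
                return $item if defined $item;
            }
            return;                            # nothing anywhere; retry later
        }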

        But in reality they do not work well at all. The trouble is that as soon as a consumer has discovered that the high-priority queue is empty, it can become non-empty. But the consumer has dropped down to the lowest priority level and is working on an item that is going to take ages, and so the high-priority queue starts to go unserviced and fills up. One approach to correcting this is to have the consumers poll the high-priority queue for a short while before dropping down, but it fixes nothing, because Sod's Law tells you that the nanosecond the polling stops is when the high-priority job arrives.

        The archetypal example of how priority queues fail is IBM's long-defunct SNA. It had priority levels for transmission packets. The trouble is that if you allow the priority to be user-elective, everybody marks their stuff with the highest priority. There is no way to establish actual relative priority -- even if you could have a constant round-table discussion between all parties, they'd never reach an agreement. So you resort to heuristics: small means high because it takes no time; big means low because it takes lots of time. At busy times big just never gets through, so then you get people (me, in real life) writing programs that break the huge files they need to send up into zillions of iddy-biddy chunks, so they go through fast and can be reassembled at the other end.

        The only half-good solution is to use dedicated queues for each priority and dedicated consumers for each of those queues, and manage them through OS thread priority settings. It won't stop low-priority traffic from building up, but with any modern OS with dynamic (thread) priorities, low-priority queues that are starved of cpu slowly raise their priority until they get a time-slice before dropping back thus ensuring some forward progress for all threads (queues). Any other mechanism fails to ensure that forward progress will be made by all priority levels.

        Or there might be several queues I could potentially add an item to, and I wish to choose one that is not full..

        Another rookie mistake I'm afraid.

        It is always a mistake to have multiple paths (queues) by which a work item may make its way to its consumer(s).

        If those multiple queues lead to the same (pool of) consumer(s), then the general effect of having multiples is of ensuring that one or more queues will fill and be ignored.

        How will the consumers decide which queue to draw from? Even if they pick randomly, they'll eventually pick an empty one and block while the others are all full. If they do it by fullness, all the consumers will pick the fullest queue at the same time, and the last ones in will find it emptied by those that got in first, and will block despite other queues being non-empty. And the harder you try to avoid these deadlocks, the worse they will get. (Sod's Law again.)

        And if, by some mechanism, you succeed in getting both your producers and consumers to treat the multiple queues between them utterly impartially, fairly and evenly, then all you have done is create the exact effect of having a single longer queue, except that FIFO is no longer guaranteed. You've broken the basic tenet of a queue.

        peek/peek_all - there may be several producers but only one consumer, so no danger of peeking a value but somebody else dequeueing it.

        Okay. What is that one consumer going to do if it peeks the queue and doesn't like what it sees there? Ignore it until it goes away? Oh. That's right. It is the only consumer, so it won't go away until it consumes it.

        So now you need peek_all(), so that it can cherry-pick the items it wants. Except how can it get its preferred item(s) out of the queue without removing the top item? So now you need a random-access dq. Hm. That's sounding a lot like an array!

        If a consumer can tell from inspecting a dq'd item that it doesn't want to process it immediately, then *it* has the option of storing it somewhere in *its* unshared memory -- where locking isn't required for further access; where it's not occupying a shared memory slot; where its decisions will have no impact upon other threads; and where it doesn't mean screwing with the simplicity and efficiency of the queue structure.
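        That is, something like this (ready_for() and process() are hypothetical):

        my @deferred;   # thread-local; no locks needed to touch it
        while( defined( my $item = $q->dq ) ) {
            if( ready_for( $item ) ) {
                process( $item );
                # drain anything parked earlier that is now ready
                process( shift @deferred )
                    while @deferred and ready_for( $deferred[0] );
            }
            else {
                push @deferred, $item;   # park it privately; the queue moves on
            }
        }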



Re^2: Testing methodology
by BrowserUk (Patriarch) on Mar 06, 2012 at 03:23 UTC

    Firstly, as I already said: thank you for stepping up. I note that the 'big guns' have ducked and covered, presumably keeping their powder dry.

    I've spent the best part of yesterday responding to your test suite section by section, and I just discarded most of it, because it would be seen as picking on you rather than targeting the tools I am so critical of.

    1. # Test that Q.pm actually compiles.
      BEGIN { use_ok 'Q' };

      What is the purpose of this test?

      • What happens if the test is included and the Q module is not loadable?

        The Test::* modules trap the fatal error from Perl, and so the test suite continues to run, failing every test.

        Not useful.

      • What happens if we do a simple use Module; instead?

        We get two lines of output instead of six. The lines aren't preceded by comment cards, so my editor does not ignore them. The test run stops immediately, rather than running on to test stuff that cannot possibly pass (or that is in error if it does).

        Useful.

      • What is actually being tested here?

        That perl can load a module? If it couldn't the Test::* tools wouldn't load.

        Not useful.

        That the tarball unpacked correctly? Perl would tell me that just as reliably.

        No more useful than Perl.

        That the module installed correctly? No. Because when the test suite is run, the module isn't installed. It is still in the blib structure.

        Not useful.

      • And why does it force me to put it in a BEGIN{}? Because without it, I'd have to use parens on all my method calls otherwise they may be taken as filehandles or barewords.

        Worse than non-useful. Detrimental. Extra work because of changed behaviour.

    2. # new_ok
      my $q = new_ok Q => [5];

      This apparently tests whether the returned object is of the same class as the name supplied for the class. Why?

      That prevents me from doing:

      package Thing;

      use if $^O eq 'MSWin32', 'Thing::Win32';
      use if $^O ne 'MSWin32', 'Thing::nix';

      sub new {
          $^O eq 'MSWin32' ? Thing::Win32->new() : Thing::nix->new();
      }

      Detrimental. Extra work; limits options.

    3. # Test that the API as documented still exists.
      can_ok $q => 'nq';
      can_ok $q => 'dq';
      can_ok $q => 'n';
      • What do we get if we use this and it fails?
        not ok 1 - async::Q->can('pq')
        #   Failed test 'async::Q->can('pq')'
        #   at -e line 1.
        #   async::Q->can('pq') failed

        Four lines, three of which just repeat the same thing in different words. And the tests continue, despite the fact that any test which uses that method must fail.

        No benefit. Verbose. Repetitive.

      • And if we let Perl detect it?
        Can't locate object method "pq" via package "async::Q" at -e line 1.

        One line, no comment card. No repetition.

      Pointless extra work for no benefit.

    4. The rest elided.

    Again, thank you for being a willing subject. Now's your chance for revenge :) Take it!

    Here is my module complete with its test suite:

    #! perl -slw
    use strict;

    package async::Q;

    use async::Util;
    use threads;
    use threads::shared;

    use constant {
        NEXT_WRITE => -2,
        N          => -1,
    };

    sub new {
        # twarn "new: @_\n";
        my( $class, $Qsize ) = @_;
        $Qsize //= 3;
        my @Q :shared;
        $#Q = $Qsize;
        @Q[ NEXT_WRITE, N ] = ( 0, 0 );
        return bless \@Q, $class;
    }

    sub nq {
        # twarn "nq: @_\n";
        my $self = shift;
        lock @$self;
        for( @_ ) {
            cond_wait @$self until $self->[ N ] < ( @$self - 2 );
            $self->[ $self->[ NEXT_WRITE ]++ ] = $_;
            ++$self->[ N ];
            $self->[ NEXT_WRITE ] %= ( @$self - 2 );
            cond_signal @$self;
        }
    }

    sub dq {
        # twarn "dq: @_\n";
        my $self = shift;
        lock @$self;
        cond_wait @$self until $self->[ N ] > 0;
        my $p = $self->[ NEXT_WRITE ] - $self->[ N ]--;
        $p += @$self - 2 if $p < 0;
        my $out = $self->[ $p ];
        cond_signal @$self;
        return $out;
    }

    sub n {
        # twarn "n: @_\n";
        my $self = shift;
        lock @$self;
        return $self->[ N ];
    }

    sub _state {
        # twarn "_state: @_\n";
        no warnings;
        my $self = shift;
        lock @$self;
        return join '|', @{ $self };
    }

    return 1 if caller;

    package main;

    use strict;
    use warnings;
    use threads ( stack_size => 4096 );
    use threads::shared;
    use async::Util;
    use Time::HiRes qw[ time sleep ];

    our $SIZE //= 10;
    our $N    //= 1e5;
    our $T    //= 4;
    ++$T; $T &= ~1;

    my $Q1_n = new async::Q( $SIZE );
    my $Qn_n = new async::Q( $SIZE );
    my $Qn_1 = new async::Q( $SIZE );

    my @t1 = map async( sub{
        $Qn_n->nq( $_ ) while defined( $_ = $Q1_n->dq );
    } ), 1 .. $T/2;

    my @t2 = map async( sub{
        $Qn_1->nq( $_ ) while defined( $_ = $Qn_n->dq );
    } ), 1 .. $T/2;

    my $bits :shared = chr(0);
    $bits x= $N/ 8 + 1;

    my $t = async{
        while( defined( $_ = $Qn_1->dq ) ) {
            die "value duplicated" if vec( $bits, $_, 1 );
            vec( $bits, $_, 1 ) = 1;
        }
    };

    my $start = time;

    $Q1_n->nq( $_ ) for 1 .. $N;

    $Q1_n->nq( (undef) x ($T/2) );
    $_->join for @t1;

    $Qn_n->nq( (undef) x ($T/2) );
    $_->join for @t2;

    $Qn_1->nq( undef );
    $_->join for $t;

    my $stop = time;

    my $b = unpack '%32b*', $bits;
    die "NOK: $b : \n" . $Q1_n->_state, $/, $Qn_n->_state, $/, $Qn_1->_state
        unless $b == $N;

    printf "$N items by $T threads via three Qs size $SIZE in %.6f seconds\n",
        $stop - $start;

    __END__
    C:\test>perl async\Q.pm -N=1e4 -T=2 -SIZE=40
    1e4 items by 2 threads via three Qs size 40 in 5.768000 seconds

    C:\test>perl async\Q.pm -N=1e4 -T=20 -SIZE=40
    1e4 items by 20 threads via three Qs size 40 in 7.550000 seconds

    C:\test>perl async\Q.pm -N=1e4 -T=200 -SIZE=400
    1e4 items by 200 threads via three Qs size 400 in 8.310000 seconds

    You'll notice that in addition to performing a default test, it can be configured through command line parameters to vary the key parameters of the test.

    The actual test consists of setting up three queues. One thread feeds data via the first queue to a pool of threads (one to many). That pool dequeues the input and passes it on to a second pool of threads via the second queue (many to many). And finally those threads pass the data back to the main thread via the third queue (many to one).

    The data for a run consists of a simple list of integers. Once they make it back to the main thread, they are checked off against a bitmap tally to ensure that nothing is dequeued twice, nor omitted.

    All in one file; no extraneous modules; no extraneous output; completely compatible with any other test tools available, because it is nothing more than a simple perl script.

    Feel free to rip it to shreds.



      What happens if the test is included and the Q module is not loadable?

      Depends why loading fails. If Q.pm parses OK and executes OK but returns false, then the use_ok test will fail but the other twenty tests will still pass.

      None of your tests cover the situation where Q.pm returns false because you never attempt to "use Q" or "require Q".

      Nothing compels you to put a BEGIN { ... } block around it, but as a matter of style (in both test suites and regular code) I tend to make sure all modules load at compile time unless I'm purposefully deferring the load for a specific reason.
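      That is, the plain-Perl equivalent of use_ok-in-a-BEGIN, versus a deliberate deferral (the environment variable is just an invented example of a "specific reason"):

      use Q;   # compile-time load; a broken Q.pm stops the script right here

      # versus purposefully deferring the load:
      require Q if $ENV{WANT_Q};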

      This apparently tests whether the return object is of the same class as the name supplied for the class. Why?

      No it doesn't. It checks that the returned object is of the same class as the name supplied, or a descendant class in an inheritance hierarchy. This still allows you to return Q::Win32 or Q::Nix objects depending on the current OS, provided that they both inherit from Q.

      To have a class method called "new" in Q, which returns something other than an object that "isa" Q would be bizarre and likely to confuse users. Bizarreness is worth testing against. Tests don't just have to catch bugs - they can catch bad ideas.

      But it can catch bugs anyway. Imagine Q/Win32.pm contains:

      my @ISA = ('Q');

      Ooops! That should be our @ISA. This test catches that bug.

      can_ok

      Notice none of my further tests touch the "n" method? Well, at least its existence is tested for. If for some reason during a refactor it got renamed, this test would fail, and remind me to update the documentation.

      I don't think any of your tests check the "n" method either. If you accidentally removed it during a refactor, end users might get a nasty surprise.

      A can_ok test is essentially a pledge that you're not going to remove a method, or not without some deliberate decision process.

      Use of a formalised testing framework can act as a contract - not necessarily in the legal sense - between the developer and the end users. It's a statement of intention: this is how my software is expected to work; if you're relying on behaviour that's not tested here, then you're on dangerous ground; if I deviate from this behaviour in future versions, it will only have been after careful consideration, and hopefully documentation of the change.

      ☆ ☆ ☆

      Overall, most of your complaints about Test::More seem to revolve around three core concerns:

      1. Verbosity of output;
      2. That it continues after a failure has been detected, rather than bailing out; and
      3. That it apparently forces you to "jump through hoops".

      Verbosity of output has never been an issue for me. The "prove" command (bundled with Perl since 5.8.x) gives you control over the granularity of result reporting: one line per test, one line per file, or just a summary for the whole test suite.
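      For example (standard prove switches):

      prove -v t/   # verbose: one line per test
      prove t/      # default: one line per test file
      prove -Q t/   # really quiet: just the end-of-run summary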

      Yes, you get more lines when a test fails, but as a general rule most of your tests should not be failing, and when they do, you typically want to be made aware of it as loudly as possible.

      The fact that the test run continues after a failure I regard as a useful feature. Some test files are computationally expensive to run. If lots of calculations occur and then a minor test of limited importance fails, I still want to be able to see the results of the tests following it, so that if there are any more failures I can fix them all before re-running the expensive test file.

      If a particular test is so vital that you think the test file should bail out when it fails, it's not especially difficult to add an or BAIL_OUT($reason) to the end of the test:

      my $q = new_ok Q => [5] or BAIL_OUT("too awful");

      Test::Most offers the facility to make all tests bail out on failure, but I've never really used Test::Most.

      One man's "forced to jump through hoops" is another man's "saved from writing repetitive code".

      new_ok saves me from writing:

      use Scalar::Util qw( blessed );

      my $q = Q->new(5);
      unless (blessed $q and $q->isa('Q')) {
          warn "new did not return an object which isa Q";
          # and note that the line number reported by "warn" here
          # is actually two lines *after* the real error occurred.
      }

      Ultimately, if I ever did feel that a particular set of tests wasn't a natural fit for Test::More, there would be nothing to stop me sticking a few non-TAP scripts into my distro's "t" directory, provided I didn't name them with a ".t" at the end. They can live in the same directory structure as my other tests; they just won't get run by "prove" or "make test", and won't be reported on by CPAN testers. It doesn't have to be an either/or situation.

        Okay, you're a Kool-Aid drinker. S'cool.

        I'm not. I don't like the taste.

        The only thing I'll respond to is:

        I don't think any of your tests check the "n" method either.

        True, for a reason: I cannot think of a good use for it. As such it may well go away if I don't find a use for it between now and releasing it. If I ever do.

        The only use I make of Thread::Queue::pending(), relates to preventing the Q from attaining an unbounded size. My queue addresses that internally, so that use goes away.

        Another possible use would be to prevent code from calling dq() when it would block. But as discussed elsewhere, that use is a bust because the information can be out-of-date by the time I get it.

        If there was a use-case for a dq_nb() or similar, then it would have to be implemented internally -- with the test under locks. If I find myself wanting that facility then I'll add it (under some name) and probably drop n().
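        If it ever happens, dq_nb() would essentially be dq() with the wait replaced by a bail-out, the emptiness test made while holding the lock (a sketch against the implementation above; the name is provisional):

        sub dq_nb {
            my $self = shift;
            lock @$self;
            return undef unless $self->[ N ] > 0;   # empty: don't block
            my $p = $self->[ NEXT_WRITE ] - $self->[ N ]--;
            $p += @$self - 2 if $p < 0;
            my $out = $self->[ $p ];
            cond_signal @$self;
            return $out;
        }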



      Yeah, I eventually realized that there was almost nothing in Test::More that I was actually using or even found to be a wise thing to use. isa_ok()? I just can't imagine that finding a real mistake. I can certainly see it complaining about an implementation detail that I might change.

      I also don't want to use is_deeply(). I think a test suite should complain about what it cares about. If there is extra stuff that it doesn't care about, then it shouldn't complain.

      And I find Test::Simple to just be a stupid idea.

      But I do use and value Test (though I use my own wrapper around it, for a few minor reasons and to provide decent Lives() and Dies() implementations -- better than the several modules I've seen that purport to provide such). I certainly make frequent use of skip()ing.
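      The essence of such Lives()/Dies() wrappers is just an eval layered over Test.pm's three-argument ok() (a sketch, not my actual wrapper):

      use Test;

      sub Lives {          # pass if the code does not die
          my( $code, $name ) = @_;
          my $lived = eval { $code->(); 1 } ? 1 : 0;
          ok( $lived, 1, $lived ? $name : "$name: died: $@" );
      }

      sub Dies {           # pass if the code does die
          my( $code, $name ) = @_;
          my $died = eval { $code->(); 1 } ? 0 : 1;
          ok( $died, 1, $name );
      }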

      The fact that Test doesn't use Test::Builder is a minor side benefit that becomes a major benefit every so often, when I feel the need to look at the source code. Test::More's skip() is so bizarrely defined that I can't use it correctly without reading the code that implements it, and the act of trying to find said code is so aggravating (since Test::Builder is involved) that I'm happy to have realized I never have to go through that again.

      There are tons of tools built on top of TAP (and other testing schemes, such as those used by some of our Ruby-based tests). It is actually useful, in the larger context, for each individual test to be numbered, so that we can correlate different failure scenarios and produce concise reports easily.

      And we have more than one test file per code file in many cases. This is especially useful when there are interesting set-up steps required for some tests. Testing leaf modules is the easiest case and usually doesn't really stress one's testing chops.

      Many of my test files abstract a few patterns of test and then run lots of simple tests that are specified with a small amount of data. So, for example, I might have a few dozen lines where each line specifies an expected return value, a method name, and an argument list (and maybe a test description).
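      Roughly this shape (the methods, values and descriptions are placeholders):

      my @tests = (
          # expected, method, args,   description
          [ 0,        'n',    [],     'queue starts empty'  ],
          [ 1,        'n',    [],     'one item after an nq' ],
      );

      for ( @tests ) {
          my( $expect, $method, $args, $desc ) = @$_;
          ok( $obj->$method( @$args ), $expect, $desc );   # Test.pm ok()
      }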

      Also, having the test code in the same file as the code being tested would complicate: coverage measurement; easily distinguishing commits that fix code from commits that fix tests; searching for real uses of a specific feature while ignoring the tests that make use of it; ...

      But, no, I'm not interested in "stepping up" to your challenge. Many of my reasons would just come across as a personal attack so I'll not go into them. But most of what I'm talking about I can't demonstrate well by pasting a bit of code. I have no interest in trying such a feat.

      - tye        

        There are tons of tools built on top of TAP.

        I just don't get what people get from TAP.

        As a (module/application) user, I don't give a monkeys what passed or failed. Either it passed or it didn't. Nor do I (as a Win32 user) give a flying fig for whether you skipped a thousand tests because I'm not on *nix.

        As a (module/application) programmer, if 90% passed is acceptable, then 10% of the tests are useless.

        If I wrapped ok() around my 'has this value been dequeued before' test, I'd be producing 100,000 (or 1,000,000, or 100,000,000) OKs.

        Even if the user has configured a tool to suppress or summarise that useless information, it still means 100,000 (...) calls to a function to produce useless output; 100,000 (...) IOs to the screen or pipe; and 100,000 (...) checks in the harness to throw away what I didn't want to start with. My testing therefore takes 10 times as long, for no benefit.

        Why do you care about the performance of tests? I can hear some somebodies asking -- especially as I dissed their time/cpu usage statistics. But the problem is that IO goes through the kernel and is (often) serialised. And that completely screws with the statistical legitimacy of my testing strategy.

        I have at least half a dozen different implementations of a bounded Q. Some pure perl like this one. Some (in XS) that bypass Perl's Win32 emulation of *nix cond_* calls and use (Win32) kernel locking and synching constructs direct. Some (in C/assembler) that bypass even those and implement locking using cpu primitives.

        Many of them are, or have been at some point, incorrectly coded and will deadlock or livelock. But in almost every case when that happens, if I introduce a few printf()s into the key routines, they perform perfectly. Until I remove them again, or (for example) redirect that trace output to NULL. And then they lock again.

        The reason is that the multi-threaded C runtime performs its own internal locking to prevent it from corrupting its own internal structures. And those locks can and do prevent the timing conditions that cause the hangs.

        So, for me at least, not only do I not see any benefit in what TAP does, the output it requires can completely corrupt my testing.

        It is actually useful in the larger context for each individual test to get numbered so we can often correlate different failure scenarios and to make concise reports easy.

        As the developer receiving an error report, the first thing I'm going to want to do is convert the 'test number' to a file and line number. Why bother producing test numbers in the first place? Just give the user file & line, and have him give that back to me.

        The only plausible benefit would be if the test number were somehow unique. That is, if the number of the test didn't change when new tests were added or old ones were removed. Then I might be able to respond to reports from old versions. But that isn't the case.

        And we have more than one test file per code file in many cases. This is especially useful when there are interesting set-up steps required for some tests.

        Hm. Unit tests test the unit. System, integration and regression tests are different, and live in a different place.

        I'm having a hard time envisaging the requirement for "interesting set-ups" for unit testing.

        Many of my test files abstract a few patterns of test and then run lots of simple tests that are specified with a small amount of data.

        Isn't that exactly what my 'has this value been dequeued before' test is doing? (I re-read the para many times and I'm still unsure what you mean.)

        my $bits :shared = chr(0);
        $bits x= $N/ 8 + 1;

        my $t = async{
            while( defined( $_ = $Qn_1->dq ) ) {
                die "value duplicated" if vec( $bits, $_, 1 );
                vec( $bits, $_, 1 ) = 1;
            }
        };

        I see no benefit at all in counting those as individual tests. Much less in allowing the test suite to continue, so that the one failure gets lost in a flood of 99,999:

        D'ok 1 - got 1 from queue
        D'ok 2 - got 2 from queue
        D'ok 3 - got 3 from queue
        D'ok 4 - got 4 from queue
        D'ok 5 - got 5 from queue
        D'ok 6 - got 6 from queue
        D'ok 7 - got 7 from queue
        D'ok 8 - got 8 from queue
        D'ok 9 - got 9 from queue
        ...
        D'ok 99996 - got 99996 from queue
        D'ok 99997 - got 99997 from queue
        D'ok 99998 - got 99998 from queue
        D'ok 99999 - got 99999 from queue
        D'ok 100000 - got 100000 from queue

        (D'oh! Preachin' agin. Sorry! :)

        Also, having the test code in the same file as the code being tested would complicate coverage measurement,

        Maybe legit on a large collaborative project. But I still maintain that if I need a tool to verify my coverage, the module is too damn big.

        Update: split this quote out from the previous one, and responded to it separately.

        easily distinguishing commits that are fixing code from commits that are fixing tests, searching for real uses of a specific feature while ignoring tests that make use of it, ...

        And I do not see the distinction here either. Test code is code. You have to write it, test it and maintain it. The bug fix that fixes an incorrectly coded test that was reporting spurious errors is just as legitimate and important as the one that fixes the code under test that was producing legitimate errors. Treating them in some way (actually, any way) differently is a nonsense.

        And this (dare I say it?) is my biggest problem with TDD: "The Franchise". It actively encourages and rewards the writing of reams and reams of non-production code. And in most cases, it does not factor that code into the costs and value of the production product.

        Try explaining to your National Project Coordinator (due in Parliament the following week to explain to the Prime Minister why the project is late and over budget) that the reason everything worked during in-house testing, and went belly-up on the first day of the high-profile, closely monitored, €18 million pilot study, was that all the in-house tests had been run with debug-logging enabled. That logging so completely distorted the timing that nobody believed you when you said that, in critical areas, the overzealous use of over-engineered OO techniques meant there was no way it could keep up with full production-scale loading. The logging was effectively serialising inbound state changes, so nothing broke.

        But, no, I'm not interested in "stepping up" to your challenge.

        From what you've said about (at least some of) the test tools I'm critiquing, you would not have been the right 'big gun' for my purpose anyway.

        Many of my reasons would just come across as a personal attack so I'll not go into them.

        That is a shame. (For me!)

        I don't feel that I respond 'hurt' to critiques of my code. I may argue with conclusions and interpretations; but (I like to think), because of my disagreement with your technical assessment of that code.

        But when you start pseudo-psychoanalysing me on the basis of my code -- or words -- and start attributing their deficiencies (as you see them) to some personality trait indicative of some inherited mental condition, rather than to typos, misunderstandings or, dog forbid, mistakes, I will take umbrage and will respond in kind.

        This is where we have always clashed.

        But most of what I'm talking about I can't demonstrate well by pasting a bit of code. I have no interest in trying such a feat.

        And, as is so often the case, the most interesting part of your response leaves me with a million questions and wanting more...


