Re: RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks"

by ELISHEVA (Prior)
on Jan 02, 2011 at 07:28 UTC


in reply to RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks"

I think you are more likely to get feedback if you move the code on your scratchpad to the body of your RFC. Just surround it by <readmore> tags so it doesn't distract attention from your introductory question. The comments here won't make any sense if you delete or modify your scratch pad contents at some point in the future. Posting the draft here will make it easier for people coming after you to learn from the dialog in these replies.

The metaphor of a circuit board with test points appeals to me, but I'm having trouble envisioning situations where I would actually use this. I think a tutorial would be significantly strengthened by a more in-depth discussion of when this approach is most valuable, especially in light of alternative design, debugging and testing techniques.

The main difficulty is that one needs to know in advance where one would want to have permanent test points. To be worth making part of the class itself, a test point needs to serve a long-term testing purpose. That rules out testing motivated by "I'm not sure I understand how this Perl function/syntax works" and testing motivated by the hunt for logic errors.

Once logic errors are fixed, there isn't much point in keeping the code used to find them sitting around. What one needs instead is preventive measures. An in-code comment discussing any tricky points of logic will hopefully keep someone from erasing the fix and regressing the code back to its buggy state. A regression test should also be defined as a double check to ensure that future maintainers do in fact read the comments and don't reintroduce the error.
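For what it's worth, a regression test of that kind can be as small as this Test::More sketch (My::Parser and the bug it guards against are made up for illustration):

    use strict;
    use warnings;
    use Test::More tests => 1;
    use My::Parser;    # hypothetical module under test

    # Regression check for a fixed logic error: trailing whitespace
    # used to corrupt the last parsed field.
    is( My::Parser->new->parse('a, b, c  ')->[-1], 'c',
        'trailing whitespace no longer corrupts the last field' );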

A permanent test usually looks at object state or the interaction of object state and the environment. This raises the question: is there a way you can design your object so that test points aren't needed, or at least needed less frequently? I suspect the answer is sometimes yes and sometimes no. A tutorial should discuss the situations where design cannot eliminate the need for test points.

I would tend to think that one most needs permanent test points when one needs to test state in real time during a process that interacts with a changing environment. For example, if a user complains that hitting a certain key causes the software to spew garbage, one might want to have them run the software in a debugging mode and see what output gets generated by embedded test points.

However, if one is using test points as part of a static test suite, I'd suspect the need for them is related more to the design than to any intrinsic need for a test point. Sometimes, if we are working with a legacy design, we may not have any option but to stick with it, so test points might also be helpful there. However, if one can come up with a design that eliminates the need for test points, I think that is good. Such designs tend to be more modular and easier to maintain.

In my own case, I try to define objects so that any information I need to test the state of the object is publicly accessible at least in read only form. I also try to break down long complex functions into small testable units. I test each of the component steps and not just the entire mega routine. This reduces the need for intermediate test points within a routine, except when I'm trying to track down a logic error. As discussed above, tracking down a logic error doesn't justify a permanent test point.

I also work hard to build my software from immutable objects wherever possible. Instead of littering each class with setter methods, I'll write a class that gathers input, passes it to a constructor and spits out the immutable object. The input gathering class will need careful tests on state, but the other objects can be checked after creation and then we're done. However, I wouldn't need a test point to check state because my state is normally exposed via the aforementioned readonly methods.

Exposing object state via a readonly method can't hurt the integrity of the data. The only possible downside is that some developers are fond of hunting down and using undocumented methods. If a method exposes private data, most likely I don't want it to be part of the public API, because the organization of private data can change over time.
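Here's a rough Moose sketch of the split between an immutable object and an input-gathering builder that I'm describing (all the class and attribute names are hypothetical):

    package Invoice;                       # hypothetical immutable object
    use Moose;

    # state is public but read-only: tests can inspect it freely,
    # yet nothing can change it after construction
    has customer => (is => 'ro', isa => 'Str', required => 1);
    has total    => (is => 'ro', isa => 'Num', required => 1);

    __PACKAGE__->meta->make_immutable;

    package InvoiceBuilder;                # hypothetical input-gathering class
    use Moose;

    # mutable state lives only here, where it gets the careful state tests
    has customer => (is => 'rw', isa => 'Str');
    has lines    => (is => 'rw', isa => 'ArrayRef[Num]', default => sub { [] });

    sub add_line { push @{ $_[0]->lines }, $_[1] }

    sub build {
        my $self  = shift;
        my $total = 0;
        $total += $_ for @{ $self->lines };
        return Invoice->new( customer => $self->customer, total => $total );
    }

    __PACKAGE__->meta->make_immutable;

Once the builder's build method returns, everything downstream works with an object whose state can be read in tests but never mutated.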

Maybe there are better or actually more standard "best practice" ways of doing this?

I don't know that there are "better" techniques, but there are related ones. This technique bears some similarity to assertions. Assertions are also permanent additions to the code that let one peek at potential problem areas deep within the code. There are, however, two key differences - one that adds options and another that takes them away (see the sketch after the list):

  • An assertion is a hard coded check. Your test point is a fill-in-the-blank check. Depending on your needs you can change its content.
  • An assertion is part of a method's code and so can check automatic ("my") variables private to the method. A test point subroutine can only check class/object state.
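To make the contrast concrete, here is a rough sketch; the Counter class, its attributes and the hook name are all hypothetical, and this is not necessarily how your test points are actually wired up:

    package Counter;                       # hypothetical class
    use Moose;
    use Carp qw(confess);

    has count       => (is => 'rw', isa => 'Int', default => 0);
    has test_points => (is => 'ro', isa => 'HashRef[CodeRef]',
                        default => sub { {} });

    sub increment {
        my ($self, $by) = @_;

        # assertion: a hard-coded check that can see the lexical $by
        confess "increment must be positive" unless $by > 0;

        $self->count($self->count + $by);

        # test point: the caller fills in the check, but the callback
        # only receives the object, so it can only look at object state
        if (my $cb = $self->test_points->{after_increment}) {
            $cb->($self);
        }
    }

    __PACKAGE__->meta->make_immutable;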

You could, of course, modify your technique slightly and enable testing of automatic data. Just define your test point routine to accept parameters passed in by the method. However, that means that you would have to anticipate what parameters were being passed in and design separate test point methods for each combination of passed-in parameters. At that point you are getting awfully close to the very problem you are trying to avoid - having to define and sprinkle special-case debug/warning statements all through your code.
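Continuing the hypothetical Counter sketch above, the parameterized variant might look like this; note how the method now has to decide in advance exactly which lexicals each test point receives:

    # extending the hypothetical Counter class above
    sub decrement {
        my ($self, $by) = @_;
        my $previous = $self->count;       # automatic ("my") data

        $self->count($previous - $by);

        # the test point now receives lexicals chosen by the method, so
        # its expected signature has to be designed ahead of time
        if (my $cb = $self->test_points->{after_decrement}) {
            $cb->($self, by => $by, previous => $previous);
        }
    }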

You would still retain one benefit: the ability to cleanly encapsulate your debugging code. However, there are alternatives, especially for temporary debugging code. I often surround lengthy temporary debugging code with

    #DEBUG - BEGIN
    ... code here ...
    #DEBUG - END

When I am ready to remove it or comment it out, I just grep my source files for #DEBUG.

Update: Expanded comments on design vs. test points.

Re^2: RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks"
by ait (Hermit) on Jan 02, 2011 at 17:48 UTC
    I think you are more likely to get feedback if you move the code on your scratchpad to the body of your RFC....

    Makes perfect sense, although I was trying this time to faithfully follow the instructions here: Tutorials. But I will do as you recommend because it seems to make more sense.

    I really appreciate that you have taken the time to write this. The circuit-board analogy is because I come from an Electronic Engineering background, but the inspiration and implementation techniques actually came from stuff I had picked up from Mark Jason Dominus' HOP.

    Anyway, the topic at hand is interesting because it's related to balancing design, testing, assertions (as you well pointed out), diagnostics and debugging. Much like in electronics, you run the tests (whether it's a test bench or embedded self-tests), but if a board fails, you have to test the circuit at each stage to see what is going on. Newer circuits that use I2C actually perform very granular self-tests in an automated fashion and also auto-diagnose themselves (e.g. most modern Philips appliances do this). So whether this is testing, diagnosis or debugging seems to depend on the granularity of the test, and is ultimately in the eye of the beholder.

    A test will tell you that something failed; a diagnosis may dig deeper to help you pinpoint the problem. In the Perl testing harnesses, you could probably accomplish this by running more granular tests only if a higher-level test failed (regardless of whether it is an integration or unit test). More commonly, as you point out, by unit testing each low-level component and then testing the aggregates, assuming of course that the code is sufficiently factored to accomplish this, which is again a matter of design, as you also point out.

    Back to the circuit analogy: you may have a distinct overall function, say the horizontal drive in a TV set (maybe analogous to a class). This in turn might be divided into 3 distinct parts: a) the h-sync circuitry, b) the middle-stage drivers, and c) the horizontal transistor that drives the fly-back primary (a, b and c maybe analogous to methods/functions, or they could also be modeled as an association [i.e. simple, aggregation or composition] depending on design needs). The I2C self-test (analogous to integration or unit tests) may indicate a problem with the voltage feeding the middle-stage drivers, but within that part of the circuit you have several other test points (not visible to the I2C) that are used to further diagnose the problem by measuring voltages or by looking at reference wave patterns (analogous to manual debugging). How far/deep the unit tests go is again a matter of design, and design in turn is based on needs, so I don't think there is any definite correct design rule here. Some I2C tests are so specific that they will many times immediately tell you which particular component needs to be replaced; other times the test just points out the circuit/section and you have to place your oscilloscope lead on the test points to figure out which component is failing.

    So how far/deep a test must go to diagnose a failure depends on the granularity of the code, which is a design constraint, in turn based on real-world needs. This may be similar to how much normalization is actually needed, or practical, in an RDBMS model, which many times depends on the real-world needs of that particular system. Furthermore, real-world performance issues may also force you to de-normalize an RDBMS model, or to rethink a complete layer entirely. For example, you may arrive at the conclusion that RDBMSs suck at querying real-world objects, so you incorporate another element, say for example CouchDB, that better handles the de-normalized real-world object queries, completely separating the write and read paths. Now, before I divert too far OT, maybe I should start by explaining why this need arose in the first place; perhaps this may shed some light on whether the test-point analogy makes any sense, or is of any use for Perl automated testing or not.

    This idea came to me because I was having a hard time debugging a long function that scores the matching of text fields using Wagner-Fischer. The function in question calculates the individual scores of several text fields, which are compared to several text fields in several different groups (from an XML source). Then, the best scores from each group are selected and narrowed down to the individual best match of each group. The function itself is a helper of 2 other functions that take these results and select the best match based on the average score. There is no recursion, but a lot of sub-processing of XML data using XPath and iterating through the different groups to calculate the individual scores for each field in each group and feeding that result to the other functions that in turn aggregate the results and help narrow down to the best match. So you see, the code is already factored into 3 functions, but the scoring function, although long, makes little sense to break up into smaller functions or to encapsulate in smaller classes (though with enough time and budget this may not necessarily hold true).
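    For reference, the Wagner-Fischer part is just the classic edit-distance dynamic program; a minimal sketch (plain strings, unit costs, and none of the XPath iteration or group aggregation shown) would be something like:

        use strict;
        use warnings;

        # Wagner-Fischer edit distance between two strings (unit costs)
        sub edit_distance {
            my ($s, $t) = @_;
            my @s = split //, $s;
            my @t = split //, $t;
            my @d;
            $d[$_][0] = $_ for 0 .. @s;    # deleting a prefix of $s
            $d[0][$_] = $_ for 0 .. @t;    # inserting a prefix of $t
            for my $i (1 .. @s) {
                for my $j (1 .. @t) {
                    my $subst = $d[$i-1][$j-1] + ($s[$i-1] eq $t[$j-1] ? 0 : 1);
                    my $del   = $d[$i-1][$j] + 1;
                    my $ins   = $d[$i][$j-1] + 1;
                    my $min   = $subst;
                    $min = $del if $del < $min;
                    $min = $ins if $ins < $min;
                    $d[$i][$j] = $min;
                }
            }
            return $d[@s][@t];
        }

        # e.g. turn the distance into a 0..1 similarity score for two fields
        sub field_score {
            my ($x, $y) = @_;
            my $max = length($x) > length($y) ? length($x) : length($y);
            return $max ? 1 - edit_distance($x, $y) / $max : 1;
        }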

    The reason I implemented this debugging/testing technique is to make sure that we were scoring correctly at every step, and that when we add new logic, the programmer who later touches this code (or I myself) would not screw up the not-so-complex but lengthy scoring algorithms. This is because provisions were left in the code for additions or modifications of groups, fields and scoring rules, and these have been changing since we put the product through beta testing with key customers. I agree with you, this code is far from perfect, but this test point / granular testing (diagnosing or debugging) technique has proved very useful at this stage of development, and it's probably more a question of choosing the right title for it.

    Your comments, and this comment Re: RFC: Tutorial "Introspecting your Moose code using Test Point Callbacks" by swon, made me re-think whether this is actually a testing or a diagnosing/debugging technique. In my particular case, I think it's both, and that may lead to some interesting discussions and conclusions here.

    On your comments in particular, I think you have a strong point that this could be avoided with better code design, and that probably holds very true. On the other hand, budget and time constraints many times don't allow us to design every class and library perfectly up front, so we must all take a more iterative approach, and fine-grained testing like this has proven instrumental in iterating and evolving this particular class.

    Another interesting fact is that many times architectural constraints don't allow for "ideal" modeling in any single paradigm, this case being a particularly good example. This library is a Model class for an application in Catalyst, so a basic design constraint is to try to keep as much code as possible pre-loaded in a Catalyst "instance" and only create new objects that are specific to an individual HTTP request. This means that even though the overall design pattern is OO with Moose, the "instance" classes are more functional libraries than anything else. Also, we have to account for different deployment techniques such as mod_perl with pre-forked processes, or using threads with mod_worker where the non-mutable data of the interpreter is shared amongst threads.

    In this case, 'ideal' object modeling would represent a huge performance penalty in having to instantiate objects with a lot of code, so the objects in this design are lightweight objects that have a per-request lifespan. The instance code, on the other hand, has to make sure we don't have any global data (object attributes) that would create a problem when using the methods (functions) of these long-lived objects, which are more akin to an OS shared object (a.k.a. a DLL). This of course does not excuse the fact that these model "instances" could benefit from better design choices to begin with, and I agree. Which reminds me of when I wrote RFC Mocking an Accessor with Test::MockObject (also inspired by the test code of a Catalyst application) and chromatic said "If you have to mock an accessor, you're probably mocking too much.", and he was right. After giving it further thought, I realized that it was better to completely separate my Model classes from those of Catalyst (eliminating much of the need to mock in the first place), and then integrate them back into Catalyst using a very thin layer of "Catalyst Model" classes. Of course, if I had carefully RTFM'd on "Extending Catalyst", I would have noticed this recommendation clearly spelled out ;-). Then again, the mocking of accessors technique proved to be equally useful later on.

    At this point my conclusion is that a change of title and a bit of generalization might better classify this technique, although in the end it may prove not to be very useful after all, who knows. Maybe something like "Using Test-Points for Better Diagnostics", "Adding Diagnostics to your Tests using Test-Points", "Adding Granularity to your Tests with Test-Points", or something along those lines.

      There is no recursion, but a lot of sub-processing of XML data using XPath and iterating through the different groups to calculate the individual scores for each field in each group and feeding that result to the other functions that in turn aggregate the results and help narrow down to the best match.

      You sound like you might be limited in the amount of refactoring you can do, but there is one technique I'd like to share, just in case it might fit and you find some opportunity to do some refactoring.

      Oftentimes, when a function turns big, long and ugly and it doesn't make sense to refactor it, the chief culprit is a large amount of state data shared among the different steps of the function. The classic example of such a function is a giant loop that is pushing and popping a stack in lieu of recursion, but there are other examples as well.

      In these cases I have often found it quite helpful to design and build a functor object. This is an object whose data is all that ugly automatic data shared throughout the function (in your case, it might be something like the current state of your scoring variables). Then I define a set of small focused functions each reading and setting the shared data as needed. This eliminates the need to pass lots of data between methods, but still allows me to break the long ugly function into small conceptual chunks.

      Nearly every time I do this, I watch the algorithm and all its problem areas unfold before my eyes. When each chunk of the algorithm has the breathing room to live in its own subroutine, I often become aware of a set of edge cases and conditions that ought to have been checked for but weren't. Something about code living in its own little home seems to invite a closer look at the algorithm and all of its associated ifs, ands, and buts. Furthermore, there is no longer a concern about all of those hairy conditions clouding up the overall logic because they are nicely encapsulated in a subroutine.

      Another advantage of this approach is that it becomes much easier to run just a part of the algorithm. If you want to run everything, you call the functor's run method. Since little data is being passed from subroutine to subroutine, this "run" method starts looking like a list of steps. Depending on your granularity needs, you can sometimes break the list into parts A, B, and C, make each of those a subroutine, and then run just a part of the algorithm, do some tests, then run another part. This would be much, much harder to do if A, B, and C were wired together by data passed from one to another via parameters rather than shared from within the object.
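      A bare-bones Moose sketch of such a functor (FieldScorer and all of its attribute and step names are made up for illustration, and the step bodies are trivial placeholders):

          package FieldScorer;    # all names here are made up for illustration
          use Moose;

          # the state that would otherwise be a pile of "my" variables
          has fields => (is => 'ro', isa => 'ArrayRef[Str]', required => 1);
          has scores => (is => 'rw', isa => 'HashRef',       default  => sub { {} });
          has best   => (is => 'rw', isa => 'Maybe[Str]');

          sub score_fields {      # part A: score each field (trivially, by length)
              my $self = shift;
              $self->scores->{$_} = length $_ for @{ $self->fields };
          }

          sub pick_best {         # part B: keep only the highest-scoring field
              my $self = shift;
              my ($best) = sort { $self->scores->{$b} <=> $self->scores->{$a} }
                           keys %{ $self->scores };
              $self->best($best);
          }

          sub run {               # reads like a list of steps; a test can run
              my $self = shift;   # and inspect any subset of them
              $self->score_fields;
              $self->pick_best;
              return $self->best;
          }

          __PACKAGE__->meta->make_immutable;

          # e.g. in a test:
          # my $fs = FieldScorer->new( fields => [qw(foo barbaz)] );
          # $fs->score_fields;                         # run only part A
          # is( $fs->scores->{foo}, 3, 'foo scored' ); # inspect shared state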

      Just a thought.

      Update: explaining how a functor object can affect the ability to stop and start a process.

        Thanks again for your valuable input. I will definitely consider your functor recommendation. Regarding the use of callbacks to test or diagnose specific spots inside a function, do you see any situation where it can be useful? Or do you consider that it always boils down to a design or refactoring issue? Lastly, did my comments on these sub-tests as a way to diagnose a failed test make any sense?
