Thursday, February 24, 2011

reactions to unit test failures

I suspect that deepest dedication to a suite of automated unit tests isn't proven by having a test that touches each public class. Nor by a test for each public method. Nor by a test coverage tool that reports a proportion as close to 100% as is reasonable for that codebase. Nor even by strict test-first methodology.

The deepest dedication is only proven later in the code's life, when changes happen. Code changes of significance should lead to test failure or stark test compilation (or interpreter-parse) failure. That moment is when dedication takes effect. What will be the programmer's reaction? If it's a tiny test that broke, then a rapid and focused adjustment to the test or the code is likely to be the happy outcome. (Sidebar: this argues for not skipping trivial tests that probably are redundant to big tests. A failing trivial test is easier to interpret than a failing big test!) If it's a complex test or a large subset of tests that failed, then the reaction might not be as, er, placid.
  • The worst reaction is to impulsively eliminate the failing test(s). Better but not by much is to turn the test(s) into comments or otherwise force a skip. A disabled/skipped test is always a temporary measure of compromise to reduce mental clutter during the thick of an intense task. It carries an implicit promise to enable the test at the next possible opportunity. Excessive distracting nagging is awful but permanently removing a safety net clearly falls into the "cure worse than disease" category.
  • Assuming the motivation for the code change was a real change in requirements rather than code refactoring and improvement, then direct elimination may be correct. Before doing so, remember that unit tests act like executable specifications for that unit, and ask yourself "Does this test correspond to a code specification that still applies to the changed code but in a different form?" When the answer is "yes", the test should be replaced with a corresponding test for the transformed specification. Consider previous tests that caught corner cases and boundary conditions. If an object previously contained a singular member, but due to changes in the problem domain it now contains a collection, then the test for handling a NullObject singular member might correspond to a replacement test for an empty member collection. 
  • On the other hand, whenever the change's purpose is to improve the code while leaving intact all existing functions/interfaces of importance, elimination or fundamental rewrites aren't the right course. The test stays, regardless of its inconvenience in pointing out the shortcomings of the redesign. The right answer may be to rethink part of the code redesign or in a pinch to add on to it in some small way with some unfortunate adapter code until other modules finish migrating. Sometimes a big fat legitimate test failure is the endpoint and "smoking gun" of an evolutionary mistake of the code, and the professional reaction is to disregard personal/emotional attachment by cutting off or reshaping the naive changes. Never forget that to users the code is a semi-mysterious black box that fills specific needs. Sacrificing its essential features (rather than unused feature bloat) is too high a price for code that's more gorgeous to programmers. Granted, skillful negotiators can counter by pledging sophisticated future features that the redesigned code will support, in which case the pledges must turn out to be more than vaporware for the trick to ever work again.
  • With any luck, the ramifications are not so dire. A confusing unit test failure may not be a subtle lesson for the design; it may be nothing more than a lesson to write more (small) tests and/or test assertions. It seems counterintuitive to throw tests at failing tests, yet it makes a lot of sense given that tests are coded expectations. In effect, confront the failing test by asking "What did I expect?" immediately followed by "Why did I expect that?" Expectations build on simpler expectations. Attack the expectation in top-down step-wise analysis. The expected final outcome was 108, because the expected penultimate outcome was 23, because the expected count was 69, etc. Write tests for those other, lesser expectations. Now the tests narrow down the problem for you at the earliest point of error, as if being an automatic debugger with predefined breakpoints and watch-expressions.
  • It's a well-known recommendation to write an additional unit test for a bug discovered "in the wild". This test confirms that the bug is fixed and then reconfirms that the bug doesn't resurface, assuming frequent runs of the entire suite. After a few unsuccessful tries at passing this novel test, don't be too rigid in your thought habits to ponder the possibility that the untested test is buggy! In the prior items my encouragement was to not react by blaming the tests, since an unmodified test that passed before a code change and fails afterward logically indicates that what changed, i.e. the code, must be to blame. Philosophically man is the measure of all things and a unit's tests are the measure of the unit. Not so during the introduction of a test. At this special time, the test isn't a fixed ruler for measuring code errors. It's its own work in progress in a co-dependent relationship with the code it measures. Initially the code and the test are at danger of dragging the other down through bugs. A buggy test is a false premise that can lead to a false conclusion of fine code that appears to be buggy or worse buggy code that appears to be fine. Be careful to write tests as minimal, unassuming, and straightforward as is practical. Complex tests that check for complex behavior are acceptable (and hugely important!). Complex tests that are intended to check for simple behavior are less justifiable and trustworthy. Tests are miniature software projects. The more convoluted and intricate and lengthy a test becomes, the greater opportunity for bugs to sneak in and set up shop.

No comments:

Post a Comment