Sunday, June 5, 2016

Abstraction Considered Harmful (In Unit Tests)

If you were building an ALU, how would you test addition?  You'd probably write tests like

# Figure 1.
import unittest
class ALUTest(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 2, 3)
        self.assertEqual(1 + (-2), -1)

etc., etc.  But those tests only cover those exact numbers; they're very specific to particular use cases.  What your ALU really does is add numbers in general, so maybe that's what the test should check.  So why not write a test like this:

# Figure 2.  (MIN_INT and MAX_INT bound the ALU's integer range.)
class ALUTest(unittest.TestCase):
    def test_addition(self):
        for x in range(MIN_INT, MAX_INT + 1):
            for y in range(MIN_INT, MAX_INT + 1):
                self.assertEqual(x + y, x + y)

Now the test covers what your code does, not just specific use cases.  But there's an obvious problem: this test verifies absolutely nothing.  Both sides of the assertion are computed by the same ALU, so if there's a bug, both expressions will return the same wrong answer, and the test will pass.  So how about a test like this?

# Figure 3.
def my_alternative_addition_function(x, y):
    # The best software arithmetic code ever
    ...

class ALUTest(unittest.TestCase):
    def test_addition(self):
        for x in range(MIN_INT, MAX_INT + 1):
            for y in range(MIN_INT, MAX_INT + 1):
                self.assertEqual(
                    x + y,
                    my_alternative_addition_function(x, y)
                )

That code most definitely is testing something, and it’s certainly better than the code in Figure 2: it tests what the ALU does, not just a specific use case.  So it looks like a pretty good test, right?

Maybe, but I would argue that there’s still a problem.  The trouble is that it’s very tempting to implement my_alternative_addition_function() using the same procedures as the ALU, so that the code in your “alternative” function ends up looking very much like the code it’s supposed to check.  If that’s the case, then the code in Figure 3 is not really much better than the code in Figure 2: any bugs in your ALU are likely to be present in your “alternative” function as well.

Sure, if you (or someone else) later changes the ALU, then this test might catch the regression.  On the other hand, this test won’t catch the bugs in the first version of the ALU; also, when the test fails in the future, the code maintainer will be tempted to fix the test by “fixing” the alternative function.  After all, tests need to be updated when code changes, right?

What if you make sure that the code for my_alternative_addition_function() is totally different from the code effectively used by your ALU?  What if you use a different addition algorithm, or hand the test off to a third party who’s never seen your ALU design?  Would that be sufficient?
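(For concreteness, a “different addition algorithm” might avoid + entirely and build the sum out of bitwise operations.  Here’s a minimal sketch; the 32-bit two’s-complement width is an arbitrary assumption for illustration.)

def my_alternative_addition_function(x, y):
    # Bitwise addition: XOR yields the sum without carries, and
    # AND shifted left yields the carries.  Masking to 32 bits
    # emulates fixed-width two's-complement arithmetic (assumed
    # width, chosen only for this sketch).
    MASK = 0xFFFFFFFF
    x, y = x & MASK, y & MASK
    while y:
        carry = (x & y) << 1
        x = (x ^ y) & MASK
        y = carry & MASK
    # Reinterpret the 32-bit pattern as a signed value.
    return x - 0x100000000 if x & 0x80000000 else x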

Maybe, but maybe not.  If the test fails, you don’t know whether the bug is in the ALU or in the alternative addition function.  And the temptation is always there to fix the test by “fixing” the alternative, even when the real bug is in the ALU, thereby breaking them both, and effectively putting us back where we started, in Figure 2.

That’s why the code in Figure 1 is probably the best.  Unit testing is about test cases.  Production code is abstract, because it’s supposed to implement a single abstract contract for an effectively infinite domain of inputs.  What unit tests do is to select a meaningful, representative set of sample inputs, where the expected output is known in advance.  Most good unit tests specify that expected output precisely; that is to say, the output is “hard coded”.
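For the ALU’s adder, such a representative set might look something like this (the boundary values assume a 32-bit ALU, purely for illustration):

class ALUTest(unittest.TestCase):
    def test_representative_cases(self):
        # Hand-picked inputs paired with hard-coded expected
        # outputs, chosen to probe interesting boundaries.
        cases = [
            (0, 0, 0),                    # additive identity
            (1, 2, 3),                    # small positives
            (1, -2, -1),                  # mixed signs
            (-1, -1, -2),                 # both negative
            (2**31 - 1, -1, 2**31 - 2),   # near positive overflow
            (-2**31, 1, -2**31 + 1),      # near negative overflow
        ]
        for x, y, expected in cases:
            self.assertEqual(x + y, expected)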

Some suites of unit tests might have a mix of abstractions and concrete cases.  Sometimes, expressing expected outputs concretely can actually be much more complex than making abstract assertions, and is not worth the effort.  But we shouldn’t lose sight of the fact that when it comes to unit tests, hard coding is good, and abstraction is at best a mixed blessing.
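As a sketch of such a mix, a suite might hard-code a few outputs and layer a couple of abstract algebraic assertions, say commutativity and the additive identity, on top:

class MixedAdditionTest(unittest.TestCase):
    def test_concrete_cases(self):
        # Hard-coded expected outputs: the backbone of the suite.
        self.assertEqual(1 + 2, 3)
        self.assertEqual(1 + (-2), -1)

    def test_abstract_properties(self):
        # Abstract assertions: easy to state, but they never pin
        # down the expected output the way Figure 1 does.
        for x, y in [(0, 5), (7, -3), (123, 456)]:
            self.assertEqual(x + y, y + x)   # commutativity
            self.assertEqual(x + 0, x)       # additive identity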

(Note: if you missed the literary reference from the title, see https://en.wikipedia.org/wiki/Considered_harmful)
