Coding Rationally—Test Driven Development

Computer programming can be a lot of fun, or it can be brain-rendingly frustrating. The transition between these two states often goes something like this:

Paula the Programmer: Computer, using the “Paula’s_Neat_Geometry” library, draw a triangle near the top of the screen once per frame.
Cronus the Computer: Sure, no problem.
P: After drawing that triangle, draw a rectangle 50 units below it.
C: Will do, boss.
P: Sweet. Alright, after the rectangle, draw a circle 20 units to the right, then another 20 units to the left.
C: GLARBL GLARBL GLARBL I hear it’s amazing when the famous purple stuff wormed in flap-jaw space with the tuning fork does a raw blink on Hari Kiri Rock! I need scissors! 61!¹ System error.
P: Crap! Crap crap crap. Um, okay, let’s see...

And then Paula must spend the next 45 minutes turning the circle drawing code on and off and figuring out where the wackiness comes from. When the circle code is off, everything works fine. When she turns it back on, everything she thought she understood so well, everything she was previously able to manipulate with the calm joyful deftness of a virtuoso playing a violin, turns into a world of mystery and ice-slick confusion. Something about that request to draw that circle at that particular time and place is exposing a difference between Paula’s model of the computer and the computer’s reality.

When this happens to a programmer often enough, they begin to realize that even when things seem to be working fine, these differences probably still lurk beneath the surface, waiting to strike. This is an unsettling feeling. As a technique of rationality, or just because being uncomfortable is unpleasant, they seek diligently to avoid creating these cross-model inconsistencies (known colloquially as “bugs”) in their own code, so as to avoid subjecting themselves to GLARBL GLARBL GLARBL moments.

Having a sincere desire to be less wrong in one’s thinking is fine, but not enough. One also needs an effective process to follow, a system for making it harder to fool oneself, or at least for noticing when it’s happened. Test Driven Development is one such system; not the only one, and not without its practical problems (which will be at most briefly glossed over in this introductory article), but one of my personal favorites, primarily because of the way it makes me feel confident about the quality of my work.

Why Computer Programming Requires Rationality

Computer programming is the process of getting a messy, incomplete, often self-contradictory, and overall badly organized idea out of one’s head and explaining it completely and thoroughly to a quite stupid machine that has no common sense whatsoever. This is beneficial for the users of the program, but also for the programmer, because the computer does not have a programmer’s human biases, such as mistaking the name of an idea for an understanding of how that idea works.

It has been said that you only truly understand how to do something when you can teach a computer how to do it for you. This doesn’t mean that you have to understand the thing perfectly before you can begin programming; the process of programming itself will change and refine the idea in the programmer’s mind, chipping away rotten bits and smoothing connections as the idea moves piece-by-piece from the programmer’s mind into a harsh reality that doesn’t care about how neat something sounds, just whether or not it works.

Through the process of explaining the problem and solution to the computer, the programmer is also explaining it to themselves, checking that the explanation is correct as they go, and adjusting it in their own minds as necessary to make it match.

In a typical single-person development process, a programmer will think about the problem as a whole, mentally sketch out a framework of the tools and structures they will have to write to make the problem solvable, then begin implementing those tools in whatever order seems most intuitive. At this point, great loud alarm bells should be ringing in the heads of Less Wrong readers, indicating that this is a problematically far-mode way to go about things.

Why Test Driven Development Is Rational

The purpose of Test Driven Development is to formalize the part that comes right before a programmer starts writing code, the part where they think about what they expect the code to do, and to divide it into tiny pieces. The programmer is then encouraged to think about each of those small pieces individually, in near-mode, using the following steps:

RED: Figure out what feature you want to add next; make it a small feature, like “draw a triangle”. Write a test, a tiny test, a test that only checks for the one new feature, and that will only pass if the feature is working properly. This part can be hard if you didn’t really have a clear idea of the feature in the first place, but at least you’re dealing with that difficulty now and not when 20 other things in the program already depend on your slightly flawed understanding. Anyway, once you’ve written the test, run it and make sure it fails in the expected manner, since the feature hasn’t actually been implemented yet.

GREEN: Now actually go and write the code to make the test pass. Write as little code as possible, with minimum cleverness, to make this one immediate goal happen. Don’t write any code that isn’t necessary for making the test pass.
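Continuing the same hypothetical sketch, a GREEN step can be a single line. It deliberately ignores position and size: under this approach, those would only appear once some test demands them.

```python
import unittest


class Canvas:
    """Hypothetical stand-in for the screen: it records what was drawn."""

    def __init__(self):
        self.shapes = []


def draw_triangle(canvas):
    # GREEN: the least code that makes the test pass. There is no position or
    # size yet, because no test has asked for them.
    canvas.shapes.append("triangle")


class TriangleTests(unittest.TestCase):
    def test_triangle_is_drawable(self):
        canvas = Canvas()
        draw_triangle(canvas)
        self.assertEqual(canvas.shapes, ["triangle"])


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TriangleTests)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    print("all passed:", run_tests().wasSuccessful())
```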

REFACTOR: Huzzah, the test passes! But the code has some bad smells: it’s repetitious, it’s hard to read, it generally creates a feeling of creeping unease. Clean it up, removing all the duplicated parts in both the test and the implementation.
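A sketch of what similar hypothetical code might look like after a REFACTOR step, assuming a second feature (`draw_rectangle`, also invented here) has been added and both features had grown duplicated bookkeeping. The comments mark what was extracted. Behavior is unchanged, so the same tests must still pass after the cleanup; that is what makes refactoring safe.

```python
import unittest


class Canvas:
    """Hypothetical stand-in for the screen: it records what was drawn."""

    def __init__(self):
        self.shapes = []

    def record(self, kind, x, y):
        # Extracted during REFACTOR: draw_triangle and draw_rectangle used to
        # each build and append this tuple themselves.
        self.shapes.append((kind, x, y))

    def kinds(self):
        return [kind for kind, _, _ in self.shapes]


def draw_triangle(canvas, x, y):
    canvas.record("triangle", x, y)


def draw_rectangle(canvas, x, y):
    canvas.record("rectangle", x, y)


class ShapeTests(unittest.TestCase):
    def setUp(self):
        # Extracted during REFACTOR: every test used to build its own Canvas.
        self.canvas = Canvas()

    def test_triangle_is_drawable(self):
        draw_triangle(self.canvas, 100, 10)
        self.assertEqual(self.canvas.kinds(), ["triangle"])

    def test_rectangle_is_drawable(self):
        draw_rectangle(self.canvas, 100, 60)
        self.assertEqual(self.canvas.kinds(), ["rectangle"])


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShapeTests)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    print("all passed:", run_tests().wasSuccessful())
```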

BLISS: Run all the prior tests; they should still be green. Feel a sense of serene satisfaction that all your expectations continue to be met; be confident your mental model of the whole program continues to be a pretty close match. If you have a version control system (and you really should), commit your changes to it now with a witty yet descriptive message.
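One way to sketch the BLISS check, again with hypothetical shapes: every test written so far is run together, not just the newest one, and committing only makes sense when the whole suite is green.

```python
import unittest


class Canvas:
    """Hypothetical stand-in for the screen: it records shape names."""

    def __init__(self):
        self.shapes = []


def draw_triangle(canvas):
    canvas.shapes.append("triangle")


def draw_rectangle(canvas):
    canvas.shapes.append("rectangle")


class TriangleTests(unittest.TestCase):  # written during an earlier cycle
    def test_triangle_is_drawable(self):
        canvas = Canvas()
        draw_triangle(canvas)
        self.assertEqual(canvas.shapes, ["triangle"])


class RectangleTests(unittest.TestCase):  # written during the latest cycle
    def test_rectangle_is_drawable(self):
        canvas = Canvas()
        draw_rectangle(canvas)
        self.assertEqual(canvas.shapes, ["rectangle"])


def bliss_check(*test_cases):
    """Run every given TestCase class together; True only if all pass."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(tc) for tc in test_cases)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    if bliss_check(TriangleTests, RectangleTests):
        print("All green: a good moment to commit.")
    else:
        print("Something broke: fix it before committing.")
```

In a real project, `python -m unittest discover` does the same job from the command line by finding test modules automatically (by default, files matching `test*.py`); the commit itself is then an ordinary version-control step.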

Working piece by tiny piece, your code will become as complicated as you need it to be, but no more so. You are not as likely to waste time creating vast wonderful code castles made of crystal and silver that turn out to be pointless and useless because you were thinking of the wrong abstraction. You are more likely to notice right away if you accidentally break something, because that something shouldn’t be there in the first place unless it had a test to justify it, and that test will complain.

TDD is a good anti-akrasia technique for writing tests. Classically, tests are written after the program is working, but such tests are rarely very thorough, because it feels superfluous to write a test that already tells you what you (think that you) know, that the program works.

TDD also helps fight programming akrasia more broadly. You receive continuous feedback that what you are doing is accomplishing something and not breaking anything. It becomes more difficult to dawdle, since there’s always an immediate short-term goal to focus on.

Finally, for me and for many other people who’ve tried it, TDD makes programming more fun, and more satisfying. There’s nothing quite like the feeling of confidence that comes from knowing that your program does just what you think it does.

Or, well, thinking that you know.

Why Test Driven Development Isn’t Perfect

Basking innocently in the feeling of the BLISS stage, you check your email and get an angry bug report: when the draw color is set to turquoise, instead of rectangles your program is drawing something that looks vaguely like a silhouette of Carl Friedrich Gauss engaged in a swordfight against a trout. What’s going on here? Why wasn’t this bug caught by the tests? There’s a “Rectangles_Are_Drawable” test, and a “Turquoise_Things_Are_Drawable” test, and they both pass, so how can drawing turquoise rectangles fail?

Something about turquoiseness and rectangleness is lining up just right and causing things to fall apart, and this outcome is certainly not predicted by the programmer’s mental model of the program. This means either that something in the program is not actually being tested at all, or (more likely) that one of the tests doesn’t test everything the programmer thinks it does. TDD (among its other benefits) does reduce the chance of bugs being created, but doesn’t eliminate it, because even within the short near-mode phases of Red-Green-Refactor-Bliss there’s still opportunity for us to foul things up. Eliminating all bugs is a grand dream, but not likely to happen in reality as long as the program isn’t dead simple (or formally verifiable, but that’s a technique for another day).
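A hypothetical sketch of how this can happen. The bug is invented for illustration: `draw_rectangle` contains a stray special case for turquoise. Both tests pass, because `Rectangles_Are_Drawable` only ever uses the default color and `Turquoise_Things_Are_Drawable` only ever draws triangles, so neither one checks the combination the bug report is about.

```python
import unittest


class Canvas:
    """Hypothetical stand-in for the screen: it records (shape, color) pairs."""

    def __init__(self):
        self.color = "black"
        self.shapes = []

    def set_color(self, color):
        self.color = color


def draw_triangle(canvas):
    canvas.shapes.append(("triangle", canvas.color))


def draw_rectangle(canvas):
    # Invented bug for illustration: a stray special case that only misfires
    # when turquoise and rectangles meet.
    if canvas.color == "turquoise":
        canvas.shapes.append(("gauss_vs_trout", canvas.color))
    else:
        canvas.shapes.append(("rectangle", canvas.color))


class ShapeTests(unittest.TestCase):
    def test_Rectangles_Are_Drawable(self):
        canvas = Canvas()
        draw_rectangle(canvas)  # only ever uses the default color
        self.assertEqual(canvas.shapes, [("rectangle", "black")])

    def test_Turquoise_Things_Are_Drawable(self):
        canvas = Canvas()
        canvas.set_color("turquoise")
        draw_triangle(canvas)  # "things" turns out to mean only triangles
        self.assertEqual(canvas.shapes, [("triangle", "turquoise")])


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShapeTests)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    # Both tests pass even though turquoise rectangles are broken.
    print("all passed:", run_tests().wasSuccessful())
```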

However, because we can express bugs as testable assumptions, TDD applies just as well to creating bugfixes as it does to adding new features:

RED: Write a new test “Turquoise_Rectangles_Are_Drawable”, which sets the color to turquoise, tells the library to draw a rectangle, and makes sure a rectangle and not some other shape was drawn. Run the test; it should fail. If it doesn’t, then the bug report was incomplete, and the situation that needs to be set up before Gauss is drawn is more elaborate.

GREEN: Figure out what’s making the bug happen. Fix it. Test passes.

REFACTOR: Make the fix pretty.

BLISS: The rest of the program still works as expected (to the degree that your expectations were expressed, anyways). Also, this particular bug will never come back, because if someone does accidentally reintroduce it then the test that checks this newly created expectation will complain. Commit changes with a joke about Gaussian blurring.
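The bugfix cycle, sketched with the same kind of hypothetical code: the regression test is run once against an invented buggy implementation (RED, it fails) and once against the fix (GREEN, it passes). Keeping both versions side by side is only for illustration; in practice the fix simply replaces the buggy code, and the test stays in the suite for good.

```python
import unittest


class Canvas:
    """Hypothetical stand-in for the screen: it records (shape, color) pairs."""

    def __init__(self):
        self.color = "black"
        self.shapes = []


def buggy_draw_rectangle(canvas):
    # The invented bug: turquoise sends rectangles down the wrong path.
    kind = "gauss_vs_trout" if canvas.color == "turquoise" else "rectangle"
    canvas.shapes.append((kind, canvas.color))


def fixed_draw_rectangle(canvas):
    # GREEN: the special case is gone, so every color takes the same path.
    canvas.shapes.append(("rectangle", canvas.color))


def regression_test_passes(draw_rectangle):
    """Run Turquoise_Rectangles_Are_Drawable against one implementation."""

    class BugfixTests(unittest.TestCase):
        def test_Turquoise_Rectangles_Are_Drawable(self):
            canvas = Canvas()
            canvas.color = "turquoise"
            draw_rectangle(canvas)
            self.assertEqual(canvas.shapes, [("rectangle", "turquoise")])

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BugfixTests)
    return unittest.TextTestRunner(verbosity=2).run(suite).wasSuccessful()


if __name__ == "__main__":
    print("passes against the buggy code:", regression_test_passes(buggy_draw_rectangle))
    print("passes against the fix:", regression_test_passes(fixed_draw_rectangle))
```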

Why Test Driven Development Isn’t Always Appropriate

A word of warning: this article is intended to be readable for people who are unfamiliar with programming, which is why simple, easily visualized examples like drawing shapes were used. Unfortunately, in real life, graphics-drawing is just the sort of thing that’s hardest to write tests for.

As an extreme example, consider CAPTCHA, software that tries to detect whether a human being or a spambot is trying to get an account on your site by asking them to read back an image of squirrelly-looking text. TDD would at best be minimally useful for this; you could bring in the best OCR algorithms you have available and pass the test if they *cannot* pull text out of the image… but it would be hard to tell if that was because the program was producing properly hard-to-scan images, or because it was producing useless nonsense!
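A toy sketch of that ambiguity. Everything here is hypothetical: an “image” is just a list of characters tagged as distorted or not, and `toy_ocr` stands in for a real OCR engine by reading only undistorted characters. The same automatic check passes both for a generator that does its job and for one that emits readable but meaningless junk.

```python
import random
import string


def toy_ocr(image):
    """Hypothetical OCR stand-in: reads only characters drawn undistorted.

    An "image" here is a list of (char, distorted) pairs; a real OCR engine
    and real pixels would replace both.
    """
    return "".join(ch for ch, distorted in image if not distorted)


def good_captcha(text):
    # Intended behavior: the real text, with every character distorted.
    return [(ch, True) for ch in text]


def nonsense_captcha(text):
    # Broken behavior: clearly drawn random junk unrelated to the text.
    return [(random.choice(string.punctuation), False) for _ in text]


def ocr_check_passes(make_image, text="SQUIRREL"):
    """The only automatic check available: OCR must fail to recover the text."""
    return toy_ocr(make_image(text)) != text


if __name__ == "__main__":
    print("good generator passes:", ocr_check_passes(good_captcha))
    print("nonsense generator passes:", ocr_check_passes(nonsense_captcha))
```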

It’s part of a larger category of things which are hard to automatically test because their typical operation involves working with a human, and we can’t simulate humans very well at all (yet). Any program that’s meant to interact with a human, and depend upon that human behaving in a sophisticated human way (or in other words, any program that has a user interface which isn’t incredibly simple), will have difficulty being thoroughly tested in a non-brittle way. This problem is exacerbated because user interfaces tend to change significantly as they are subjected to usability testing and rethought, necessitating tedious changes in any tests that depend on their specifics. That doesn’t mean TDD isn’t applicable to such programs, just that it is more useful when working on their inner machinery than their user-facing shell.

(There are also ways of minimizing this problem in certain sorts of user interface scenarios, but that’s beyond the scope of this article.)
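A minimal sketch of the split between inner machinery and user-facing shell, using the layout from Paula’s instructions at the start of this article. It assumes screen coordinates where y grows downward, and reads “another 20 units to the left” as relative to the rectangle; `render` and its `screen.draw` call are hypothetical stand-ins for a real graphics API. The layout logic is easy to test automatically, while the thin rendering shell is the part left to human eyes.

```python
import unittest


def layout_scene(top_x, top_y):
    """Inner machinery: decide where every shape goes, without drawing anything.

    Assumes y grows downward, and places both circles relative to the rectangle.
    """
    rect_x, rect_y = top_x, top_y + 50  # rectangle 50 units below the triangle
    return [
        ("triangle", top_x, top_y),
        ("rectangle", rect_x, rect_y),
        ("circle", rect_x + 20, rect_y),  # 20 units to the right
        ("circle", rect_x - 20, rect_y),  # 20 units to the left
    ]


def render(scene, screen):
    """User-facing shell: hand each shape to a (hypothetical) graphics API.

    Kept as thin as possible, since this layer is hard to test automatically.
    """
    for kind, x, y in scene:
        screen.draw(kind, x, y)


class LayoutTests(unittest.TestCase):
    def test_rectangle_is_50_units_below_triangle(self):
        self.assertEqual(layout_scene(100, 10)[1], ("rectangle", 100, 60))

    def test_circles_flank_the_rectangle(self):
        circles = [s for s in layout_scene(100, 10) if s[0] == "circle"]
        self.assertEqual(circles, [("circle", 120, 60), ("circle", 80, 60)])


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LayoutTests)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    print("all passed:", run_tests().wasSuccessful())
```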

Test Driven $BEHAVIOR

It is unfortunate that this technique is not more widely applicable to situations other than computer programming. As a rationalist, I would like my process of improving my beliefs to work like TDD: doing one specific near-mode thing at a time, running checks that can definitively pass or fail, and building up through this process a set of tests/experiments that thoroughly represent and drive changes to the program implementation, aka my model of the world.

The major disadvantage my beliefs have compared to a computerized test suite is that they won’t hold still and be counted. I cannot do an on-demand enumeration through every single one of my beliefs and test them individually to make sure they all still hold up; I have to rely on my memories of them, which might well be shifting and splitting up and making a mess of themselves whenever I’m not looking. I can do RED and GREEN phases on particular ideas when they come to mind, but I’m unfortunately unable to do anything like a thorough and complete BLISS phase.

This article has partly been about introducing a coding technique which I think is pretty neat and of relevance to rationalists, but it’s also about leading up to this question that I’d like to ask Less Wrong: how can I improve my ability to do Test Driven Thinking?


1. This bit of wonderfully silly text is from Konami’s Metal Gear Solid 2.