Rocket science and big money—a cautionary tale of math gone wrong

The 2006 report from NASA’s “Independent Verification and Validation Facility” makes some interesting claims. Turning to page 6, we learn that thanks to IV&V, “NASA realized a software rework risk reduction benefit of $1.6 Billion in Fiscal Year 2006 alone”. This is close to 10% of NASA’s overall annual budget, roughly equal to the entire annual budget of the International Space Station!

If the numbers check out, this is an impressive feat for IV&V (the more formal big brother of “testing” or “quality assurance” departments that most software development efforts include). Do they?

Flaubert and the math of ROI

Back in 1841, to tease his sister, Gustave Flaubert invented the “age of the captain problem”, which ran like this:

A ship sails the ocean. It left Boston with a cargo of wool. It grosses 200 tons. [...] There are 12 passengers aboard, the wind is blowing East-North-East, the clock points to a quarter past three in the afternoon. It is the month of May. How old is the captain?

Flaubert was pointing out one common way people fail at math: you can only get sensible results from a calculation if the numbers you put in are related in the right ways. (Unfortunately, math education tends to be excessively heavy on the “manipulate numbers” part and to skimp on the “make sense of the question” part, a trend dissected by French mathematician Stella Baruk who titled one of her books after Flaubert’s little joke on his sister.)

Unfortunately, NASA’s math turns out on inspection to be “age-of-the-captain” math. (This strikes me as a big embarrassment to an organization literally composed mainly of rocket scientists.)

The $1.6 billion claimed by NASA’s document is derived by applying an ROI calculation: NASA spent $19 million on IV&V services in 2006, and the Report further claims that IV&V can be shown to have an 83:1 ROI (Return on Investment) ratio. Thus, $19M times 83 gives us the original $1.6 billion. (The $19M is pure personnel cost, and does not include e.g. the costs of the building where IV&V is housed.)
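The headline arithmetic is a single multiplication; a one-line sketch:

```python
print(19e6 * 83)  # 1.577e9, rounded up in the report to "$1.6 billion"
```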

What is Return on Investment? Economics defines it as the gain from an investment, minus the cost of investment, divided by (again) the cost of investment. An investment is something you spend so as to obtain a gain, and a gain is something caused by the investment. This isn’t rocket science but basic economics.
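To make the definition concrete, here is a minimal sketch in Python; the dollar figures are invented for illustration and have nothing to do with NASA's numbers:

```python
def roi(gain, cost):
    """Textbook return on investment: (gain - cost) / cost."""
    return (gain - cost) / cost

# Hypothetical example: spend $1M on testing, avoid $5M of rework.
print(roi(gain=5e6, cost=1e6))  # 4.0, i.e. a 4:1 return
```

Note that both arguments are amounts of money; that detail will matter shortly.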

But how does NASA arrive at this 83:1 figure?

NASA IV&V’s math

NASA relies on the widespread claim that in software efforts, “fixing a bug later costs more”. Specifically, it focuses on the costs of fixing software defects (as they’re more formally known) at the various “phases” often said to compose a project: requirements, design, coding, unit test, system test, or “in the field”. For instance, it supposedly costs on average 200 times as much to fix a defect in the field as it does at the Requirements stage. (I have debunked that claim elsewhere, but it still enjoys a relatively robust status within academic software engineering, so we can’t fault NASA for relying on it back in 2006.)

NASA counted 490 “issues” that IV&V discovered at the requirements stage of the Space Shuttle missions, during some unspecified period between 1993 (the founding of the IV&V Facility) and 2006. (An “issue” is not the same as a defect, but for the time being we will ignore this distinction.) To this, NASA adds 304 issues found between 2004 and 2006 in other (“Science”) missions. (We are also told that this analysis includes only the most “severe” issues, i.e. ones for which a work-around cannot be found and which impair a mission objective.)

We can verify that (490+304)*200 = 158,800, which NASA counts as the “weighted sub-total” for Requirements; adding up the somewhat smaller totals from other phases, NASA finds a total of 186,505.

NASA also adds up the number of issues found during all phases, which is 2,239. We can again verify that 186,505 / 2,239 = 83 and some change.
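For the record, the report's arithmetic can be reproduced in a few lines; only the Requirements sub-total and the two totals are quoted above, so the per-phase breakdown of the other phases is left out of this sketch:

```python
# Figures quoted from the 2006 IV&V report.
requirements_issues = 490 + 304          # Shuttle + "Science" missions
requirements_weighted = requirements_issues * 200

total_weighted = 186_505                 # weighted sum over all phases
total_issues = 2_239                     # issues found in all phases

ratio = total_weighted / total_issues    # the claimed "ROI"
print(requirements_weighted)             # 158,800
print(round(ratio, 1))                   # 83.3
```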

How old is the captain?

Now, the immediate objection to this procedure is that an ROI calculation involves dollars, not numbers of “issues”. ROI is a ratio of money gained (or saved) over money invested, and while you can reasonably say you’ve “saved” some number of issues it’s silly to talk about “investing” some number of issues.

We will want to “steel-man” NASA’s argument. (This is the opposite of a “straw man”, an easily knocked down argument that your interlocutor is not actually advancing, but that you make up to score easy points.) We will be as generous with this math as we can and see if it has even a small chance of holding up.

To rescue the claim, we need to turn issues into dollars. Let us list the assumptions that need to hold for NASA’s calculations to be valid:

  • there is some determinate average cost to detecting an issue

  • there is some determinate average cost to fixing an issue

  • if an issue is not detected at the earliest opportunity, it always ends up being detected “in the field” and its repair cost is the maximum

The first two assumptions give our steelman attempt some leeway; not all issues need to cost the same to detect, but it has to make sense to talk about the “average cost of detecting an issue”. Mathematically, this implies that the cost of detecting (and likewise of fixing) an issue follows some well-behaved distribution, such as the famous “bell curve”. (However, there are some distributions for which it makes no sense, mathematically, to speak of an average: for instance certain “power law” distributions. These are often found to describe, for instance, the size of catastrophes such as avalanches or forest fires; no one would be very surprised to find that defect costs in fact follow a power law.)
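To see why the shape of the distribution matters, here is a small simulation sketch; it is my own illustration with made-up cost figures, not NASA data. Under a bell curve the sample average settles down quickly; under a heavy-tailed power law it keeps drifting, because rare, enormous values dominate the total:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "cost per issue" samples, in arbitrary dollar units.
bell = rng.normal(loc=9_000, scale=2_000, size=100_000)
# Pareto-style heavy tail (shape 1.1): the mean barely exists.
power_law = 1_000 * (1 + rng.pareto(1.1, size=100_000))

for name, costs in [("bell curve", bell), ("power law", power_law)]:
    for n in (100, 1_000, 100_000):
        print(f"{name}: average of first {n} samples ~ {costs[:n].mean():,.0f}")
```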

The third assumption makes things even more problematic. NASA’s calculation is built on a hypothetical: what if we used different assumptions, for instance that an “issue” in Requirements had a good likelihood of being found by NASA’s diligent software engineers at the Design phase? If all issues detected by IV&V in Requirements had instead been fixed in Design, then the ratio would only be about 5:1 (that is, the ratio between 200:1 and 40:1). Using a similar procedure for the other phases, we would find an “ROI” of less than 3:1. This isn’t to say that my assumption is better than NASA’s, but merely to observe that the final result is very sensitive to this kind of assumption.
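As a sketch of that sensitivity, using only the Requirements figures from the report and the 40:1 Design multiplier implied by the 5:1 ratio above:

```python
requirements_issues = 490 + 304      # from the report

# NASA's assumption: every one of these would have survived to "the field",
# where fixing costs 200 times what it does at Requirements.
nasa_weight = 200

# Alternative assumption: they would have been caught at Design instead,
# where fixing costs 40 times the Requirements figure.
alt_weight = 200 / 40                # = 5

print(requirements_issues * nasa_weight)  # 158,800
print(requirements_issues * alt_weight)   # 3,970 -- a 40-fold smaller "benefit"
```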

However, we may grant that it is in NASA’s culture to always assume the worst case. And anyway “up to $1.6 billion” is almost as impressive as “$1.6 billion”, isn’t it?

Eighty-three! For some value of eighty-three.

If we do accept all of NASA’s claims, then an “issue” costs on average about $9K to detect. (As a common-sense check, note that this is on the order of one person-month, assuming a yearly loaded salary in the $100K range. That seems a bit excessive; not a slur on NASA’s competence, but definitely a bad knock for the notion that “averages” make sense at all in this context.)
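One way to arrive at the $9K figure is to divide the 2006 IV&V budget by the issue count; the $100K loaded salary is my own assumption for the common-sense check:

```python
iv_and_v_budget = 19e6       # 2006 personnel cost, per the report
issues_found = 2_239         # issues found across all phases, per the report

cost_per_issue = iv_and_v_budget / issues_found
person_month = 100_000 / 12  # assumed yearly loaded salary of $100K

print(round(cost_per_issue))  # ~8,486 dollars per issue detected
print(round(person_month))    # ~8,333 dollars per person-month
```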

However, note that NASA’s data is absolutely silent on how much the same issues cost to fix. Detecting is IV&V’s job, but fixing is the job of the software engineers working on the project.1

NASA is therefore reporting on the results of the following calculation...

ROI = (Savings from IV&V - Actual cost of IV&V) / Actual cost of IV&V

where

Savings from IV&V = (Hypothetical cost of fixing defects without IV&V) - (Actual cost of fixing defects)

...and the above cannot be derived from the numbers used in the calculation—which are 1) counts of issues and 2) actual IV&V budget. Even if we do grant an 83:1 ratio between the hypothetical cost of fixing defects (had IV&V not been present to find them early) and the actual cost of fixing, we are left with an unknown variable—an unbound x in the equation—which is the average cost of fixing a defect.
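In code, the hole is easy to see: the dollar-denominated ROI is a function of a variable the report never supplies, the average cost of actually fixing a defect. A minimal sketch, with arbitrary values plugged in for that unknown:

```python
def dollar_roi(avg_fix_cost, issues=2_239, ratio=83, iv_and_v_cost=19e6):
    """Dollar ROI of IV&V, granting NASA's 83:1 fix-cost ratio.

    avg_fix_cost is the unbound x: the report gives no value for it.
    """
    hypothetical_fix_cost = issues * avg_fix_cost * ratio  # without IV&V
    actual_fix_cost = issues * avg_fix_cost                # with IV&V
    savings = hypothetical_fix_cost - actual_fix_cost
    return (savings - iv_and_v_cost) / iv_and_v_cost

# Wildly different answers depending on the unknown:
print(dollar_roi(avg_fix_cost=100))     # ~ -0.03 (IV&V loses money)
print(dollar_roi(avg_fix_cost=10_000))  # ~ 95.6  (IV&V is a huge bargain)
```

Depending on what value that unknown takes, IV&V comes out anywhere from a money-loser to a spectacular bargain; the report's figures alone cannot tell us which.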

This, then, is the fatal flaw in the argument, the one that cannot be steel-manned and that exposes NASA’s math for what it is—Flaubert-style, “age of the captain” math, not rocket science.


1 - Relatedly, an “issue” is just an observation that something is wrong, whereas a “defect” is the thing software developers fix; it’s entirely possible for several “issues” related to one “defect” to be corrected simultaneously by the same fix; NASA’s conceptual model grossly oversimplifies the work relationship between those who “validate and verify” and those who actually write the software.

Acknowledgements

Thanks to Aleksis Tulonen, a reader of my book, for finding the NASA document in the first place, and spotting the absurdity of the ROI calculation.