Diseased disciplines: the strange case of the inverted chart

Imagine the following situation: you have come across numerous references to a paper purporting to show that the chances of successfully treating a disease contracted at age 10 are substantially lower if the disease is detected later, ranging from somewhat lower at age 20 to very poor at age 50. Every author draws more or less the same bar chart to depict this situation: the picture below, showing rising mortality from left to right.

Rising mortality, left to right

You search for the original paper, which proves a long quest: the conference publisher has lost some of its archives in several moves, several people citing the paper turn out to no longer have a copy, etc. You finally locate a copy of the paper (let’s call it G99) thanks to a helpful friend with great scholarly connections.

And you find out some interesting things.

The most striking is what the author’s original chart depicts: the chances of successfully treating a disease detected at age 50 become substantially lower the earlier it was contracted; mortality is highest if the disease was contracted at age 10 and lowest if contracted at age 40. The chart showing this is the picture below, showing decreasing mortality from top to bottom, for the same ages on the vertical axis.

Decreasing mortality, top to bottom

Not only is the representation topsy-turvy; the two diagrams can’t be about the same thing, since what is constant in the first (age disease detected) is variable in the other, and what is variable in the first (age disease contracted) is constant in the other.

Now, as you research the issue a little more, you find out that authors prior to G99 have often used the first diagram to report their findings; reportedly, several different studies on different populations (dating back to the eighties) have yielded similar results.

But when citing G99, nobody reproduces the actual diagram in G99; they all reproduce the older diagram (or some variant of it).

You are tempted to conclude that the authors citing G99 are citing “from memory”; they are aware of the earlier research, and they have a vague recollection that G99 contains results that are not totally at odds with it. Same difference, they reason: G99 is one more confirmation of the earlier research, which is adequately summarized by the standard diagram.

And then you come across a paper by the same author, but from 10 years earlier. Let’s call it G89. There is a strong presumption that the study in G99 is the same one described in G89, for the following reasons: a) the researcher who wrote G99 was by then already retired from the institution where they obtained their results; b) the G99 “paper” isn’t in fact a paper; it’s a PowerPoint summarizing previous results obtained by the author.

And in G89, you read the following: “This study didn’t accurately record the mortality rates at various ages after contracting the disease, so we will use average rates summarized from several other studies.”

So basically everyone who has been citing G99 has been building castles on sand.

Suppose that, far from some exotic disease affecting a few individuals each year, the disease in question was one of the world’s major killers (say, tuberculosis, the world’s leader in infectious disease mortality), and the reason why everyone is citing either G99 or some of the earlier research is to lend support to the standard strategies for fighting the disease.

When you look at the earlier research, you find nothing to allay your worries: the earlier studies are described only summarily, in broad overview papers or secondary sources; the numbers don’t seem to match up, and so on. In effect you are discovering, about thirty years later, that what was taken for granted as a major finding on one of the principal topics of the discipline in fact has “sloppy academic practice” written all over it.

If this story were true, and this were medicine we were talking about, what would you expect (or at least hope for, if you haven’t become too cynical), should this story come to light? In a well-functioning discipline, a wave of retractions, public apologies, general embarrassment and a major re-evaluation of public health policies concerning this disease would follow.

The story is substantially true, but the field isn’t medicine: it is software engineering.

I have transposed the story to medicine, temporarily, as an act of benign deception, to which I now confess. My intention was to bring out the structure of this story, and if, while thinking it was about health, you felt outraged at this miscarriage of academic process, you should still feel outraged upon learning that it is in fact about software.

The “disease” isn’t some exotic oddity, but the software equivalent of tuberculosis: the cost of fixing defects (a.k.a. bugs).

The original claim was that “defects introduced in early phases cost more to fix the later they are detected”. The misquoted chart says this instead: “defects detected in the operations phase (once software is in the field) cost more to fix the earlier they were introduced”.

Any result concerning the “disease” of software bugs counts as a major result, because it affects very large fractions of the population, and accounts for a major fraction of the total “morbidity” (i.e. lack of quality, project failure) in the population (of software programs).

The earlier article by the same author contained the following confession: “This study didn’t accurately record the engineering times to fix the defects, so we will use average times summarized from several other studies to weight the defect origins”.

Not only is this one major result suspect, but the same pattern of “citogenesis” turns up when investigating several other important claims.

Software engineering is a diseased discipline.

The publication I’ve labeled “G99” is generally cited as: Robert B. Grady, An Economic Release Decision Model: Insights into Software Project Management, in the proceedings of Applications of Software Measurement (1999). The second diagram is from a photograph of a hard copy of the proceedings.

Here is one typical publication citing Grady 1999, from which the first diagram is extracted. You can find many more via a Google search. The “this study didn’t accurately record” quote is discussed here, and can be found in “Dissecting Software Failures” by Grady, in the April 1989 issue of the “Hewlett Packard Journal”; you can still find one copy of the original source on the Web, as of early 2013, but link rot is threatening it with extinction.

A more extensive analysis of the “defect cost increase” claim is available in my book-in-progress, “The Leprechauns of Software Engineering”.

Here is how the axes were originally labeled; first diagram:

  • vertical: “Relative Cost to Correct a Defect”

  • horizontal: “Development Phase” (values “Requirements”, “Design”, “Code”, “Test”, “Operation” from left to right)

  • figure label: “Relative cost to correct a requirement defect depending on when it is discovered”

Second diagram:

  • vertical: “Activity When Defect was Created” (values “Specifications”, “Design”, “Code”, “Test” from top to bottom)

  • horizontal: “Relative cost to fix a defect after release to customers compared to the cost of fixing it shortly after it was created”

  • figure label: “Relative Costs to Fix Defects”