Improvement Without Superstition

When you make con­tin­u­ous, in­cre­men­tal im­prove­ments to some­thing, one of two things can hap­pen. You can im­prove it a lot, or you can fall into su­per­sti­tion. I’m not talk­ing about black cats or bro­ken mir­rors, but rather hu­mans be­com­ing ad­dicted to whichever steps were last seen to work, in­stead of whichever steps pro­duce their goal.

I’ve seen superstition develop firsthand. It happened in one of the places you might least expect it – in a biochemistry lab. In the summer of 2015, I found myself trying to understand which mutants of a certain protein were more stable than the wildtype. Because science is perpetually underfunded, the computer that drove the equipment we were using was ancient and frequently crashed. Each crash wiped out an hour or two of painstaking, hurried labour and meant we had less time to use the instrument to collect actual data. We really wanted to avoid crashes! Therefore, over the course of that summer, we came up with about 12 different things to do before each experiment (in sequence) to prevent them from happening.

We were sure that 10 out of the 12 things were probably useless; we just didn’t know which ten. There may have been no good reason that opening the instrument, closing it, then opening it again to load our sample would prevent computer crashes, but as far as we could tell, when we did that, the machine crashed far less. It was the same for the other eleven. More self-aware than I, the graduate student I worked with joked to me, “this is how superstitions get started,” and I laughed along. Until I read two articles in The New Yorker.

In The Score (How Childbirth Went Industrial), Dr. Atul Gawande talks about the influence of the Apgar score on childbirth. Through a process of continuous competition and optimization, doctors have found ways to increase the Apgar scores of infants in their first five minutes of life – and ways to deal with difficult births that maximize their Apgar scores. The result of this has been a shocking (six-fold) decrease in infant mortality. And all of this is despite the fact that, according to Gawande, “[in] a ranking of medical specialties according to their use of hard evidence from randomized clinical trials, obstetrics came in last. Obstetricians did few randomized trials, and when they did they ignored the results.”

Similarly, in The Bell Curve (What hap­pens when pa­tients find out how good their doc­tors re­ally are), Gawande found that the differ­ences be­tween the best CF (cys­tic fibro­sis) treat­ment cen­tres and the rest turned out to hinge on how rigor­ously each cen­tre fol­lowed the guidelines es­tab­lished by big clini­cal tri­als. That is to say, those that fol­lowed the ac­cepted stan­dard of care to the let­ter had much lower sur­vival rates than those that hared off af­ter any po­ten­tially life­sav­ing idea.

It seems that obstetricians and CF specialists were able to get incredible results without too much in the way of superstitions. Even things that look at first glance to be minor superstitions often turn out not to be. For example, when Gawande looked deeper into a series of studies that showed forceps were as good as or better than Caesarian sections, he was told by an experienced obstetrician (who was himself quite skilled with forceps) that these trials probably benefitted from serious selection effects (in general, only doctors particularly confident in their forceps skills volunteer for studies of them). If forceps were used on the same industrial scale as Caesarian sections, that doctor suspected that they’d end up worse.

But I don’t want to give the impression that there’s something about medicine as a field that allows doctors to make these sorts of improvements without superstition. In The Emperor of All Maladies, Dr. Siddhartha Mukherjee spends some time talking about the now-discontinued practices of “super-radical” mastectomy and “radical” chemotherapy. In both treatments, doctors believed that if some amount of a treatment was good, more must be better. And for a while, it seemed better. Cancer survival rates improved after these procedures were introduced.

But ran­dom­ized con­trol­led tri­als showed that there was no benefit to those in­va­sive, de­struc­tive pro­ce­dures be­yond that offered by their less-rad­i­cal equiv­a­lents. De­spite this ev­i­dence, sur­geons and on­col­o­gists clung to these treat­ments with an al­most re­li­gious zeal, long af­ter they should have given up and aban­doned them. Per­haps they couldn’t bear to be­lieve that they had need­lessly poi­soned or maimed their pa­tients. Or per­haps the su­per­sti­tion was so strong that they felt they were court­ing doom by do­ing any­thing else.

The simplest way to avoid superstition is to wait for large-scale trials. But from both Gawande articles, I get a sense that matches anecdotal evidence from my own life and that of my friends. It’s the sense that if you want to do something, anything, important – if you want to increase your productivity or manage your depression/anxiety, or keep CF patients alive – you’re likely to do much better if you take the large-scale empirical results and use them as a springboard (or ignore them entirely if they don’t seem to work for you).

For people interested in nootropics, melatonin, or vitamins, there are self-blinding trials, which provide many of the benefits of larger trials without the wait. But for other interventions, it’s very hard to effectively blind yourself. If you want to see if meditation improves your focus, for example, then you can’t really hide from yourself the fact that you meditated on certain days [1].

When I think about how far from the es­tab­lished ev­i­dence I’ve gone to in­crease my pro­duc­tivity, I worry about the chance I could be­come su­per­sti­tious.

For ex­am­ple, trig­ger-ac­tion plans (TAPs) have a lot of ev­i­dence be­hind them. They’re also en­tirely use­less to me (I think be­cause I lack a vi­sual imag­i­na­tion with which to pre­pare a trig­ger) and I haven’t tried to make one in years. The Po­modoro method is widely used to in­crease pro­duc­tivity, but I find I work much bet­ter when I cut out the breaks en­tirely – or work through them and later take an equiv­a­lent amount of time off when­ever I please. I use po­mos only as a con­ve­nient, easy to Bee­mind mea­sure of how long I worked on some­thing.

I know mod­est episte­molo­gies are sup­posed to be out of favour now, but I think it can be use­ful to pause, re­flect, and won­der: when is one like the doc­tors sav­ing CF pa­tients and when is one like the doc­tors do­ing su­per-rad­i­cal mas­tec­tomies? I’ve writ­ten at length about the pro­duc­tivity regime I’ve de­vel­oped. How much of it is chaff?

It is un­de­ni­able that I am bet­ter at things. I’ve rigor­ously tracked the out­puts on Bee­minder and the graphs don’t lie. Last year I av­er­aged 20,000 words per month. This year, it’s 30,000. When I started my blog more than a year ago, I thought I’d be happy if I could pub­lish some­thing once per month. This year, I’ve pub­lished 1.1 times per week.

But peo­ple get bet­ter over time. The use­less­ness of su­per-rad­i­cal mas­tec­tomies was masked by other can­cer treat­ments get­ting bet­ter. Sur­vival rates went up, but when the ac­count­ing was finished, none of that was to the credit of those surg­eries.

And it’s not just use­less­ness that I’m wor­ried about, but also harm; it’s pos­si­ble that my habits have con­strained my nat­u­ral de­vel­op­ment, rather than pro­mot­ing it. This has hap­pened in the past, when poorly cho­sen met­rics made me fall vic­tim to Camp­bell’s Law.

From the perspective of avoiding superstition: even if you believe that medicine cannot wait for placebo-controlled trials to try new, potentially life-saving treatments, surely you must admit that placebo-controlled trials are good for determining which things aren’t worth it (take as an example the very common knee surgery, arthroscopic partial meniscectomy, which has repeatedly performed no better than sham surgery when subjected to controlled trials).

Scott Alexan­der re­cently wrote about an ex­cit­ing new an­tide­pres­sant failing in Stage I tri­als. When the drug was first an­nounced, a few brave souls man­aged to syn­the­size some. When they tried it, they re­ported amaz­ing re­sults, re­sults that we now know to have been placebo. Look. You aren’t get­ting an ex­per­i­men­tal drug syn­the­sized and try­ing it un­less you’re pretty fa­mil­iar with nootrop­ics. Is the state of self-ex­per­i­men­ta­tion re­ally that poor among the nootrop­ics com­mu­nity? Or is it re­ally hard to figure out if some­thing works on you or not [2]?

Still, re­flec­tion isn’t the same thing as aban­don­ing the in­side view en­tirely. I’ve been think­ing up heuris­tics since I read Dr. Gawande’s ar­ti­cles; armed with these, I ex­pect to have a rea­son­able shot at know­ing when I’m at risk of be­com­ing su­per­sti­tious. They are:

- If you gen­uinely care only about the out­come, not the tech­niques you use to at­tain it, you’re less likely to mis­lead your­self (be­ware the per­son with a favourite tech­nique or a vested in­ter­est!).

- If the thing you’re try­ing to im­prove doesn’t tend to get bet­ter on its own and you’re only try­ing one po­ten­tially suc­cess­ful in­ter­ven­tion at a time, fewer of your in­ter­ven­tions will turn out to be su­per­sti­tions and you’ll need to prune less of­ten (much can be masked by a steady rate of change!).

- If you regularly abandon sunk costs (“You abandon a sunk cost. You didn’t want to. It’s crying.”), superstitions do less damage, so you can afford to spend less mental effort on avoiding them.

Fi­nally, it might be that you don’t care that some effects are placebo, so long as you get them and get them re­peat­edly. That’s what hap­pened with the ex­per­i­ment I worked on that sum­mer. We knew we were su­per­sti­tious, but we didn’t care. We just needed enough data to pub­lish. And even­tu­ally, we got it.


[1] Even so, there are things you can do here to get useful information. For example, you could get in the habit of collecting information on yourself for a month or so (like happiness, focus, etc.), then try several combinations of interventions you think might work (e.g. A, B, C, AB, BC, CA, ABC, then back to baseline) for a few weeks each. Assuming that at least one of the interventions doesn’t work, you’ll have a placebo to compare against. Be sure, though, to correct any results for multiple comparisons.

[2] That peo­ple still buy any­thing from HVMN (af­ter they re­branded them­selves in what might have been an at­tempt to avoid a study show­ing their product did no bet­ter than coffee) ac­tu­ally makes me sus­pect the lat­ter ex­pla­na­tion is true, but still.
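
For anyone who wants to try the schedule from footnote [1], here is a minimal sketch of how it could be laid out. The function names are hypothetical, and the Bonferroni correction is my own assumption: the footnote only says to correct for multiple comparisons, and Bonferroni is simply the bluntest way to do that. The 0.05 significance level is likewise just a conventional placeholder.

```python
from itertools import combinations


def intervention_schedule(interventions):
    """Build a baseline-first schedule covering every non-empty combination.

    Hypothetical helper: each entry is a tuple of the interventions active
    during one block (a few weeks each); the empty tuple means baseline.
    """
    schedule = [()]  # initial baseline period with nothing active
    for size in range(1, len(interventions) + 1):
        schedule.extend(combinations(interventions, size))
    schedule.append(())  # finish by returning to baseline
    return schedule


def bonferroni_threshold(alpha, num_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction.

    Assumption: Bonferroni is used here only because it is the simplest
    multiple-comparisons correction, not because the essay prescribes it.
    """
    return alpha / num_comparisons


schedule = intervention_schedule(["A", "B", "C"])

# Each non-baseline block is one comparison against baseline.
num_conditions = sum(1 for block in schedule if block)
threshold = bonferroni_threshold(0.05, num_conditions)

for block in schedule:
    print("+".join(block) if block else "baseline")
print(f"{num_conditions} comparisons, per-comparison threshold {threshold:.4f}")
```

With three interventions there are seven combinations, so each comparison against baseline would need to clear a much stricter bar than the usual 0.05. That is a fair summary of why this kind of self-experiment needs either a lot of data or very large effects.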