The Epsilon Fallacy


Program Optimization

One of the earlier lessons in every programmer’s education is how to speed up slow code. Here’s an example. (If this is all Greek to you, just note that there are three different steps, then skip to the next paragraph.)

# Step 1: import
from foobarlib import Foo

# Step 2: initialize a random Foo array
foo_field = [Foo() for _ in range(1000)]

# Step 3: smooth the foo field
for x in range(1, 999):
    foo_field[x] = (foo_field[x + 1] + foo_field[x - 1]) / 2

Our greenhorn programmers jump in and start optimizing. Maybe they decide to start at the top, at step 1, and think “hmm, maybe I can make this import more efficient by only importing Foo, rather than all of foobarlib”. Maybe that will make step 1 ten times faster. So they do that, and they run it, and lo and behold, the program’s run time goes from 100 seconds to 99.7 seconds.

In practice, most slow computer programs spend the vast majority of their time in one small part of the code. Such slow pieces are called bottlenecks. If 95% of the program’s runtime is spent in step 2, then even the best possible speedup in steps 1 and 3 combined will only improve the total runtime by 5%. Conversely, even a small improvement to the bottleneck can make a big difference in the runtime.
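The arithmetic here is worth making concrete. A quick back-of-the-envelope sketch (the 95s/5s split is the hypothetical from the paragraph above, not measured data):

```python
# Why non-bottleneck speedups barely matter: a 100-second program,
# 95s in the bottleneck (step 2), 5s everywhere else.

def total_runtime(bottleneck, rest, bottleneck_speedup=1.0, rest_speedup=1.0):
    """Runtime after speeding up each part by the given factor."""
    return bottleneck / bottleneck_speedup + rest / rest_speedup

# Speed up everything EXCEPT the bottleneck by 10x:
print(total_runtime(95, 5, rest_speedup=10))        # 95.5 -- barely moved
# Speed up ONLY the bottleneck by 2x:
print(total_runtime(95, 5, bottleneck_speedup=2))   # 52.5 -- big win
```

Even an infinite speedup of the non-bottleneck 5% can never push the runtime below 95 seconds.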

Back to our greenhorn programmers. Having improved the run time by 0.3%, they can respond one of two ways:

  • “Great, it sped up! Now we just need a bunch more improvements like that.”

  • “That was basically useless! We should figure out which part is slow, and then focus on that part.”

The first response is what I’m calling the epsilon fallacy. (If you know of an existing and/​or better name for this, let me know!)

The epsilon fallacy comes in several reasonable-sounding flavors:

  • The sign(epsilon) fallacy: this tiny change was an improvement, so it’s good!

  • The integral(epsilon) fallacy: another 100 tiny changes like that, and we’ll have a major improvement!

  • The infinity*epsilon fallacy: this thing is really expensive, so this tiny change will save lots of money!

The epsilon fallacy is tricky because these all sound completely reasonable. They’re even technically true. So why is it a fallacy?

The mistake, in all cases, is a failure to consider opportunity cost. The question is not whether our greenhorn programmers’ 0.3% improvement is good or bad in and of itself. The question is whether our greenhorn programmers’ 0.3% improvement is better or worse than spending that same amount of effort finding and improving the main bottleneck.

Even if the 0.3% improvement was really easy—even if it only took two minutes—it can still be a mistake. Our programmers would likely be better off if they had spent the same two minutes timing each section of the code to figure out where the bottleneck is. Indeed, if they just identify the bottleneck and speed it up, and don’t bother optimizing any other parts at all, then that will probably be a big win. Conversely, no matter how much they optimize everything besides the bottleneck, it won’t make much difference. Any time spent optimizing non-bottlenecks could have been better spent identifying and optimizing the bottleneck.

This is the key idea: time spent optimizing non-bottlenecks could have been better spent identifying and optimizing the bottleneck. In that sense, time spent optimizing non-bottlenecks is time wasted.

In programming, this all seems fairly simple. What’s more surprising is that almost everything in the rest of the world also works like this. Unfortunately, in the real world, social motivations make the epsilon fallacy more insidious.

Carbon Emissions

Back in college, I remember watching a video about this project. The video overviews many different approaches to carbon emissions reduction: solar, wind, nuclear, and bio power sources, grid improvements, engine/​motor efficiency, etc. It argues that none of these will be sufficient, on its own, to cut carbon emissions enough to make a big difference. But each piece can make a small difference, and if we put them all together, we get a viable carbon reduction strategy.

This is called the “wedge approach”.

Here’s a chart of US carbon emissions from electricity generation by year, shamelessly cribbed from a Forbes article.

Note that emissions have dropped considerably in recent years, and are still going down. Want to guess what that’s from? Hint: it ain’t a bunch of small things adding together.

In the early 00’s, US oil drilling moved toward horizontal drilling and fracking. One side effect of these new technologies was a big boost in natural gas production—US natgas output has been growing rapidly over the past decade. As a result, natgas prices became competitive with coal prices in the mid-00’s, and electricity production began to switch from coal to natgas. The shift is already large: electricity from coal has fallen by 25%, while natgas has increased 35%.

The upshot: natgas emits about half as much carbon per BTU as coal, and electricity production is switching from coal to natgas en masse. Practically all of the reduction in US carbon emissions over the past 10 years has come from that shift.

Now, back to the wedge approach. One major appeal of the wedge narrative is that it’s inclusive: we have all these well-meaning people working on all sorts of different approaches to carbon reduction. The wedge approach says “hey, all these approaches are valuable and important pieces of the effort, let’s all work together on this”. Kumbaya.

But then we look at the data. Practically all the carbon reduction over the past decade has come from the natgas transition. Everything else—the collective effort of hundreds of thousands of researchers and environmentalists on everything from solar to wind to ad campaigns telling people to turn off their lights when not in the room—all of that adds up to barely anything so far, compared to the impact of the natgas transition.

Now, if you’re friends with some of those researchers and environmentalists, or if you did some of that work yourself, then this will all sound like a status attack. We’re saying that all these well-meaning, hard-working people were basically useless. They were the 0.3% improvement to run time. So there’s a natural instinct to defend our friends/​ourselves, an instinct to say “no, it’s not useless, that 0.3% improvement was valuable and meaningful and important!” And we reach into our brains for a reason why our friends are not useless-

And that’s when the epsilon fallacy gets us.

“It’s still a positive change, so it’s worthwhile!”

“If we keep generating these small changes, it will add up to something even bigger than natgas!”

“Carbon emissions are huge, so even a small percent change matters a lot!”

This is the appeal of the wedge approach: the wedge approach says all that effort is valuable and important. It sounds a lot nicer than calling everyone useless. It is nicer. But niceness does not reduce carbon emissions.

Remember why the epsilon fallacy is wrong: opportunity cost.

Take solar photovoltaics as an example: PV has been an active research field for thousands of academics for several decades. They’ve had barely any effect on carbon emissions to date. What would the world look like today if all that effort had instead been invested in accelerating the natgas transition? Or in extending the natgas transition to China? Or in solar thermal or thorium for that matter?

Now, maybe someday solar PV actually will be a major energy source. There are legitimate arguments in favor.¹ Even then, we need to ask: would the long-term result be better if our efforts right now were focused elsewhere? I honestly don’t know. But I will make one prediction: one wedge will end up a lot more effective than all others combined. Carbon emission reductions will not come from a little bit of natgas, a little bit of PV, a little bit of many other things. That’s not how the world works.

The 80/20 Rule

Suppose you’re a genetic engineer, and you want to design a genome for a very tall person.

Our current understanding is that height is driven by lots of different genes, each of which has a small impact. If that’s true, then integral(epsilon) isn’t a fallacy. A large number of small changes really is the way to make a tall person.

On the other hand, this definitely is not the case if we’re optimizing a computer program for speed. In computer programs, one small piece usually accounts for the vast majority of the run time. If we want to make a significant improvement, then we need to focus on the bottleneck, and any improvement to the bottleneck will likely be significant on its own. “Lots of small changes” won’t work.

So… are things usually more like height, or more like computer programs?

A useful heuristic: the vast majority of real-world cases are less like height, and more like computer programs. Indeed, this heuristic is already well-known in a different context: it’s just the 80/20 rule. 20% of causes account for 80% of effects.

If 80% of any given effect is accounted for by 20% of causes, then those 20% of causes are the bottleneck. Those 20% of causes are where effort needs to be focused to have a significant impact on the effect. For example, here’s Wikipedia on the 80/20 rule:

  • 20% of program code contains 80% of the bugs

  • 20% of workplace hazards account for 80% of injuries

  • 20% of patients consume 80% of healthcare

  • 20% of criminals commit 80% of crimes

  • 20% of people own 80% of the land

  • 20% of clients account for 80% of sales

You can go beyond wikipedia to find whole books full of these things, and not just for people-driven effects. In the physical sciences, it usually goes under the name “power law”.

(As the examples suggest, the 80/20 rule is pretty loose in terms of quantitative precision. But for our purposes, qualitative is fine.)

So we have a heuristic. Most of the time, the epsilon fallacy will indeed be a fallacy. But how can we notice the exceptions to this rule?

One strong hint is a normal distribution. If an effect results from adding up many small causes, then the effect will (typically) be normally distributed. Height is a good example. Short-term stock price movements are another good example. They might not be exactly normal, or there might be a transformation involved (stock price movements are roughly log-normal). If there’s an approximate normal distribution hiding somewhere in there, that’s a strong hint.
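The contrast between the two regimes is easy to simulate. A quick illustrative sketch (not from the post; the distributions and sample sizes are my own choices) comparing an additive, height-like process against a heavy-tailed, runtime-like one:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def top_20_share(values):
    """Fraction of the total accounted for by the largest 20% of values."""
    ordered = sorted(values, reverse=True)
    k = len(ordered) // 5
    return sum(ordered[:k]) / sum(ordered)

# Height-like: each outcome is the SUM of many small independent causes,
# so outcomes are approximately normally distributed.
additive = [sum(random.random() for _ in range(100)) for _ in range(10_000)]

# Runtime-like: heavy-tailed (Pareto), so a few causes dominate the total.
heavy_tailed = [random.paretovariate(1.16) for _ in range(10_000)]

print(top_20_share(additive))      # only slightly above 0.20: no bottleneck
print(top_20_share(heavy_tailed))  # far higher: the top 20% dominate
```

In the additive case, the top 20% of outcomes barely exceed their "fair share" of the total, so many small improvements genuinely add up. In the heavy-tailed case, a handful of items account for most of the total, and that's where effort pays off.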

But in general, absent an obvious normal distribution, our prior assumption should be that most things are more like computer programs than like height. The epsilon fallacy is usually fallacious.

Conclusion: Profile Your Code

Most programmers, at some point in their education/​career, are given an assignment to speed up a program. Typically, they start out by trying things, looking for parts of the code which are obviously suboptimal. They improve those parts, and it does not seem to have any impact whatsoever on the runtime.

After wasting a few hours of effort on such changes, they finally “profile” the code—the technical name for timing each part, to figure out how much time is spent in each section. They find out that 98% of the runtime is in one section which they hadn’t even thought to look at. Of course all the other changes were useless; they didn’t touch the part where 98% of the time is spent!

The intended lesson of the experience is: ALWAYS profile your code FIRST. Do not attempt to optimize any particular piece until you know where the runtime is spent.
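In Python, the standard library makes this easy. A minimal sketch using cProfile and pstats (the step functions are toy stand-ins, with step2 written to be the deliberate bottleneck):

```python
import cProfile
import io
import pstats

def step1():
    return sum(range(1_000))

def step2():
    # Deliberately the slow section -- the bottleneck.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def step3():
    return [x / 2 for x in range(1_000)]

def main():
    step1()
    step2()
    step3()

# Profile the whole run, then report where the time actually went.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)  # step2 shows up at the top of the cumulative-time ranking
```

Five minutes with a profiler beats hours of guessing which section is slow.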

As in programming, so in life: ALWAYS identify the bottleneck FIRST. Do not waste time on any particular small piece of a problem until you know which piece actually matters.


¹The Taleb argument provides an interesting counterweight to the epsilon fallacy. If we’re bad at predicting which approach will be big, then it makes sense to invest a little in many different approaches. We expect most of them to be useless, but a few will have major results—similar to venture capital. That said, it’s amazing how often people who make this argument just happen to end up working on the same things as everyone else.