Multitudinous outside views

There’s an important piece of advice for forecasters: don’t rely exclusively on your internal model of the world; take the outside view first, then adjust from there. But which view is “the” outside view? It depends on the problem, and different people might tell you different things. But if the choice of outside view is subjective, it starts to seem like inside views all the way down.

That brings us to base rates, which don’t solve this problem but do highlight it nicely.


Fans of superforecasting know, in a hedgehog-like sense of knowing one thing, that the outside view, which is the base rate, which is the rate of similar events, should be our starting point. But which events are similar, and how is similarity defined? We first need to choose a reference class, based on some pre-existing idea of similarity. In other words, there is a reference class problem, which we evidently don’t have a clear way to resolve. And even for Bayesian thinkers, it isn’t just one problem; it’s an entire bucket of different problems.
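To make the dependence on a similarity rule concrete, here is a minimal Python sketch. The records, the field names, and the `base_rate` helper are all hypothetical, invented purely for illustration. The only point is that the same data yields different base rates depending on which events we decide are “similar.”

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Event:
    kind: str       # hypothetical category label
    recent: bool    # hypothetical "happened recently" flag
    occurred: bool  # whether the outcome of interest happened


# Invented toy data, not real observations.
EVENTS = [
    Event("a", True, False),
    Event("a", False, False),
    Event("b", True, True),
    Event("b", True, True),
    Event("b", False, False),
    Event("a", True, True),
]


def base_rate(
    events: Sequence[Event], is_similar: Callable[[Event], bool]
) -> Optional[float]:
    """Share of 'similar' events where the outcome occurred (None if no members)."""
    members = [e for e in events if is_similar(e)]
    if not members:
        return None  # an empty reference class gives no base rate at all
    return sum(e.occurred for e in members) / len(members)


# Each similarity rule defines a different reference class.
rates = {
    "everything": base_rate(EVENTS, lambda e: True),
    "kind a only": base_rate(EVENTS, lambda e: e.kind == "a"),
    "recent only": base_rate(EVENTS, lambda e: e.recent),
    "kind c only": base_rate(EVENTS, lambda e: e.kind == "c"),
}

for name, rate in rates.items():
    print(name, rate)
```

Nothing in the data says which of these rules is the right one. That choice comes from the forecaster.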

Considering a Concrete Prediction: Tesla Motors

Let’s get really concrete: What will the price of Tesla stock be in 6 months?

Well, what is the reference class? In the last year, 90% of the time, the price of Tesla stock has been between $200 and $1000. But that’s a really bad reference class when the price today is $1,800. OK, but looking at the set of all stocks would be even worse, and looking at automobile stocks even worse than that. Which stocks are comparable? What about stocks with P/E ratios over 900? Or stocks with net losses of more than half a billion dollars? We’re getting silly here.
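The “90% of the time” figure is just a central interval of past prices. A small sketch shows why it fails here. The price history below is invented for illustration, and `central_interval` is a hypothetical helper. Only the $1,800 current price comes from the example above.

```python
import statistics


def central_interval(prices, coverage=0.90):
    """Approximate central interval containing `coverage` of the past prices."""
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    cuts = statistics.quantiles(prices, n=100, method="inclusive")
    tail = round((1 - coverage) / 2 * 100)  # 5 for a 90% interval
    return cuts[tail - 1], cuts[-tail]


# Invented daily closing prices (placeholder data, not real quotes).
past_prices = list(range(150, 1101, 10))
today = 1800  # current price from the example in the text

low, high = central_interval(past_prices)
outside = not (low <= today <= high)
print(f"90% of past prices fell between {low:.0f} and {high:.0f}")
print("today's price is outside that range:", outside)
```

When the current price is outside everything in the history, the historical interval says almost nothing about where the price goes next.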

Maybe we shouldn’t look at stock price, but should look at market capitalization? Or change in price? “Stocks that went up 9-fold over the course of a year” isn’t a super helpful reference class—it has only a few examples, and they are all very different from Tesla.

Of course, none of this is helpful. What we really want is the aggregate opinion of the market, so we look at futures contracts and the implied volatility curve for options expiring in February.
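As a rough illustration of how an implied volatility number turns into a price range, the sketch below assumes a lognormal price model with zero drift and a single flat volatility. Both are simplifying assumptions. The 80% volatility figure is a placeholder, not a market quote, and `implied_range` is a hypothetical helper. A real implied volatility curve varies by strike, which this ignores.

```python
import math
from statistics import NormalDist


def implied_range(spot, implied_vol, years, coverage=0.90):
    """Central price interval under a zero-drift lognormal model (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # e.g. about 1.64 for 90%
    drift = -0.5 * implied_vol**2 * years         # lognormal convexity correction
    spread = implied_vol * math.sqrt(years) * z   # half-width in log-price terms
    return spot * math.exp(drift - spread), spot * math.exp(drift + spread)


# spot comes from the example in the text; implied_vol is a made-up placeholder.
low, high = implied_range(spot=1800.0, implied_vol=0.80, years=0.5)
print(f"90% range in six months: {low:.0f} to {high:.0f}")
```

The resulting range is read off from what the market is currently paying for options, not from any set of comparable past events.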

That doesn’t look like a reference class. But who needs an outside view, anyways?

What is a reference class?

If you want to know the probability of Kim Jong-un staying alive, you can consult the reference class of 37-year-old males in North Korea, where male life expectancy is 68. Alternatively, look at the reference class of his immediate family: his brother died at the age of 46, but his father lived to the age of 70, and his grandfather lived until 82. Are those useful reference points?

What we really want is the lifespan of dictators. Well, dictators of small countries. Oh, actually, dictators of small nuclear powers that know that Qaddafi was killed after renouncing his nuclear program—a reference class with no other members. Once again, of course, none of this is helpful.

In finance, the outside view is a consensus that markets are roughly rational, and the inside view is that you can beat the market. In international relations, the outside view is that dictatorships can be tenuous, but when the regime survives, the leadership lives quite a long time. The inside view is, perhaps, that China has a stake in keeping its nuclear neighbor stable, and won’t let anything happen.

Reference classes depend on models of the world.

In each case, the construction of a reference class is a function of a model. Models induce reference classes: political scientists might have expert political judgement, while demographers have expert lifespan judgement, and second-year equity analysts have expert financial judgement. All of those are useful.

What reference class should have been used for COVID-19 in, say, mid-March? The set of emerging infectious diseases over the past decade? Clearly not. In retrospect, of course, the best reference class needed an epidemiological model: the reference class of diseases where spread is determined by control measures. And the reference class for the success of the response in the US should have been based on a libertarian view of the failure of American institutions, or a Democrat’s view of how Trump had been rapidly dismantling government, and not on an index designed around earlier data which ignored political failure modes. But how do we know that in advance? Once again, none of this is helpful in deciding beforehand which reference class to use.

A final example. What reference class is useful for predicting the impact of artificial intelligence over the next decade? Robin Hanson would argue, I think, that it’s the reference class of purported game-changing technologies that have not yet attracted significant amounts of capital investment. Eliezer Yudkowsky might argue that it’s the reference class of intelligence evolving, sped up by a factor of what we’ve seen so far of computer intelligence, which moved from an AI winter in the mid-2000s and ant-level intelligence at navigation, to DeepMind being founded in 2010, to IBM’s Watson winning Jeopardy in 2011, to GPT-3 now beating the Winograd Schema and acing general high-school science tests without specific training. And if you ask a dozen AI researchers, depending on your methods, you’ll get at least another dozen reference classes. But we still need to pick a reference class.

So which reference class is correct? In my (inside) view as a superforecaster, this is where we turn to a different superforecasting trick, about considering multiple models. As the saying goes, hedgehogs know one reference class, but foxes consult many hedgehogs.
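One simple way to “consult many hedgehogs” is to compute a base rate from each candidate reference class and then pool them. The sketch below shows two common pooling rules, a plain average of probabilities and an average in log-odds space. It is an illustration under assumptions, not a description of how any particular forecaster does it. The function names and the three input probabilities are hypothetical.

```python
import math


def pool_mean(probs):
    """Arithmetic mean of the probabilities."""
    return sum(probs) / len(probs)


def pool_log_odds(probs):
    """Average the forecasts in log-odds space, then map back to a probability.

    Requires every probability to be strictly between 0 and 1.
    """
    log_odds = [math.log(p / (1 - p)) for p in probs]
    mean = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-mean))


# Hypothetical base rates from three different reference classes.
hedgehogs = [0.05, 0.20, 0.60]

print("mean of probabilities:", round(pool_mean(hedgehogs), 3))
print("mean of log-odds:     ", round(pool_log_odds(hedgehogs), 3))
```

The two rules generally give different answers, so even the choice of aggregation method is a modeling decision. Pooling spreads the bet across reference classes, but it does not remove the judgment about which ones to include.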