Model Combination and Adjustment

The debate on the proper use of inside and outside views has raged for some time now. I suggest a way forward, building on a family of methods commonly used in statistics and machine learning to address this issue — an approach I’ll call “model combination and adjustment.”

Inside and outside views: a quick review

1. There are two ways you might predict outcomes for a phenomenon. If you make your predictions using a detailed visualization of how something works, you’re using an inside view. If you instead ignore the details of how something works and make your predictions by assuming that the phenomenon will behave roughly like other similar phenomena, you’re using an outside view (also called reference class forecasting).

Inside view examples:

  • “When I break the project into steps and visualize how long each step will take, it looks like the project will take 6 weeks.”

  • “When I combine what I know of physics and computation, it looks like the serial speed formulation of Moore’s Law will break down around 2005, because we haven’t been able to scale down energy-use-per-computation as quickly as we’ve scaled up computations per second, so it will run into roadblocks from energy consumption and heat dissipation.”

Outside view examples:

  • “I’m going to ignore the details of this project, and instead compare my project to similar projects. Other projects like this have taken 3 months, so that’s probably about how long my project will take.”

  • “The serial speed formulation of Moore’s Law has held up for several decades, through several different physical architectures, so it’ll probably continue to hold through the next shift in physical architectures.”
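
As a minimal sketch of the project-duration examples above, the two views can be written as two small estimators. The step estimates and past-project durations are invented for illustration, and the function names are hypothetical.

```python
from statistics import median


def inside_view_estimate(step_weeks):
    """Inside view: add up the visualized duration of each planned step."""
    return sum(step_weeks)


def outside_view_estimate(reference_class_weeks):
    """Outside view: ignore the steps and take the typical (median)
    duration of similar past projects."""
    return median(reference_class_weeks)


# Hypothetical numbers, chosen only to echo the examples in the text.
my_steps = [1, 2, 2, 1]                   # weeks per planned step
similar_projects = [11, 12, 13, 14, 16]   # weeks taken by comparable projects

inside = inside_view_estimate(my_steps)
outside = outside_view_estimate(similar_projects)
print(f"inside view: {inside} weeks, outside view: {outside} weeks")
```

The gap between the two numbers is the kind of disagreement the rest of this post is about.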

See also chapter 23 in Kahneman (2011); Planning Fallacy; Reference class forecasting. Note that, after several decades of past success, the serial speed formulation of Moore’s Law did in fact break down in 2004 for the reasons described (Fuller & Millett 2011).

2. An outside view works best when using a reference class with a similar causal structure to the thing you’re trying to predict. An inside view works best when a phenomenon’s causal structure is well-understood, and when (to your knowledge) there are very few phenomena with a similar causal structure that you can use to predict things about the phenomenon you’re investigating. See: The Outside View’s Domain.

When writing a textbook that’s much like other textbooks, you’re probably best off predicting the cost and duration of the project by looking at similar textbook-writing projects. When you’re predicting the trajectory of the serial speed formulation of Moore’s Law, or predicting which spaceship designs will successfully land humans on the moon for the first time, you’re probably best off using an (intensely informed) inside view.

3. Some things aren’t very predictable with either an outside view or an inside view. Sometimes, the thing you’re trying to predict seems to have a significantly different causal structure than other things, and you don’t understand its causal structure very well. What should we do in such cases? This remains a matter of debate.

Eliezer Yudkowsky recommends a weak inside view for such cases:

On problems that are drawn from a barrel of causally similar problems, where human optimism runs rampant and unforeseen troubles are common, the Outside View beats the Inside View… [But] on problems that are new things under the Sun, where there’s a huge change of context and a structural change in underlying causal forces, the Outside View also fails—try to use it, and you’ll just get into arguments about what is the proper domain of “similar historical cases” or what conclusions can be drawn therefrom. In this case, the best we can do is use the Weak Inside View — visualizing the causal process — to produce loose qualitative conclusions about only those issues where there seems to be lopsided support.

In contrast, Robin Hanson recommends an outside view for difficult cases:

It is easy, way too easy, to generate new mechanisms, accounts, theories, and abstractions. To see if such things are useful, we need to vet them, and that is easiest “nearby”, where we know a lot. When we want to deal with or understand things “far”, where we know little, we have little choice other than to rely on mechanisms, theories, and concepts that have worked well near. Far is just the wrong place to try new things.

There are a bazillion possible abstractions we could apply to the world. For each abstraction, the question is not whether one can divide up the world that way, but whether it “carves nature at its joints”, giving useful insight not easily gained via other abstractions. We should be wary of inventing new abstractions just to make sense of things far; we should insist they first show their value nearby.

In Yudkowsky (2013), sec. 2.1, Yudkowsky offers a reply to these paragraphs, and continues to advocate for a weak inside view. He also adds:

the other major problem I have with the “outside view” is that everyone who uses it seems to come up with a different reference class and a different answer.

This is the problem of “reference class tennis”: each participant in the debate claims their own reference class is most appropriate for predicting the phenomenon under discussion, and if disagreement remains, they might each say “I’m taking my reference class and going home.”

Responding to the same point made elsewhere, Robin Hanson wrote:

[Earlier, I] warned against over-reliance on “unvetted” abstractions. I wasn’t at all trying to claim there is one true analogy and all others are false. Instead, I argue for preferring to rely on abstractions, including categories and similarity maps, that have been found useful by a substantial intellectual community working on related problems.

Multiple reference classes

Yudkowsky (2013) adds one more complaint about reference class forecasting in difficult forecasting circumstances:

A final problem I have with many cases of ‘reference class forecasting’ is that… [the] final answers [generated from this process] often seem more specific than I think our state of knowledge should allow. [For example,] I don’t think you should be able to tell me that the next major growth mode will have a doubling time of between a month and a year. The alleged outside viewer claims to know too much, once they stake their all on a single preferred reference class.

Both this comment and Hanson’s last comment above point to the vulnerability of relying on any single reference class, at least for difficult forecasting problems. Beware brittle arguments, says Paul Christiano.

One obvious solution is to use multiple reference classes, and weight them by how relevant you think they are to the phenomenon you’re trying to predict. Holden Karnofsky writes of investigating things from “many different angles.” Jonah Sinick refers to “many weak arguments.” Statisticians call this “model combination.” Machine learning researchers call it “ensemble learning” or “classifier combination.”

In other words, we can use many outside views.

Nate Silver does this when he predicts elections (see Silver 2012, ch. 2). Venture capitalists do this when they evaluate startups. The best political forecasters studied in Tetlock (2005), the “foxes,” tended to do this.

In fact, most of us do this regularly.

How do you predict which restaurant’s food you’ll most enjoy, when visiting San Francisco for the first time? One outside view comes from the restaurant’s Yelp reviews. Another outside view comes from your friend Jade’s opinion. Another outside view comes from the fact that you usually enjoy Asian cuisines more than other cuisines. And so on. Then you combine these different models of the situation, weighting them by how robustly they each tend to predict your eating enjoyment, and you grab a taxi to Osha Thai.
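
Here is one rough way to write down the restaurant example, assuming each outside view yields a score on a 0-10 scale and that a weighted linear combination is an acceptable combination rule (it is only one of many). All scores, weights, and the two competing restaurants are made up for illustration.

```python
def combine_views(estimates, weights):
    """Weighted linear combination of several models' estimates.

    estimates: dict mapping view name -> predicted enjoyment (0-10)
    weights:   dict mapping view name -> how much that view is trusted
    """
    total = sum(weights[name] for name in estimates)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(estimates[name] * weights[name] for name in estimates) / total


# Hypothetical weights: how robustly each view has predicted
# my eating enjoyment in the past.
weights = {"yelp": 0.5, "jade": 0.3, "cuisine_preference": 0.2}

# Hypothetical scores; only Osha Thai is named in the text.
restaurants = {
    "Osha Thai": {"yelp": 8.0, "jade": 9.0, "cuisine_preference": 8.5},
    "Hypothetical Burger Bar": {"yelp": 9.0, "jade": 6.0, "cuisine_preference": 5.0},
    "Hypothetical Bistro": {"yelp": 7.0, "jade": 7.0, "cuisine_preference": 6.0},
}

combined = {name: combine_views(views, weights)
            for name, views in restaurants.items()}
best = max(combined, key=combined.get)
print(best, round(combined[best], 2))
```

Note that the restaurant with the best Yelp score does not win here: no single view decides the outcome.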

(Technical note: I say “model combination” rather than “model averaging” on purpose.)

Model combination and adjustment

You can probably do even better than this, though — if you know some things about the phenomenon and you’re very careful. Once you’ve combined a handful of models to arrive at a qualitative or quantitative judgment, you should still be able to “adjust” the judgment in some cases using an inside view.

For example, suppose I used the above process, and I plan to visit Osha Thai for dinner. Then, somebody gives me my first taste of the Synsepalum dulcificum fruit. I happen to know that this fruit contains a molecule called miraculin which binds to one’s taste buds and makes sour foods taste sweet, and that this effect lasts for about an hour (Koizumi et al. 2011). Despite the results of my earlier model combination, I predict I won’t particularly enjoy Osha Thai at the moment. Instead, I decide to try some Tabasco sauce, to see whether it now tastes like doughnut glaze.
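
The adjustment step might be sketched like this. The one-hour window comes from the example above; the combined score, the replacement score, and the function name are hypothetical, and a hard override is just one simple way an inside view could adjust a combined estimate.

```python
def adjust_for_miracle_fruit(combined_score, minutes_since_fruit,
                             effect_minutes=60, altered_taste_score=4.0):
    """Adjust a combined outside-view estimate with an inside view.

    While a known causal factor (altered taste perception) is active,
    replace the combined estimate with an inside-view estimate.
    altered_taste_score is a made-up illustrative value.
    """
    if minutes_since_fruit is not None and 0 <= minutes_since_fruit < effect_minutes:
        return altered_taste_score
    return combined_score


combined_score = 8.4  # hypothetical output of an earlier model combination

now = adjust_for_miracle_fruit(combined_score, minutes_since_fruit=10)
later = adjust_for_miracle_fruit(combined_score, minutes_since_fruit=90)
never = adjust_for_miracle_fruit(combined_score, minutes_since_fruit=None)
print(now, later, never)
```

The adjustment applies only when I have specific causal knowledge about the case at hand. Otherwise the combined estimate stands.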

In some cases, you might also need to adjust for your prior over, say, “expected enjoyment of restaurant food,” if for some reason your original model combination procedure didn’t capture your prior properly.

Against “the outside view”

There is a lot more to say about model combination and adjustment (e.g. this), but for now let me make a suggestion about language usage.

Sometimes, small changes to our language can help us think more accurately. For example, gender-neutral language can reduce male bias in our associations (Stahlberg et al. 2007). In this spirit, I recommend we retire the phrase “the outside view...”, and instead use phrases like “some outside views...” and “an outside view...”

My reasons are:

  1. Speaking of “the” outside view privileges a particular reference class, which could make us overconfident in that particular model’s predictions, and leave model uncertainty unaccounted for.

  2. Speaking of “the” outside view can act as a conversation-stopper, whereas speaking of multiple outside views encourages further discussion about how much weight each model should be given, and what each of them implies about the phenomenon under discussion.
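
The first reason above can be made concrete with a small sketch. Assume each reference class gives a forecast with a mean and a variance, and that we treat the combined forecast as a weighted mixture of them (an assumption, not the only option). The law of total variance then splits the uncertainty into the spread within each model plus the disagreement between models. All numbers are hypothetical.

```python
def mixture_mean_and_variance(models):
    """models: list of (weight, mean, variance) tuples, one per reference class.

    Returns the mean and variance of the weighted mixture of the forecasts.
    """
    total = sum(w for w, _, _ in models)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = [(w / total, m, v) for w, m, v in models]
    mean = sum(w * m for w, m, _ in norm)
    # Law of total variance: within-model spread + between-model disagreement.
    within = sum(w * v for w, _, v in norm)
    between = sum(w * (m - mean) ** 2 for w, m, _ in norm)
    return mean, within + between


# Hypothetical forecasts of a project's duration, in weeks.
single_class = [(1.0, 12.0, 4.0)]
several_classes = [(0.5, 12.0, 4.0), (0.3, 20.0, 9.0), (0.2, 8.0, 4.0)]

single_mean, single_var = mixture_mean_and_variance(single_class)
multi_mean, multi_var = mixture_mean_and_variance(several_classes)
print(single_mean, single_var)
print(round(multi_mean, 2), round(multi_var, 2))
```

With these made-up inputs, staking everything on one reference class reports a much narrower forecast than the mixture does, because the between-model term is silently set to zero.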