Book Review: The Signal and the Noise
This is a compressed version of a blog post I wrote on my personal website.
The Signal and the Noise: The Art and Science of Prediction was written by Nate Silver, a political analyst best known for creating the election-forecasting website fivethirtyeight.com. The Signal and the Noise is one of the few popular books about forecasting, which is why I thought this write-up would be useful.
Humans don’t have a good track record predicting the outcomes of complex systems. But one domain where we have excelled is weather forecasting. Weather forecasts are amazingly accurate relative to the complexity involved. In the mid-70s, the US National Weather Service was off by about 6 degrees (Fahrenheit) when forecasting three days in advance. That isn’t much better than what you get from long-term averages – i.e. the most likely temperature in that region at that time of year, ignoring any specific information. Now, the average miss is 3.5 degrees. This is slightly less of an improvement than I would have guessed, although halving the error in a forecast requires far more than twice the effort, since errors compound.
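The point about compounding errors can be illustrated with a toy chaotic system (the logistic map – this is my illustration, not an actual weather model or anything from the book): a tiny initial measurement error grows roughly exponentially, so halving it buys only a fixed amount of extra lead time rather than halving the final error.

```python
def divergence_steps(err0, tol=0.1, r=4.0, x=0.4):
    """Steps until two logistic-map trajectories, started err0 apart,
    differ by more than tol."""
    x1, x2 = x, x + err0
    for steps in range(1000):
        if abs(x1 - x2) >= tol:
            return steps
        x1, x2 = r * x1 * (1 - x1), r * x2 * (1 - x2)
    return None
```

Shrinking the initial error tenfold adds only a few steps of usable forecast horizon, which is one intuition for why twice the forecasting effort buys much less than half the error.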
I was surprised to learn how large a role humans still play in weather forecasting. A human expert assessing many computer-generated forecasts is often better than any of the forecasts by themselves. Humans make precipitation forecasts 25% more accurate than computers alone, and temperature forecasts 10% more accurate. Moreover, the accuracy added by humans has not changed significantly over time, so humans have been improving at the same rate as the machines! If the weather forecasts you use don’t feel very accurate, that’s partly because commercial weather services tend to exaggerate forecasts for appeal; you won’t see this inaccuracy in government forecasts. In particular, meteorologists are known to have a “wet bias” – they forecast rain more often than it actually occurs.
There have been some tremendous positive externalities of commercial weather forecasting, most notably in creating sophisticated early-warning systems for extreme weather. The ability to predict typhoons in India and Bangladesh has probably saved many thousands of lives. Silver has a few stories here about people who refused to leave their homes during an evacuation out of unjustified scepticism about the forecasts. There also appears to be an exposure effect: studies of hurricanes find that having survived a hurricane before makes you less likely to evacuate future ones.
A fox knows many things, but a hedgehog knows one big thing.
Archilochus
You will probably be familiar with Philip Tetlock’s work on forecasting. Some details I didn’t know about it:
The more often an expert was on TV, the less accurate their predictions were.
When an expert says something has no chance of happening, it happens 15% of the time. When they say it is guaranteed to happen, it happens 75% of the time. While foxes get better at predicting with more information, hedgehogs get worse. If you have grand theories instead of partial explanations, having more facts can make your worldview even less accurate.
Group aggregations of forecasts outperform individual ones by 15-20% on average.
Partisan differences in prediction were not seen in general (people were relatively unbiased in guessing how many seats Republicans vs. Democrats would win) but they were marked in specific cases (a left-leaning pundit is much more likely to say a specific Democrat will win).
(I wonder if this generalises? If we have some kind of broad philosophical or political worldview that biases us, we might actually see more bias the more we zero in on specific cases. Hence, while talking about specifics and partial explanations is usually the better way to get at the truth, to be effective it might require some deconstructing of one’s prior beliefs.)
Historically, the magnitude of warming from climate change has been overestimated by scientists. The actual level of warming came in below even the most optimistic projection of the 1990 IPCC estimates. In response, the IPCC revised down its models in 1995, and the observed outcomes now fall well within the confidence interval of the projections (albeit the warming is still slightly less than predicted). You could certainly tell a story here about bias: scientists probably want to find a large warming effect, and they think we’re at more risk of panicking too little than too much. However, these estimates assumed a “business as usual” case, and one factor they couldn’t adequately anticipate was that Chinese industry caused an unexpected increase in sulphur dioxide concentrations starting around 2000 – and sulphur dioxide has a cooling effect. People also forget about the other factors that contribute to warming: water vapour is actually the largest contributor to the greenhouse effect! All of this is genuinely complicated to take into consideration, so the less-than-stellar prediction performance of climate scientists can probably be forgiven. They also seem to have humility: just 19% of climate scientists think that climate science can do a good job of modelling sea-level rise 50 years from now. At least as of when this book was published (2012), the effect of climate change on most extreme weather events also appeared to be unclear.
The data underlying climate estimates are spectacularly noisy, which is well known, but I had failed to appreciate just how noisy: over the last 100 years, temperature declined in one-quarter of decades – e.g. global temperatures fell from 2001 to 2011.
The only function of economic forecasting is to make astrology look respectable.
John Kenneth Galbraith
Richard Thaler breaks down the efficient market hypothesis (EMH) into two parts: the No Free Lunch assumption and the Price is Right assumption. No Free Lunch (the Groucho Marx theorem) says that you shouldn’t be willing to buy a stock from anyone willing to sell it to you; it’s difficult if not impossible to consistently beat the market. The Price is Right says that assets are priced in a way that encapsulates all information.
Thaler has a famous paper in which he looks at the company 3Com, which created a separate stock offering for its subsidiary Palm. There was a scheme whereby 3Com stockholders were guaranteed to receive three shares in Palm for every two shares in 3Com that they held, which implied that it was mathematically impossible for Palm stock to trade at more than two-thirds of the value of 3Com stock. Yet, for several months, Palm actually traded higher than 3Com, through a combination of hype and transaction costs.
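The arithmetic behind that bound is worth spelling out (a sketch with a hypothetical price, not the actual 2000 quotes): each 3Com share carried a claim on 1.5 Palm shares, plus 3Com’s remaining business, which cannot be worth less than zero.

```python
def max_palm_price(price_3com, palm_per_3com=1.5):
    # 3Com price >= palm_per_3com * Palm price + (rest of 3Com >= 0)
    # => Palm price <= 3Com price / 1.5 = two-thirds of 3Com's price
    return price_3com / palm_per_3com
```

So with 3Com at a hypothetical $90 a share, Palm should have traded at no more than $60 – yet for months the market priced Palm above that ceiling.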
Silver points out that if you look at the predictions of the Blue Chip Economic Survey and the Survey of Professional Forecasters, the former has some forecasters who do consistently better than others over the long run, but the latter doesn’t. The reason is that Blue Chip isn’t anonymous, so forecasters have an incentive to make bold claims that would garner them a lot of esteem if they turned out to be true. One study found a “rational bias”: the lesser the reputation of a forecaster’s institution, the bolder their claims. While considerations of esteem probably worsen forecasts overall, they lead some individuals to consistently outperform the crowd.
If EMH is true, how could outside observers notice massive market inefficiencies? One of the reasons why bubbles do not sort themselves out is the career incentives of traders: if you bet against the market and the market doesn’t crash, you look like an idiot, while going along with the herd won’t result in exceptionally bad personal outcomes. Silver says there is significant evidence that such herding behaviour exists.
It shocked me to learn that, over the long run, house prices in the US were remarkably stable until recently. In inflation-adjusted terms, $10,000 invested in a home in 1896 would be worth just $10,600 in 1996 (as measured by the Case-Shiller index). The value of such an investment would then almost double between 1996 and 2006!
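To put numbers on how stable that is (my arithmetic from the figures above, treating the 1996–2006 move as an exact doubling): $10,000 to $10,600 over a century is a real return of roughly 0.06% per year, whereas doubling in a decade is roughly 7% per year.

```python
def annualised_return(start, end, years):
    # geometric average growth rate per year
    return (end / start) ** (1 / years) - 1

century = annualised_return(10_000, 10_600, 100)  # roughly 0.06% per year
decade = annualised_return(10_600, 21_200, 10)    # roughly 7.2% per year
```

A century of near-zero real returns followed by a decade compounding over a hundred times faster is the kind of regime break that makes extrapolating from recent history so dangerous.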
Scott Alexander criticises how people sometimes use the low total death tolls from terrorism as a way to mock conservatives, or people who are concerned about terrorism in general. Most years, lightning kills more people in the US than terrorism, so why worry? Well, every year since WW2, lightning has killed more people than atomic bombs. Would this be a convincing argument for not worrying about nuclear war? If you’ve read The Black Swan, you’ll know that lots of things are like this, with heavy-tailed risks, and that we sometimes try to shoehorn these into normal distributions.
Earthquakes are distributed according to one such heavy-tailed distribution (a power law) whereby for every one-point increase on the Richter scale, an earthquake is ten times less likely. So the bulk of the devastation comes from just a few earthquakes: the Chilean earthquake of 1960, the Alaskan earthquake of 1964, and the Great Sumatra Earthquake of 2004 accounted for half of all energy released by all earthquakes in the world over the preceding 100 years!
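A sketch of why a few quakes dominate (relative to an arbitrary magnitude-5 baseline; the ~32x energy step per magnitude point is the standard Gutenberg–Richter energy relation, not a figure from the book):

```python
def relative_frequency(m, baseline=5.0):
    # each one-point magnitude increase is ~10x less frequent
    return 10.0 ** -(m - baseline)

def relative_energy(m, baseline=5.0):
    # ...but releases ~10^1.5, i.e. ~31.6x, more energy
    return 10.0 ** (1.5 * (m - baseline))

# Expected energy (frequency x energy) still grows ~3.2x per magnitude
# point, so the rare giants dominate the total energy budget.
for m in (5, 6, 7, 8, 9):
    print(m, relative_frequency(m) * relative_energy(m))
```

Because frequency falls as 10^-1 per point while energy rises as 10^1.5, the product rises as 10^0.5 – the tail, not the typical quake, carries the devastation.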
One interesting thing Silver discusses in one of the middle chapters is the failure of SIR models to account for the absence of an HIV re-emergence in the early 2000s among sexually active gay communities like that in San Francisco (despite an increase in unprotected sex and other STDs). It is still somewhat debated why this didn’t happen, but probably it was because people began to “serosort” – that is, choose partners with the same HIV status as their own. This violates one of the major assumptions of the SIR model: that interactions among individuals are random.
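For reference, a minimal SIR sketch (my simplification, using simple Euler steps): the `beta * s * i` term is exactly where the random-mixing assumption lives – every susceptible person is assumed equally likely to encounter every infectious one, which is what serosorting breaks.

```python
def sir(beta=0.3, gamma=0.1, s=0.99, i=0.01, r=0.0, days=200, dt=1.0):
    """Susceptible/Infectious/Recovered fractions over time."""
    history = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt  # assumes random mixing
        new_recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history
```

If mixing is assortative by infection status instead, effective contact between the S and I compartments drops, and the model’s predicted resurgence can simply fail to materialise.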
You may have recently heard about President Ford’s 1976 campaign to vaccinate 200 million people against a suspected H1N1 pandemic. The vaccine was linked to increased rates of the nerve-damaging Guillain-Barré syndrome, and the public turned against it, to the point that only 50% of people were willing to be vaccinated! The outbreak also turned out to be less severe than expected, so the government gave up after 25% of people were immunised. I’m surprised that I haven’t seen more written about this; a postmortem on what went wrong would be helpful.
The wet bias seems very interesting: is it a marketing ploy that makes people feel the information is more valuable? Or is it an optimisation, because people would rather prepare for rain that doesn’t come than be caught out by rain that does? I think there’s room for a lot of probability fudging to match intuitive human expectations, simply because we are not very good at reasoning about probability. For example, if a forecast gave a 90% chance of sun and it rained, I would be upset, even though that outcome is perfectly consistent with the forecast.
Forecasters deliberately overstate the probability of rain, following apparent user preferences. Most people are so poorly calibrated that they only explicitly notice “it rained without being predicted”, and the cost asymmetry points in the same direction.
Making things more complicated is that in many cities it can be raining in one suburb and dry in another, and accurately communicating such spatial heterogeneity is almost as difficult as forecasting it.
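One way to formalise the cost asymmetry (a toy model with made-up costs, nothing from the book): if being caught in rain unprepared costs much more than needlessly carrying an umbrella, the rational threshold for preparing sits far below 50%, so rounding borderline forecasts up towards rain pushes users towards the cheap mistake rather than the expensive one.

```python
def should_prepare(p_rain, cost_prepare=1.0, cost_caught=10.0):
    # Prepare iff the expected cost of going unprepared exceeds the
    # fixed cost of preparing: p_rain * cost_caught > cost_prepare,
    # i.e. p_rain > 0.1 with these (made-up) costs.
    return p_rain * cost_caught > cost_prepare
```

With these costs, even a 20% chance of rain warrants an umbrella, which is one reading of why audiences tolerate, and perhaps prefer, inflated rain probabilities.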