Lessons from weather forecasting and its history for forecasting as a domain

This is the first of two (or more) posts that look at the domain of weather and climate forecasting and what we can learn from the history and current state of these fields for forecasting as a domain. It may not be of general interest to the LessWrong community, but I hope that it’s of interest to people here who have some interest either in weather-related material or in forecasting in general.

The science of weather forecasting has come a long way over the past century. Since people starting measuring and recording the weather (temperature, precipitation, etc.) two simple algorithms for weather prediction have existed (see also this):

Persistence: Assume that the weather tomorrow will be the same as the weather today.
Climatology: Assume that the weather on a given day of the year will be the same as the average of the weather on that same day in the last few years (we might also use averages for nearby days if we don’t have enough years of data).

Until the end of the 19th century, there was no weather prediction algorithm that did consistently better than both persistence and climatology. Between persistence and climatology, climatology won out over medium to long time horizons (a week or more), whereas persistence won out in some kinds of places over short horizons (1-2 days), though even there, climatology sometimes does better (see more here). Both methods have very limited utility when it comes to predicting and preparing for rare extreme weather events, such as blizzards, hurricanes, cyclones, polar winds, or heavy rainfall.

This blog post discusses the evolution and progress of weather forecasting algorithms that significantly improve over the benchmarks of persistence and climatology, and the implications both for the future of weather forecasting and for our understanding of forecasting as a domain.

Sources for further reading (uncited material in my post is usually drawn from one of these): Wikipedia’s page on the history of numerical weather prediction, The Signal and the Noise by Nate Silver, and The origins of computer weather prediction and climate modeling by Peter Lynch.

The three challenges: theory, measurement, and computation

The problem facing any method that tries to do better than persistence and climatology is that whereas persistence and climatology can rely on existing aggregate records, any more sophisticated prediction algorithm relies on measuring, theorizing about, and computing with a much larger number of other observed quantities. There are three aspects to the challenge:

The theoretical challenge or model selection challenge: The goal is to write a system of equations describing how the climate system evolves from certain initial measurements. In the weather prediction context, this was the first of the challenges to be nailed: the basic equations of the atmosphere come from physics, which has been well understood for over a century now.
The measurement challenge: A large number of measurements at different points in the area and at regular intervals of time need to be taken to initialize the data appropriately for the weather simulation. The measurement challenge was largely resolved early on: it was easy to set up stations for measuring temperature, humidity, and other indicators around the world, and communications technology enabled the data to be quickly relayed to a central processing station. Of course, many improvements have occurred over the 20th century: we can now make measurements using satellites, as well as directly measure weather indicators at different altitudes. But measurement challenges were not critical in getting weather prediction started.
The computational challenge: This appears to have been the most difficult of the challenges and the critical constraint in making real-time weather predictions. The computations needed for making a forecast that could beat persistence and climatology over any time horizon were just too numerous for humans to carry out in real time. In fact, the ability to make accurate weather predictions was one of the motivations for the development of improved computing machinery.

The basic theory of weather forecasting

The basic idea of weather forecasting is to use the equations of physics to model the evolution of the atmospheric system. In order to do this, we need to know how the system looks at a given point. In principle, that information, combined with the equations, should allow us to compute the weather indefinitely into the future. In practice, the equations we create don’t have any closed-form solutions, the data is only partial (we don’t have initial data on the whole world) and even small variations at a given time can balloon to bigger changes (this is called the butterfly effect; more on this later in the post).

Instead of trying to solve the system analytically, we discretize the problem (we use discrete spatial locations and discrete time steps) and then solve the problem numerically (this is a bit like using a difference quotient instead of a derivative when computing a rate of change). There are four dimensions to the discretization (three spatial and one temporal), and how fine we make the grid in each dimension is called the resolution in that dimension.

Spatial dimensions and spatial resolution: The region over which we are interested in forecasting the weather is converted to a grid. We have freedom in how fine we make the grid. In general, finer grids make for more precise and accurate weather predictions, but require more computational resources. The grid has two horizontal dimensions and one height dimension, hence a total of three dimensions. Thus, making the grid x times as fine (i.e., making the spatial resolution x times as fine) means increasing the number of regions to x³ times the current value.
Time dimension and temporal resolution: We also choose a time step. In general, smaller time steps make for more precise and accurate weather predictions, but require more computational resources (because the number of time steps needed to traverse a particular length of time is more). If we divide the time step by x, we multiply the time and space storage needs by a factor of x.

Thus, roughly, becoming finer by a factor of x in all three spatial dimensions and the time dimension requires upping computational resources to about x⁴ of the original value. So, doubling in all four dimensions requires improving computational power to 16 times the original value, which means four doublings. Combining this with natural improvements in computing, such as Moore’s law, we expect that we should be able to make our grid twice as fine in all dimensions (i.e., double the spatial and temporal resolution) every 8 years or so.

How much precision and accuracy does high resolution buy?

My intuitive prior would be that, for sufficiently short time horizons where we don’t expect chaos to play a dominant role, we’d expect the relationship suggested by the logarithmic timeline. Does this agree with the literature?

I don’t feel like I have a clear grasp of the literature, so the summary below is somewhat ad hoc. I hope it still helps with elucidating the relationship.

Higher resolution means greater precision of forecasts holding the time horizon constant (assuming that it’s a time horizon over which we could reasonably make forecasts). This makes sense: higher temporal resolution allows us to approximate the (almost) continuous evolution of the atmospheric system better, and higher spatial resolution allows us to work with a better initialization as well as approximate the continuous evolution better. For instance, a page on the website of weather forecasting service meteoblue has the title Resolution means precision.
The relation between resolution and accuracy is less clear. Although, up to a point, higher resolution enables more accurate forecasts, the relation does not continue at ever-higher levels of precision, for a variety of reasons (including the chaos problem discussed next). For more, see Climate prediction: a limit to adaptation by Suraje Dessai, Mike Hulme, Robert Lempert, and Roger Pielke, Jr.
The type of resolution that matters more can depend on the type of phenomenon that we are predicting. In some cases, temporal resolution is more important than spatial resolution. In some cases, particularly phenomena relating to interactions between the different layers of the atmosphere, vertical resolution matters more than horizontal resolution, whereas in other cases, horizontal resolution matters more. For instance, the paper Impacts of Numerical Weather Prediction Spatial Resolution On An Atmospheric Decision Aid For Directed Energy Weapon Systems finds that vertical resolution matters more than horizontal resolution for a particular application.

The problem of chaos and the butterfly effect

The main problem with weather prediction is hypersensitivity to initial conditions: even small differences in initial values can have huge effects over longer timescales (this is sometimes called the butterfly effect). This effect could occur at many different levels.

Measurements may not be sufficiently precise or detailed (temperature and precipitation are measured only at a few weather stations rather than everywhere). Some of the measurements may be somewhat flawed as well. Apart from the usual measurement error, the choice of weather stations may introduce bias: weather stations have often historically been located close to airports and to other hubs of activity, where temperatures may be higher due to the heat generated by the processes nearby (see also the page on urban heat island).
The computer programs that do numerical weather simulation don’t store data to infinite precision. Choices of how to round off can profoundly affect weather predictions.
There may be actions by humans, animals, or human institutions that aren’t modeled in the atmospheric system, but perturb it sufficiently to affect weather predictions. For instance, if lots of people burst firecrackers on a day, that might affect local temperatures and air composition in a small manner that might have larger effects over the coming days.

Due to these problems, modern algorithms for numerical weather prediction run simulations with many slight variations of the given initial conditions, using a probabilistic model to assign probabilities to the different scenarios considered. Note that here we are making slight variations to the data and running the model on these variations to generate a collection of scenarios weighted by probability.

As the time horizon for forecasting increases (we get to one week ahead or beyond) our understanding of how the equilibrating influences of the weather play out is more fuzzy. For such timescales, we use ensemble forecasting with a collection of different models. The models may use different data and give attention to different aspects of the data, based on slightly different underlying theories of how the different weather phenomena interact. As before, we generate probabilistic weather predictions.

Can (and should) weather forecasting be fully automated?

Nate Silver observed in his book The Signal and the Noise that the proportional improvement that human input made to the computer models has stayed constant at about 25% for precipitation forecasts and 10% for temperature forecasts, even as the computer models, and therefore the final forecast, have improved considerably over the last few decades. The sources cited by Silver don’t seem to be online, but I found another paper with the data that Silver uses. Silver says that humans’ main input is in the following respects:

Human vision (literally) is powerful in terms of identifying patterns and getting a broad sense of what is happening. Computers still have trouble seeing patterns. This is related to the fact that humans in general have an easier time with CAPTCHAs than computers, although that might be changing as machine learning improves.
Humans have better intuition at identifying false runaway predictions. For instance, a computer might think that a particular weather phenomenon will snowball, whereas humans are likely to identify equilibrating influences that will prevent the snowballing. Humans are also better at reasoning about what is reasonable to expect based on climatological history.

On a related noted, Charles Doswell has argued that it would be dangerous to try to fully automate weather forecasting, because direct involvement with the weather forecasting process is crucial for meteorologists to get a better sense of how to make improvements to their models.

Machine learning in weather prediction?

For most of its history, weather prediction has relied on models grounded in our understanding of physics, with some aspects of the models being tweaked based on experience running the models. This differs somewhat from the spirit of machine learning algorithms. Supervised machine learning algorithms take a bunch of input data and output data and try to learn how to predict the outputs from the inputs, with considerable agnosticism about the underlying theoretical mechanisms. In the context of weather prediction, a machine learning algorithm might view the current measured data as the input, and the measured data after a certain time interval (or some summary variable, such as a binary variable recording whether or not it rained) as the output to be predicted. The algorithm would then try to learn a relation from the inputs to the outputs.

In recent years, machine learning ideas have started being integrated into weather forecasting. However, the core of weather prediction still relies on using theoretically grounded models. (Relevant links: Quora question on the use of machine learning algorithms in weather forecasting, Freakonomics post on a company that claims to use machine learning to predict the weather far ahead, Reddit post about that Freakonomics post).

My uninformed speculation is that machine learning algorithms would be most useful in substituting for the human input element to the model rather than the core of the numerical simulation. In particular, to the extent that machine learning algorithms can make progress on the problem of vision, they might be able to use their “vision” to better interpret the results of numerical weather prediction. Moreover, the machine learning algorithms would be particularly well-suited to using Bayesian priors (arising from knowledge of historical climate) to identify cases where the numerical models are producing false feedback loops and predicting things that seem unlikely to happen.

Prehistory: before weather simulation came to fruition: meeting the theoretical challenge

Here is a quick summary of the initial steps taken to realizing weather forecasting as a science. These steps concentrated on the theoretical challenge. The measurement and computational challenges would be tackled later.

Cleveland Abbe made the basic observation that weather prediction was essentially a problem of the application of hydrodynamics and thermodynamics to the atmosphere. He detailed his observations in the 1901 paper The physical basis of long-range weather forecasts. But this was more an identification of the general reference class of models to use than a concrete model of how to predict the weather.
Vilhelm Bjerknes, in 1904, set down a two-step plan for rational forecasting: a diagnostic step, where the initial state of the atmosphere is determined using observations, and a prognostic step, where the laws of motion are used to calculate the evolution of the system over time. He even identified most of the relevant equations needed to compute the evolution of the system, but he didn’t try to prepare his ideas for actual practical use.
Lewis Fry Richardson published in 1922 a detailed description of how to predict the weather, and applied his model to an attempted 6-hour forecast that took him 6 weeks to compute by hand. His forecast was off by a huge margin, but he was still convinced that the model was broadly correct, and with enough data and computing power, it could produce useful predictions.

It’s interesting that scientists such as Richardson were so confident of their approach despite its failure to make useful predictions. The confidence arguably stemmed from the fact that the basic equations of physics that the model relied on were indubitably true. It’s not surprising that Richardson’s confidence wasn’t widely shared. What’s perhaps more surprising is that enough people were convinced by the approach that, when the world’s first computers were made, weather prediction was viewed as a useful initial use of these computers. How they were able to figure out that this approach could bear fruit is related to questions I raised in my paradigm shifts in forecasting post.

Could Richardson have fixed the model and made correct predictions by hand? With the benefit of hindsight, it turns out that if he’d applied a standard procedure to tweak the original data he worked with, he would have been able to make decent predictions by hand. But this is easier seen in hindsight, when we have the benefit of being able to try out many different tweaks of the algorithm and compare their performance in real time (more on this point below).

The first successful computer-based numerical weather prediction

In the mid-1930s, John von Neumann, one of the key figures in modern computing, stumbled across weather prediction and identified it as an ideal problem suited to computers: it required a huge amount of calculation using clearly defined algorithms from measured initial data. An initiative supported by von Neumann led to the meteorologist Jule Charney getting interested in the problem. In 1950, a team led by Charney came up with a complete numerical algorithm for weather prediction, building on and addressing some computational issues in Richardson’s original algorithm. This was then implemented on the ENIAC, the only computer available at the time. The simulation had a time ratio of 1: it took 24 hours to simulate 24 hours of weather. Charney called it a vindication of the vision of Lewis Fry Richardson.

Progress since then: the interplay of computational and theoretical

Since the 1950 ENIAC implementation of weather forecasting, weather forecasting has improved slowly and steadily. The bulk of the progress has been through access to faster computing power. Theoretical models have also improved. However, these improvements are not cleanly separable. The ability to run faster simulations computationally allows for quicker testing and comparison of different algorithms, and experimental adjustment to make them work faster and better. In this case, one-time access to better computational resources can lead to long-term improvements in the algorithms used.