The Extrapolation Problem

In a previous post I predicted that machine learning is limited by the algorithms it uses (as opposed to hardware or data). This has two big implications:

  • Scaling up existing systems will not render human thinking obsolete.

  • Replacing the multilayer perceptron with a fundamentally different algorithm could disrupt[1] the entire AI industry.

Today’s artificial neural networks (ANNs) require far more data than human beings do to learn a pattern. This is called the sample efficiency problem. Some people think the sample efficiency problem can be solved by pouring more data and compute into existing architectures. Other people (myself included) believe the sample efficiency problem must be solved algorithmically. In particular, I think the sample efficiency problem is related to the extrapolation problem.

Extrapolation and Interpolation

Machine learning models are trained on a dataset. We can think of this dataset as a set of points in a vector space. Our training data is bounded, which means it’s confined to a finite region of that space. This region is called the interpolation zone. The space outside the interpolation zone is called the extrapolation zone.
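To make the geometry concrete, here is a minimal sketch in Python (my own illustration; the function name and the bounding-box definition are assumptions, since the paper linked below may define the interpolation zone more tightly, e.g. as the convex hull of the training set):

```python
import numpy as np

def in_interpolation_zone(x_train, x_query):
    """Crude membership test: is each query point inside the
    axis-aligned bounding box of the training data? Points
    outside the box require extrapolation."""
    lo, hi = x_train.min(axis=0), x_train.max(axis=0)
    return np.all((x_query >= lo) & (x_query <= hi), axis=1)

# Training data confined to a finite region of vector space
x_train = np.random.uniform(-1, 1, size=(100, 2))
x_query = np.array([[0.5, 0.5],    # inside the interpolation zone
                    [3.0, 0.0]])   # out in the extrapolation zone
print(in_interpolation_zone(x_train, x_query))  # [ True False]
```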

Consider an illustration from this paper. Ground truth is in black. Training data is in green. The output of an ANN trained on the green data is in blue.

A human being could look at the green training data and instantly deduce the black pattern. The ANN performs well in the interpolation zone but breaks down completely in the extrapolation zone. The ANNs we use today can’t generalize to the extrapolation zone.
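This failure is easy to reproduce. The sketch below is my own toy example (the architecture and hyperparameters are arbitrary choices, not the paper’s setup): a small MLP is trained on y = sin(x) sampled from [-π, π], then queried outside that interval. Inside the training range the fit is good; beyond it, the predictions bear no resemblance to the sine wave.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Training data confined to the interpolation zone [-pi, pi]
x_train = rng.uniform(-np.pi, np.pi, size=(500, 1))
y_train = np.sin(x_train).ravel()

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000,
                     random_state=0).fit(x_train, y_train)

# The first two queries interpolate; the last two extrapolate
for x in [0.5, 2.0, 6.0, 10.0]:
    pred = model.predict([[x]])[0]
    print(f"x={x:5.1f}  true={np.sin(x):+.3f}  pred={pred:+.3f}")
```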

One way around this problem is to paper over it with big data. I agree that big data is a practical solution to many real-world problems. However, I do not believe it is a practical solution to all real-world problems. An algorithm that can’t extrapolate is an algorithm that can’t extrapolate. Period. No amount of data or compute you feed it will get it to extrapolate. Adding data just widens the interpolation zone. If a problem requires extrapolation, then an ANN can’t solve it, at least not with today’s methods.

Under this definition of extrapolation, GPT-3 is bad at extrapolation. An example of extrapolation would be GPT-3 inventing a technology in advance of its training data (including prompts), using only data from before the technology was invented. For example, if GPT-3 could produce schematics for a transistor using only training data from 1930 or earlier, then my prediction would be falsified. This is theoretically possible because the field-effect transistor was proposed in 1926, the same year the Schrödinger equation was published. Feeding GPT-3 the correct values of universal constants is allowed. GPT-3 would have access to much more compute than was available when the transistor was actually invented in 1947. It would just be a matter of doing the logic and math, which I predict GPT-3 will be forever incapable of.

Other architectures exist that are better at extrapolation.

Neural Ordinary Differential Equations (ODEs)

Neural Ordinary Differential Equations shortened my predicted AI timelines. The math is clever and elegant, but what’s really important is what it lets you do: Neural ODEs can extrapolate.
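Here is a minimal sketch of the core idea, using the torchdiffeq library from the paper’s authors (the toy dimensions and hyperparameters are my own illustrative choices): instead of learning a direct input-to-output map, the network parameterizes the derivative dz/dt, and an off-the-shelf ODE solver integrates it, including past the time horizon seen in training.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class Dynamics(nn.Module):
    """Learned vector field f_theta(t, z): the network models the
    derivative dz/dt rather than the solution trajectory itself."""
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, z):
        return self.net(z)

func = Dynamics()
z0 = torch.tensor([[1.0, 0.0]])            # initial state
t_extra = torch.linspace(0.0, 20.0, 200)   # extends past any training horizon

# Training would fit odeint(func, z0, t_train) to observed trajectories.
# Once f_theta is learned, the same solver rolls the dynamics forward
# beyond the training time range:
z_future = odeint(func, z0, t_extra)
print(z_future.shape)  # torch.Size([200, 1, 2])
```

The design choice that matters is that the model learns dynamics rather than a fixed-horizon map, so the same learned vector field can be integrated over any time interval.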

The Neural ODE paper was published in 2018. I predict the first person to successfully apply this technique to quantitative finance will become obscenely rich. If you become a billionaire from reading this article, please invite me to some of your parties.


  1. I’m using “disrupt” the way Clayton Christensen does in his 1997 book The Innovator’s Dilemma. ↩︎