A comment on Ajeya Cotra’s draft report on AI timelines

Ajeya Cotra’s draft report on AI timelines is the most useful, comprehensive report about AI timelines I’ve seen so far. I think the report is a big step towards modeling the fundamental determinants of AI progress. That said, I am skeptical that the arguments provided in the report should reduce our uncertainty about AI timelines by much, even starting from a naive perspective.

To summarize her report and my (minor) critique:

  • Ajeya builds a distribution over how much compute it would take to train a transformative ML model using 2020 algorithms. This compute distribution is the result of combining the distributions under six different biological anchors, and spans over 20 orders of magnitude of compute.

  • To calculate when it will be affordable to train a transformative ML model, Ajeya guesses (1) how quickly computing prices will fall in the coming decades, (2) how much algorithmic progress will happen, and (3) how much large actors (like governments) will be willing to spend on training a transformative AI.

  • Ajeya says that she spent most of her time building the 2020 compute distribution, and relatively little time forecasting the price-performance, algorithmic progress, and willingness-to-spend parameters of the model. It is therefore not surprising that I'd disagree with her estimates for them.

    However, my main contention is that her estimates for these parameters are just point estimates, meaning that our uncertainty over these parameters doesn’t translate into uncertainty in her final bottom-line distribution of when it will become affordable to train a transformative ML model.

  • Separately, I think the hardware forecast is too optimistic. She assumes that the price of computation will halve roughly every 2.5 years over the next 50 years. I think it’s more reasonable to expect hardware progress to be slower than that in the future.

  • Making the adjustments implied by the prior two points yields a bottom-line distribution that is both wider and shifted later. The result barely reduces our uncertainty about AI timelines compared to what many (perhaps most?) EAs/rationalists believed prior to 2020.

Upfront, I definitely am not saying that her report is valueless in light of this rebuttal. In fact, I think Ajeya’s report is useful because of how it allows people to build their own models on top of it. I also think the report is admirably transparent, and unusually nuanced. I’m just skeptical that we should draw strong conclusions about when transformative AI will arrive based on her model alone (though future work could be more persuasive).

The critique, elaborated

A summary of how the compute distribution is constructed

Ajeya Cotra’s model is highly parameterized compared to simple alternatives that have sometimes been given (like some of Ray Kurzweil’s graphs). It also gives a very wide distribution over what we should expect.

The core of her model is the “2020 training computation requirements distribution”, which is a probability distribution over how much computation it would take to train a transformative machine learning model using 2020 algorithms. To build this distribution, she produces six anchor distributions, each rooted in some analogy to biology.

Two anchors are relatively straightforward: the lifetime anchor grounds our estimate of training compute in how much computation humans perform during childhood, and the evolution anchor grounds it in how much computation evolution performed in the process of evolving humans from the first organisms with neurons.

Three more anchors (the “neural network anchors”) ground our estimate of the inference compute of a transformative model in the computation that the human brain uses, plus an adjustment for algorithmic inefficiency. From this estimate, we can derive the number of parameters such a model might have. Using empirical ML results, we can estimate how many data points it will take to train a model with that number of parameters.

Each neural network anchor is distinguished by how it translates “number of data points used during training” into “computation used to train”. The short-horizon anchor assumes that each data point takes very little computation, as in the case of language modeling. The medium- and long-horizon anchors use substantially longer “effective horizons”, meaning that they estimate it will take much more computation to produce a single meaningful data point for the model.
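
To make the role of the effective horizon concrete, here is a rough sketch of the shape of a neural network anchor calculation. Every number below is a placeholder I picked for illustration, not a median from the report:

```python
# Rough shape of a neural-network anchor calculation.
# All numbers are illustrative placeholders, not the report's medians.
flop_per_subj_second = 1e16         # brain compute plus an algorithmic-inefficiency factor
params = 1e14                       # parameter count inferred from that inference budget
data_points = 1e3 * params ** 0.8   # assumed empirical scaling of data needs with parameter count
horizon_subj_seconds = 1.0          # "effective horizon": subjective seconds per data point

training_flop = flop_per_subj_second * horizon_subj_seconds * data_points
print(f"Training compute: {training_flop:.1e} FLOP")
```

Lengthening the effective horizon scales the answer linearly, which is what separates the short, medium, and long-horizon anchors.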

The genome anchor grounds our estimate of the number of parameters of a transformative model in the number of parameters in the human genome. It then uses a long effective horizon to estimate how much computation it would take to train this model.

By assigning a weight to each of these anchors, we can construct a probability distribution over the total amount of computation it would take to train a transformative AI using 2020 algorithms. Since the lifetime anchor places substantial probability on amounts of compute that have already been used in real training runs, Ajeya applies a small adjustment to this aggregate distribution.

The final compute distribution is extremely wide, assigning non-negligible probability to requirements ranging from roughly 10^24 FLOP to 10^50 FLOP.
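
To make the aggregation step concrete, here is a minimal sketch that mixes several anchor distributions (each treated as a normal over log10 training FLOP) according to their weights. The medians, spreads, and weights below are placeholders I made up, not the report’s values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each anchor as a normal over log10(training FLOP); all values are placeholders.
anchors = {
    "lifetime":       dict(median=27, sd=2, weight=0.05),
    "short horizon":  dict(median=32, sd=3, weight=0.20),
    "medium horizon": dict(median=35, sd=3, weight=0.30),
    "long horizon":   dict(median=38, sd=3, weight=0.15),
    "genome":         dict(median=33, sd=4, weight=0.10),
    "evolution":      dict(median=41, sd=3, weight=0.20),
}

names = list(anchors)
weights = np.array([anchors[a]["weight"] for a in names])
medians = np.array([anchors[a]["median"] for a in names])
sds = np.array([anchors[a]["sd"] for a in names])

n = 100_000
which = rng.choice(len(names), size=n, p=weights / weights.sum())
log10_flop = rng.normal(medians[which], sds[which])

print(np.percentile(log10_flop, [5, 50, 95]))  # spans many orders of magnitude
```

The width of the resulting mixture, rather than any single anchor, is what drives the bottom-line uncertainty.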

My core contention

To translate the compute distribution into a timeline for when it will become affordable to train transformative AI, Ajeya produces forecasts for three questions:

  1. How quickly will computing prices fall in the coming decades?

  2. How fast will we make progress in algorithmic efficiency?

  3. How much will large actors be willing to spend on training a transformative AI?

Ajeya comments that,

These numbers are much more tentative and unstable than my hypothesis distributions, and I expect many readers to have substantially more information and better intuitions about them than I do; I strongly encourage interested readers to generate their own versions of these forecasts with this template spreadsheet.

Her estimates for each of these parameters are simply numerical guesses, i.e., point estimates. That means our uncertainty about these guesses is not reflected in the bottom-line distribution for when training a transformative ML model becomes affordable. But in my opinion, we should be moderately uncertain about each of these parameters.

I think it’s inaccurate to convey the final bottom-line probability distribution (the one with a median of about 2052) as representing all of our uncertainty over the variable being modeled. If we were to make some adjustments to the model to account for our uncertainty over these parameters, the bottom-line distribution would get much wider. As a consequence, it’s not clear to me that the Bio Anchors model should narrow our uncertainty about AI timelines by an appreciable degree.
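
To illustrate the kind of adjustment I have in mind, here is a crude Monte Carlo sketch that samples the hardware, algorithmic-progress, and spending parameters from distributions instead of fixing them, and propagates that uncertainty into the year training first becomes affordable. Every distribution and constant below is a placeholder chosen for illustration, not a value from the report:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
years = np.arange(2025, 2101)

# Placeholder distributions: the point is the propagation, not these particular numbers.
log10_req   = rng.normal(34, 4, n)                  # training FLOP required with 2020 algorithms
hw_halving  = rng.lognormal(np.log(3.0), 0.3, n)    # years per halving of $/FLOP
alg_halving = rng.lognormal(np.log(3.0), 0.4, n)    # years per halving of effective compute needed
max_spend   = 10 ** rng.normal(9.5, 0.75, n)        # peak willingness to spend, in dollars

flop_per_dollar_2020 = 1e17                         # placeholder starting price-performance

first_year = np.full(n, np.nan)
for t in years:
    dt = t - 2020
    affordable_flop = max_spend * flop_per_dollar_2020 * 2 ** (dt / hw_halving)
    effective_req   = 10 ** log10_req / 2 ** (dt / alg_halving)
    newly_hit = np.isnan(first_year) & (affordable_flop >= effective_req)
    first_year[newly_hit] = t

print("median year:", np.nanmedian(first_year))
print("P(not affordable by 2100):", np.mean(np.isnan(first_year)))
```

The particular numbers don’t matter; the point is that letting these three parameters vary visibly widens the bottom-line distribution relative to fixing them at point estimates.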

To be clear: I’m not claiming that Ajeya presents Bio Anchors as a final, all-things-considered AI timelines model. Her actual claims are quite modest, and she even performs a sensitivity analysis. Still, my takeaway from this report differs from how some others have interpreted it.

To give one example of what I see as an inaccurate conclusion being drawn from Ajeya’s report, Holden Karnofsky wrote,

Additionally, I think it’s worth noting a couple of high-level points from Bio Anchors that don’t depend on quite so many estimates and assumptions...

  • If AI models continue to become larger and more efficient at the rates that Bio Anchors estimates, it will probably become affordable this century to hit some pretty extreme milestones—the “high end” of what Bio Anchors thinks might be necessary. These are hard to summarize, but see the “long horizon neural net” and “evolution anchor” frameworks in the report.

  • One way of thinking about this is that the next century will likely see us go from “not enough compute to run a human-sized model at all” to “extremely plentiful compute, as much as even quite conservative estimates of what we might need.”

But I think the main reason this model hits such high compute milestones by the end of the century is simply that Ajeya assumes fast progress in computing price-performance in the coming decades; in fact, faster progress than we’ve seen in the last 10 years. I think this is a dubious assumption.

Under her conservative assumptions from the sensitivity analysis, we don’t actually get very close to being able to run evolution by the end of the century.

Her mainline estimate for price-performance progress is that the price of computation will fall by half roughly every 2.5 years for the next 50 years, after which it will saturate. Her mainline forecast looks like this.

Price-performance improvement in the last decade has been quite a lot slower than this. For example, Median Group found a slower rate, with prices halving only about every 3.5 years over the last 10 years. (I’d also expect a newer analysis to show an even slower trend, given the recent GPU shortage.)
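
To see how much this difference matters over the forecast window, compare the cumulative gains implied by the two rates over 50 years (simple arithmetic, using only the two halving times):

```python
# Cumulative price-performance gain over 50 years under the two halving times.
for halving_years in (2.5, 3.5):
    doublings = 50 / halving_years
    print(f"prices halving every {halving_years} years -> {2 ** doublings:,.0f}x more compute per dollar")
```

That comes out to roughly 1,000,000x versus roughly 20,000x: the faster assumption buys about 50 times more training compute for a fixed budget by the time the forecast saturates.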

Roughly, the main reason for this slowdown was the end of Dennard scaling, which heavily slowed down single-threaded CPU performance.

This slowdown is also in line with a consistent slowdown in supercomputing performance, which is plausibly a good reference class for how much computation will be available for AI projects. Here’s a chart from Top 500.

So why does she forecast that prices will halve every 2.5 years, rather than at a slower rate matching recent history? In the report, she explains,

This is slower than Moore’s law (which posits a ~1-2 year doubling time and described growth reasonably well until the mid-2000s) but faster than growth in effective FLOP per dollar from ~2008 to 2018 (a doubling time of ~3-4 years). In estimating this rate, I was attempting to weigh the following considerations:

  • Other things being equal, the recent slower trend is probably more informative than older data, and is fairly likely to reflect diminishing returns in the silicon chip manufacturing industry.

  • However, the older trend of faster growth has held for a much longer period of time and through more than one change in “hardware paradigms.” I don’t think it makes sense to extrapolate the relatively slower growth from 2008 to 2018 over a period of time several times longer than that

  • Additionally, a technical advisor informs me that the NVIDIA A100 GPU (released in 2020) is substantially more powerful than the V100 that it replaced, which could be more consistent with a ~2-2.5 year doubling time than a ~3.5 year doubling time.

Of these points, I find point 2 to be a weak argument in favor of fast future growth, and I find point 3 particularly weak. All things considered, I expect the price-performance trend to stay at its current pace (prices halving every 3-4 years) or to get even slower as we approach fundamental physical limits. It seems hard to imagine that, after the slowdown we saw in the last decade, hardware engineers will suddenly find a new paradigm that enables much quicker progress in the medium term.

To get more specific, I’d assign about a 20% credence to the hypothesis that price-performance trends will robustly speed up in the next 20 years and maintain that pace for another 30 years after that. Conversely, that means I think there’s about an 80% chance that the hardware milestones predicted in this report are too optimistic.

On this basis, I think it is also reasonable to be similarly conservative about other variables that are downstream of hardware progress, which potentially include algorithmic progress.

Conclusion

I agree that Ajeya Cotra’s report usefully rebuts e.g. Robin Hanson’s timelines in which AI is centuries away. However, my impression from talking to rationalists prior to 2020 is that most people didn’t put much weight on these sorts of very long timelines anyway.

(Of course, the usual caveats apply. I might just not have been talking to the right people.)

Personally, prior to 2020, I thought AI was very roughly 10 to 100 years away, and this report doesn’t substantially update me away from this very uncertain view. That’s not to say it’s not an interesting, useful report to build on. But it’s difficult for me to see how the report, on its own, has any major practical implications for when we should expect to see advanced AI.