Two-year update on my personal AI timelines

Ajeya Cotra2 Aug 2022 23:07 UTC

LW: 291 AF: 105

I worked on my draft report on biological anchors for forecasting AI timelines mainly between ~May 2019 (three months after the release of GPT-2) and ~Jul 2020 (a month after the release of GPT-3), and posted it on LessWrong in Sep 2020 after an internal review process. At the time, my bottom line estimates from the bio anchors modeling exercise were:^[1]

Roughly ~15% probability of transformative AI by 2036^[2] (16 years from posting the report; 14 years from now).
A median of ~2050 for transformative AI (30 years from posting, 28 years from now).

These were roughly close to my all-things-considered probabilities at the time, as other salient analytical frames on timelines didn’t do much to push back on this view. (Though my subjective probabilities bounced around quite a lot around these values and if you’d asked me on different days and with different framings I’d have given meaningfully different numbers.)

It’s been about two years since the bulk of the work on that report was completed, during which I’ve mainly been thinking about AI. In that time it feels like very short timelines have become a lot more common and salient on LessWrong and in at least some parts of the ML community.

My personal timelines have also gotten considerably shorter over this period. I now expect something roughly like this:

~15% probability by 2030 (a decrease of ~6 years from 2036).
~35% probability by 2036 (a ~3x likelihood ratio^[3] vs 15%).
- This implies that each year in the 6 year period from 2030 to 2036 has an average of over 3% probability of TAI occurring in that particular year (smaller earlier and larger later).
A median of ~2040 (a decrease of ~10 years from 2050).
- This implies that each year in the 4 year period from 2036 to 2040 has an average of almost 4% probability of TAI.
~60% probability by 2050 (a ~1.5x likelihood ratio vs 50%).

As a result, my timelines have also concentrated more around a somewhat narrower band of years. Previously, my probability increased from 10% to 60%^[4] over the course of the ~32 years from ~2032 and ~2064; now this happens over the ~24 years between ~2026 and ~2050.

I expect these numbers to be pretty volatile too, and (as I did when writing bio anchors) I find it pretty fraught and stressful to decide on how to weigh various perspectives and considerations. I wouldn’t be surprised by significant movements.

In this post, I’ll discuss:

Some updates toward shorter timelines (I’d largely characterize these as updates made from thinking about things more and talking about them with people rather than updates from events in the world, though both play a role).
Some updates toward longer timelines (which aren’t large enough to overcome the updates toward shorter timelines, but claw the size of the update back a bit).
Some claims associated with short timelines that I still don’t buy.
Some sources of bias I’m not sure what to do with.
Very briefly, what this may mean for actions.

This post is a catalog of fairly gradual changes to my thinking over the last two years; I’m not writing this post in response to an especially sharp change in my view—I just thought it was a good time to take stock, particularly since a couple of people have asked me about my views recently.

Updates that push toward shorter timelines

I list the main updates toward shorter timelines below roughly in order of importance; there are some updates toward longer timelines as well (discussed in the next section) which claw back some of the impact of these points.

Picturing a more specific and somewhat lower bar for TAI

Thanks to Carl Shulman, Paul Christiano, and others for discussion around this point.

When writing my report, I was imagining that a transformative model would likely need to be able to do almost all the tasks that remote human workers can do (especially the scientific research-related tasks). Now I’m inclined to think that just automating most of the tasks in ML research and engineering—enough to accelerate the pace of AI progress manyfold—is sufficient.

Roughly, my previous picture (similar to what Holden describes here) was:

Automate science → Way more scientists → Explosive feedback loop of technological progress

But if it’s possible to automate science with AI, then automating AI development itself seems like it would make the world crazy almost as quickly:

Automate AI R&D → Explosive feedback loop of AI progress specifically → Much better AIs that can now automate science (and more) → explosive feedback loop of technological progress

To oversimplify, suppose I previously thought it would take the human AI development field ~10 years of work from some point T to figure out how to train a scientist-AI. If I learn they magically got access to AI-developer-AIs that accelerate progress in the field by 10x, I should now think that it will take the field ~1 year from point T to get to a scientist-AI.

This feels like a lower bar for TAI than what I was previously picturing—the most obvious reason is because automating one field should be easier than automating all fields, meaning that the model size required should be appreciably smaller. But additionally, I think AI development in particular seems like it has properties that make it easier to automate with only short horizon training (see below for some discussion). So this update reduces my estimate of both model size and effective horizon length.

Feeling like meta-learning may be unnecessary, making short horizon training seem more plausible

Thanks to Carl Shulman, Dan Kokotajlo, and others for discussion of what short horizon training could look like.

In my report, I acknowledged that models trained with short effective horizon lengths could probably do a lot of economically useful work, and that breaking long tasks down into smaller pieces could help a lot. The main candidate in my mind for a task that might require long training horizons to learn was (and still is) “efficient learning” itself.

That is, I thought that a meta-learning project attempting to train a model on many instances of the task “master some complex new skill (that would take a human a long time to learn, e.g. a hard video game or a new type of math) from scratch within the current episode and then apply it” would have a long effective horizon length, since each individual learning task (each “data point”) would take the model some time to complete. Absent clever tricks, my best guess was that this kind of meta-learning run would have an effective horizon length roughly similar to the length of time it would take for a human to learn the average skill in the distribution.

I felt like having the ability to learn novel skills in a sample-efficient way would be important for a model to have a transformative impact, and was unsure about the extent to which clever tricks could make things cheaper than the naive view of “train the model on a large number of examples of trying to learn a complex task over many timesteps.” This pulled my estimate for effective horizon length upwards (to a median of multiple subjective hours).

By and large, I haven’t really seen much evidence in the last two years that this kind of meta-learning—where each object-level task being learned would take a human a long time to learn—can be trained much more cheaply than I thought, or much evidence that ML can directly achieve human-like sample efficiencies without the need for expensive meta-learning (footnote attempts to briefly address some possible objections here).^[5]

Instead, as I’ve thought harder about the bar for “transformative,” I’ve come to think that it’s likely not necessary for the first transformative models to learn new things super efficiently. Specifically, if the main thing needed to have a transformative impact is to accelerate AI development itself:

There’s so much human-imitation data on programming and AI^[6] that the model can train on vastly more examples than a human sees in their lifetime, and after that training it may not really need to learn particularly complex novel skills after training to act as a very skilled AI engineer / researcher.
Coding is intentionally very modular, so it seems especially well-suited to break down into small short-horizon steps.
Probably as a result of the above two points, we’re already seeing much more concrete progress on coding than we are in other technical and scientific domains; AI systems seem likely to add non-trivial value to practicing programmers soon. This generally de-risks the prospect of using AI to help with AI development somewhat, compared to other applications we haven’t started to see yet.
Brute-force search seems like it could play a larger role in progress than in many other sciences (e.g. a relatively simple ML model could generate and test out thousands of different small tweaks to architectures, loss functions, optimization algorithms, etc. at small scale, choosing the one that works best empirically—other sciences have a less clear-cut search space and longer feedback loops).

I do still think that eventually AI systems will learn totally new skills much more efficiently than existing ML systems do, whether that happens through meta-learning or through directly writing learning algorithms much more sample-efficient than SGD. But now this seems likely to come after short-horizon, inefficiently-trained coding models operating pretty close to their training distributions have massively accelerated AI research.

Explicitly breaking out “GPT-N” as an anchor

This change is mainly a matter of explicitly modeling something I’d thought of but found less salient at the time; it felt more important to me to factor it in after the lower bar for TAI made me consider short horizons in general more likely.

My original Short Horizon neural network anchor assumed that effective horizon length would be log-uniformly distributed between ~1 subjective second (which is about GPT-3 level) and ~1000 subjective seconds; this meant that the Short Horizon anchor was assuming an effective horizon length substantially longer than a pure language model (the mean was ~32 subjective seconds, vs ~1 subjective second for a language model).

I’m now explicitly putting significant weight on an amount of compute that’s more like “just scaling up language models to brain-ish sizes.” (Note that this hypothesis/anchor is just saying that the training computation is very similar to the amount of computation needed to train GPT-N, not that we’d literally do nothing else besides train a predictive language model. For example, it’s consistent with doing RL fine-tuning but just needing many OOMs less data for that than for the original training run—and I think that’s the most likely way it would manifest.)

Considering endogeneities in spending and research progress

Thanks to Carl Shulman for raising this point, and to Tom Davidson for research fleshing it out.

My report forecasted algorithmic progress (the FLOP required to train a transformative model in year Y), hardware progress (FLOP / $ in year Y), and willingness to spend ($ that the largest training run could spend on FLOP in year Y) as simple trendline forecasts, which I didn’t put very much thought into.

In the open questions section, I gestured at various ways these forecasts could be improved. One salient improvement (mentioned but not highlighted very much) would be to switch from a black box trend extrapolation to a model that takes into account how progress in R&D relates to R&D investment.

That is, rather than saying “Progress in [hardware/software] has been [X doublings per year] recently, so let’s assume it continues that way,” we could say:

Progress in [hardware/software] has been [X doublings per year]
Over this time, the amount of [money / labor] invested into [hardware / software R&D] has been growing at [Y doublings per year]
This implies that every Y doublings of R&D leads to X doublings of improvement in [hardware / software]

This would then allow us to express beliefs about how investment in R&D will change, which can then translate into beliefs about how fast research will progress. And if ML systems have lucrative near-term applications, then it seems likely there will be demand for increasing investment into hardware and software R&D beyond the historical trend, suggesting that this progress should happen faster than I model.

Furthermore, it seems possible that pre-transformative systems would substantially automate some parts of AI research itself, potentially further increasing the effective “total R&D efforts gone into AI research” beyond what might be realistic from increasing the human labor force alone.

Seeing continued progress and no major counterexamples to DL scaling well

My timelines model assumed that there was a large (80%) chance that scaling up 2020 ML techniques to use some large-but-not-astronomically-large amount of computation (and commensurate amount of data) would work for producing a transformative model.^[7]

Over the last two years, I’d say deep learning has broadly continued to scale up well. Since that was the default assumption of my model, there isn’t a big update toward shorter timelines here—but there was some opportunity for deep learning to “hit a wall” over the last two years, and that didn’t really happen, modestly increasing my confidence in the premise.

Seeing some cases of surprisingly fast progress

My forecasting method was pretty anchored to estimates of brain computation, rather than observations of the impressiveness of models that existed at the time, so I was pretty unsure what the framework would imply for very-near-term progress^[8] (“How good at coding would a mouse be, if it’d been bred over millennia to write code instead of be a mouse?”).

As a result, I didn’t closely track specific capabilities advances over the last two years; I’d have probably deferred to superforecasters and the like about the timescales for particular near-term achievements. But progress on some not-cherry-picked benchmarks was notably faster than what forecasters predicted, so that should be some update toward shorter timelines for me. I’m pretty unsure how much, and it’s possible this should be larger than I think now.

Making a one-time upward adjustment for “2020 FLOP / $”

In my report I estimated that the effective computation per dollar in 2020 was 1e17 FLOP/ $, and projected this forward to get hardware estimates for future years. However, this seems to have been an underestimate of FLOP/ $) as of 2020. This is because:

I was using the V100 as my reference machine; this was in fact the most advanced publicly available chip on the market as of 2020, but it was released in 2018 and on its way out, so it was better as an estimate for 2018 or 2019 compute than 2020 compute. The more advanced A100 was 2-3x more powerful per dollar and released in late 2020 almost immediately after my report was published.
I was using the rental price of a V100 (~$1/hour), but big companies get better deals on compute than that, by about another 2-3x.
I was assuming ~⅓ utilization of FLOP/s, which was in line with what people were achieving then, but utilization seems to have improved, maybe to ~50% or so.

This means the 2020 start point should have been 2.5 * 2.5 * 1.5 = nearly 10x larger. From the 2020 start point, I projected that FLOP / $ would double every ~2.5 years—which is slightly faster than the 2010 to 2018 period but slightly slower than Moore’s law. I haven’t looked into it deeply but my understanding is that this has roughly held, so the update I’m making here is a one-time increase to the starting point rather than a change in rate (separate from the changes in rate I’m imagining due to the endogeneities update).

Updates that push toward longer timelines

My report estimates that the amount of training data required to train a model with N parameters scales as N^0.8, based significantly on results from Kaplan et al 2020. In 2022, the Chinchilla scaling result (Hoffmann et al 2022) showed that instead the amount of data should scale as N.
- Some people have suggested this should be an update toward shorter timelines. This would be true if your method for forecasting timelines was observing models like GPT-2 and GPT-3, gauging how impressive they were, and trying to guess how many orders of magnitude more training computation would be required to reach TAI. The Chinchilla result would show that GPT-3 was “not as good as it could have been” for a fixed amount of training computation, so your estimate for the amount of additional scaling required should go down.
- But in my report I arrive at a forecast by fixing a model size based on estimates of brain computation, and then using scaling laws to estimate how much data is required to train a model of that size. The update from Chinchilla is then that we need more data than I might have thought.
I’m somewhat surprised that I haven’t seen more vigorous commercialization of language models and commercial applications that seem to reliably add real value beyond novelty; this is some update toward thinking that language models are less impressive than they seemed to me, or that it’s harder to translate from a capable model into economic impact than I believed.
There’s been a major market downturn that hit tech companies especially hard; it seems a little less likely to me now than it did when writing the report that there will be a billion dollar training run by 2025.

Overall, the updates in the previous section seem a lot stronger than these updates.

Claims associated with short timelines that I still don’t buy

I don’t expect a discontinuous jump in AI systems’ generality or depth of thought from stumbling upon a deep core of intelligence; I’m not totally sure I understand it but I probably don’t expect a sharp left turn.
Relatedly, I don’t expect that progress will be driven by a small number of key “game-changing” algorithmic insights we can’t anticipate today; I expect transformative models to look quite similar to today’s models^[9] (more so now that my timelines are shorter) and progress to be driven by scale and a large number of smaller algorithmic improvements.
Relatedly, I still don’t expect that TAI will be cheap (e.g. <$10B for a project) and don’t think smallish “underdog” research groups are likely to develop TAI;^[10] I still expect developing TAI to require hundreds of billions of dollars and to be done by tech companies with high valuations,^[11] likely after pretty significant commercialization of sub-transformative systems.
I think the concept of a “point of no return” that is not an objective observable event like “the AI has monopolized violence” is tricky to reason about, since it folds in complicated forecasts about the competence and options of future actors,^[12] but to the extent that I understand the concept I’m mostly not expecting a PONR before explosive acceleration in research progress is underway (e.g. I don’t expect a PONR before the first automated AI company). The update toward a lower bar for “transformative” pushes me more in this direction—I expect there to be a lot of helpful things to do in the final couple of years in between “AIs accelerate AI research a lot” and “Far superintelligent AI” (a big one being use the AIs to help with alignment research).

Sources of bias I’m not sure what to do with

Putting numbers on timelines is in general a kind of insane and stressful exercise, and the most robust thing I’ve taken away from thinking about all this is something like “It’s really a real, live possibility that the world as we know it is radically upended soon, soon enough that it should matter to all of us on normal planning horizons.” A large source of variance in stated numbers is messy psychological stuff.

The most important bias that suggests I’m not updating hard enough toward short timelines is that I face sluggish updating incentives in this situation—bigger changes to my original beliefs will make people update harder against my reasonableness in the past, and holding out some hope that my original views were right after all could be the way to maximize social credit (on my unconscious calculation of how social credit works).

But there are forces in the other direction too—most of my social group is pretty system 1 bought into short timelines,^[13] which for many of us likely emotionally justifies our choice to be all-in on AI risk with our careers. My own choices since 2020 look even better on my new views than my old. I have constantly heard criticism over the last two years that my timelines are too long, and very little criticism that they were too short, even though almost everyone in the world (including economists, ML people, etc) would probably have the other view. I find myself not as interested or curious as I should theoretically be (on certain models of epistemic virtue) in pushback from such people. I spend most of my time visualizing concrete worlds in which things move fast and hardly any time visualizing concrete worlds where they move slow.^[14]

What does this mean?

I’m unclear how decision-relevant bouncing around within the range I’ve been bouncing around is. Given my particular skills and resources, I’ve steered my career over the last couple years in a direction that looks roughly as good or better on shorter timelines than what I had in 2020.

This update should also theoretically translate into a belief that we should allocate more money to AI risk over other areas such as bio risk, but this doesn’t in fact bind us since even on our previous views, we would have liked to spend more but were more limited by a lack of capacity for seeking out and evaluating possible grants than by pure money.

Probably the biggest behavioral impact for me (and a lot of people who’ve updated toward shorter timelines in the last few years) will be to be more forceful and less sheepish about expressing urgency when e.g. trying to recruit particular people to work on AI safety or policy.

The biggest strategic update that I’m reflecting on now is the prospect of making a lot of extremely fast progress in alignment with comparatively limited / uncreative / short-timescale systems in some period a few months or a year before systems that are agentic / creative enough to take over the world. I’m not sure how realistic this is, but reflecting on how much progress could be made with pretty “dumb” systems makes me want to game out this possibility more.

↩︎
Bio Anchors part 4 of 4, pg 14-16.
↩︎
A year chosen to evaluate a claim made by Holden in 2016 that there was a >10% chance of TAI within 20 years.
↩︎
Given by the ratio of odds ratios: (0.35 / 0.65) / (0.15 / 0.85) = 3.05. This implies my observations and logical updates from thinking more since 2020 were 3x more likely in a world where TAI happens by 2036 than in a world where it doesn’t happen by 2036.
↩︎
This contains 50% of my probability mass but is not a “50% confidence interval” as the term is normally used, because the range I’m considering is not the range from 25% probability to 75% probability. This is mainly to keep the focus on the left tail of the distribution, which is more important and easier to think about. E.g. if I’m wrong about most of the models that lead me to expect TAI soonish, then my probability climbs very slowly up to 75% since I would revert back to simple priors.
↩︎
People use “few-shot learning” to refer to language models’ ability to understand the pattern of what they’re being asked for after seeing a small number of examples in the prompt (e.g. after seeing a couple of examples of translating an English sentence into French, a model will complete the pattern and translate the next English sentence into French). However, this doesn’t seem like much evidence about the kind of meta-learning I’m interested in, because it takes very little time for a human to learn a pattern like that. If a bilingual French-and-English-speaking human saw a context with two examples of translating an English sentence into French, they would ~immediately understand what was going on. Since the model already knows English and French, the learning problem it faces is very short-horizon (the amount of time it would take a human to read the text). I haven’t yet seen evidence that language models can be taught new skills they definitely didn’t know already over the course of many rounds of back-and-forth.

I’ve also seen EfficientZero cited as evidence that SGD itself can reach human-level sample complexities (without the need for explicit meta-learning), but this doesn’t seem right to me. The EfficientZero model learned the environment dynamics of a game with ML and then performed a search against that model of the environment to play the game. It took the model a couple of hours per game to learn environment dynamic facts like “the paddle moves left to right”, “if an enemy hits you you die,” etc. The relevant comparison is not how much time it would take a human to learn to play the game, it’s how much time it would take a human to get what’s going on and what they’re supposed to aim for—and the latter is not something that it would take a human 2 hours of watching to figure out (probably more like 15-60 seconds).

The kind of thing that would seem like evidence about efficient meta-learning is something like “A model is somehow trained on a number of different video games, and then is able to learn how to play (not just model the dynamics of) a new video game it hadn’t seen before decently with a few hours of experience.” The kind of thing that would seem like evidence of human-like sample efficiency directly from SGD would be something like “good performance on language tasks while training on only as many words as a human sees in a lifetime.”
↩︎
All the publicly-available code online (e.g. GitHub), plus company-internal repos, keylogging of software engineers, explicitly constructed curricula/datasets (including datasets automatically generated from the outputs of slightly smaller coding models), etc. Also, it seems like most of the reasoning that went into generating the code is in some sense manifested in the code itself, whereas e.g. the thinking and experimentation that went into a biology experiment isn’t all directly present in the resulting paper.
↩︎
Row 17 in sheet “Main”
↩︎
The key exception, as discussed above, was predicting that cheap meta-learning wouldn’t happen, which I’d say it didn’t.
↩︎
Note that there’s a bunch of already-widely-used techniques (most notably search) that some people wouldn’t count as “pure deep learning” which I expect to continue to play an important role. Transformative AI seems quite likely to look like AlphaGo (which uses search), RETRO (which uses retrieval), etc.
↩︎
Though my probability on this necessarily increased some because shifting the distribution to the left has to increase probability mass on “surprisingly cheap,” it’s still not my default. If I had to guess I’d say maybe ~15% chance on <$10B for training TAI and a similar probability that it’s trained by a company with <2% of the valuation of the biggest tech companies. Betting on this possibility—with one’s career or investments—seems better than it did to me before (and I don’t think it was insane in the past either; just not something I’d consider the default picture of the future).
↩︎
Though labs that are currently smallish could grow to have massive valuations and a ton of employees and then develop transformative systems, and that seems a lot more likely than that a company would develop TAI while staying small.
↩︎
E.g., it seems like it wouldn’t be hard to argue for a “PONR” in the past, e.g. “alignment is so hard that the fact that we didn’t get started on it 20 years ago means we’re past the point of no return.” Instead it just feels like the difficulty of changing course just gets worse and worse over time, and there are very late-stage opportunities that could still technically help, like “shutting down all the datacenters.”
↩︎
I might have ended up, with this update, near the median of the people I hang out with most, but I could also still be slower than them—not totally sure.
↩︎
Since longer timelines are harder to think about since the world will have changed more before TAI, less decision-relevant since our actions will have washed out more, less emotionally gripping, etc.

What links here?