Some Intuitions Around Short AI Timelines Based on Recent Progress

tldr: I give some informal evidence and intuitions that point toward AGI coming soon. These include thinking about how crazy the last year has been, beliefs from those in major AI labs, and progress on MMLU.

Intro

This post is intended to be a low-effort reference I can point people to when I say I think there is some evidence for short AI timelines. I might describe the various bits of evidence and intuitions presented here as “intuitions around short AI timelines based on recent progress” (though perhaps there are better terms). They are not a thorough model like Ajeya’s; insofar as somebody is using multiple models when putting together a timelines estimate, I think it would be unreasonable to place less than 20% or greater than 95% weight on extrapolation from current systems and recent progress.[1]

In the spirit of being informal, you can use whatever definition of AGI you like. I mostly use AGI to refer to something like “an AI system which can do pretty much all human cognitive tasks (~99% of the cognitive tasks people do in 2023) as well as or better than humans.”

Some evidence

I (Aaron) started following AI and AI existential safety around the beginning of 2022, so it’s been a little over a year. Some of that time was spent catching up on advances from the previous couple of years, but there have also been major new advances during that period.

Some major advances since I’ve been paying attention:

- The Chinchilla paper popularized scaling laws that were already known to some.
- There was some DALL-E and related stuff, which was cool.
- CICERO happened, which I didn’t follow closely but which indicates we’re probably going to train AIs to do all the dangerous stuff (see also Auto-GPT and Chaos-GPT, or GPT-4 getting plugins within 2 weeks of release, as more recent updates in this saga of indignity).
- ChatGPT showed how much more usable models are with RLHF (popularizing methods that had been known for a while).
- Med-PaLM got a passing score on the US medical licensing exam (there are also tons of other PaLM and Flan-PaLM results I haven’t followed but which seem impressive).
- LLaMA and Alpaca took powerful capabilities from compute-efficient (and over-) training and handed them to the public.
- GPT-4 blew the competition out of the water on many benchmarks.

I probably missed a couple of big things (including projects which honorably didn’t publicly push SOTA, 1, 2); the list is probably a bit out of order; I’ve also included things from 2023; but man, that sure is a year of progress.

I don’t expect there are all that many more years with this much progress before we hit AGI — maybe 3-12 years. More importantly, I think this ~15 month period, especially November 2022 to now, has generated a large amount of hype and investment in AI research and products. We seem to be on a path such that — in every future year before we die — there is more talent+effort+money working on improving AI capabilities than there was in 2022.[2] I hold some hope that major warning shots and/or regulation would change this picture; in fact, I think it’s pretty likely we’ll see warning shots beyond those we have already seen, but I am not too optimistic about what the response will be. As crazy as 2022 was, we should be pretty prepared for a world that gets significantly crazier. I find it hard to imagine that we could have all that many more years that look like 2022 AI-progress-wise, especially with significantly increased interest in AI.


A large fraction of the public thinks AGI is near. I believe these people are mostly just thinking about how good current systems (GPT-4) are and informally extrapolating:

[image description: A Twitter poll from Lex Fridman where he asks “When will superintelligent general AI (AGI) arrive?” There are 270,000 responses. 61% of respondents answered that it will arrive in 1 to 10 years and 21% of respondents answered that it will arrive in 10-20 years.]


Some Anthropic staff seem to think something-like-AGI could be near:

Over the next 5 years we might expect around a 1000x increase in the computation used to train the largest models, based on trends in compute cost and spending. If the scaling laws hold, this would result in a capability jump that is significantly larger than the jump from GPT-2 to GPT-3 (or GPT-3 to Claude). At Anthropic, we’re deeply familiar with the capabilities of these systems and a jump that is this much larger feels to many of us like it could result in human-level performance across most tasks… we believe they jointly support a greater than 10% likelihood that we will develop broadly human-level AI systems within the next decade.
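As a rough sanity check on the arithmetic in that quote: a 1000x increase in training compute over 5 years works out to roughly a 4x increase per year. Here is a minimal sketch of that calculation (the 1000x figure and the 5-year window come from the quote above; everything else is just arithmetic):

```python
# Implied annual growth rate for the quoted "1000x over 5 years" compute trend.
total_factor = 1000   # overall increase in training compute (from the quote)
years = 5             # time window in years (from the quote)

annual_factor = total_factor ** (1 / years)
print(f"~{annual_factor:.1f}x more training compute per year")  # ~4.0x per year
```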

I have ~0 evidence, but I expect folks at OpenAI are also in the ‘significant probability of AGI before 2030’ boat. Note that the conductors on the train are surely biased[3], but they also sure do have better evidence than all the rest of us.


AI is starting to be used to accelerate AI research.

A group of intuition pumps

Being in the thick of it can really bias people’s judgment. The format of the following intuition pumps is that I imagine visiting alien civilizations much like Earth, and I try to reason from just one piece of evidence at a time about how long that planet has until AGI.


A major tech company there releases a paper about “sparks of artificial general intelligence” in a state-of-the-art system. How much longer does this planet have until AGI? I would expect something like 1-8 years.


You visit a different planet. People are using coding assistants to speed up their coding process, and within the first couple of years of doing this they are able to cut their coding time by ~50% (on the higher side of ways to interpret the data). How much longer does this planet have until AGI? Maybe 2-11 years.


You visit a different planet. You ask somebody there how many young aliens are studying AI and how much AI research is happening each year. They show you the following graphs:

[image description: The first two graphs show the overall number of college degrees and the number of STEM degrees conferred per year from 2011 to 2021. They indicate that for Bachelor’s and Master’s degrees, STEM has become quite popular in the last 10 years, with almost twice as many STEM Bachelor’s degrees conferred in 2021 as in 2011, while the increase across all Bachelor’s degrees is only around 20%. The third graph shows the same data but for AI degrees specifically. The number of Master’s degrees in AI conferred approximately doubled over the 10-year period, while AI Bachelor’s degrees more than doubled. PhDs do not seem to have experienced significant growth in graphs 1 through 3. The fourth graph shows the number of AI publications between 2010 and 2021. There are about twice as many publications in 2021 as in 2010; there seems to be an increase in the growth rate after 2017, but overall these numbers aren’t too crazy. The fifth graph shows the number of AI publications broken down by field of study, and we can see that machine learning takes off around 2017, experiencing a significant increase in publishing since then.]

These trend lines aren’t that crazy. But they sure do indicate an uptick in the number of people working on AI and the amount of AI research being done on that planet.


You visit a different planet. You are shown a benchmark used for AIs known as MMLU. It is a large multiple choice test covering many topics, from high school world history to college physics. Some unspecialized aliens taking this test scored around 35%; experts are predicted to score around 90%; you guess that a smart university student alien could probably score around 70% if they spent a few months studying. The graph of SOTA AI performance on this test, in the last 4 years, is as follows:

[image description: Data for MMLU performance over time. The first datapoint is GPT-2 from early 2019 which scores 32%. In mid 2020 GPT-3 scores 54%. In early 2022 Chinchilla scores 68%. In late 2022 Flan-PaLM scores 75%. In early 2023 GPT-4 scores 86%.]

Now, there are some rumors about dataset contamination in the results on the most recent record, but the evidence isn’t very strong for that, and it probably doesn’t change the picture too much. Eyeballing a trend line through this performance would imply that this planet is going to have AIs that outperform expert aliens on most academic cognitive tasks in months to years. [edit: I think passing 90% MMLU is not actually the same as “outperforming experts on most academic cognitive tasks”; it’s probably closer to “having above-expert-level knowledge in almost all academic areas”.]
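For concreteness, here is a minimal sketch of that eyeballing exercise: a straight-line fit through the datapoints above, and the year at which the line would cross the ~90% expert level. The dates and scores are approximations read off the figure description, and a linear fit through five points is obviously crude, so treat the output as illustrative rather than as a forecast.

```python
# Crude "eyeball a trend line" exercise: fit a line to SOTA MMLU scores over
# time and see roughly when it would cross the ~90% expert level.
import numpy as np

# (year, score %) pairs approximated from the figure description above.
data = [
    (2019.0, 32),  # GPT-2
    (2020.5, 54),  # GPT-3
    (2022.0, 68),  # Chinchilla
    (2022.8, 75),  # Flan-PaLM
    (2023.2, 86),  # GPT-4
]
years, scores = map(np.array, zip(*data))

slope, intercept = np.polyfit(years, scores, 1)  # least-squares straight line
crossing_year = (90 - intercept) / slope         # where the line hits 90%
print(f"~{slope:.0f} points/year; the line crosses 90% around {crossing_year:.1f}")
```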

Conclusion

I think there are some pretty compelling reasons to expect AGI soon, for most reasonable definitions of AGI. There are a number of things that have happened in the world recently which — if I were a neutral observer — I think would make me expect AGI in a single-digit number of years. While I don’t like how vague/informal/ass-number these “intuitions around [short] AI timelines based on recent progress” are, the alternative might be that we all fall over dead while staring at a graph that says we’re 5 OOMs of compute short. I’m interested in feedback on this post; see this footnote for details.[4]

Footnotes:

  1. ^

    It seems unreasonable to not put at least 20% weight on looking at current systems when projecting future progress (see also Jacob Steinhardt’s discussion of different weights for thinking about how future AI systems will be), but by no means do you have to use the ideas in this post; if you have a nice model for AI timelines that easily incorporates recent progress, please let me know! Unfortunately, it seems to me that AI timeline estimates are often divorced from current systems; see, for example, Ajeya’s report and Tom’s report. I appreciate Matthew putting together a framework that can more easily update on recent advances. I have generally felt pretty confused about how to update on advances in AI, in part due to some kind of measurability bias: the AI timeline models which have been really fleshed out don’t seem to easily allow updating on the performance of current systems. Part of the reason for this post is to publicly give myself permission to update on the things I see when I open my eyes, even though I don’t have a nice model to put these observations in.

  2. ^

    Hopefully we don’t die. When we’re safe I will happily rewrite this sentence to say “in every future year before we exit the acute risk period”.

  3. ^

    I would expect those inside AI labs to be somewhat biased toward shorter timelines because of: an insular social environment that focuses on AGI, wanting to think their work is important, a selection effect of choosing to work on AGI because they think it’s feasible for them to make it, an interest in generating hype for the sake of financial investment, and probably more.

  4. ^

    Types of feedback I would be interested in:

    - If you think any of the evidence/arguments presented actually point in a different direction than I say, let me know!

    - If you think there are major pieces of evidence from this “intuitions around [short] AI timelines based on recent progress” perspective that I did not include, including evidence from recent advances that points toward longer timelines, drop a comment! I’m less interested in evidence from other approaches given that this post is trying to build a specific model to then be mixed with other models.

    - If you have nice models/frameworks for thinking about AI timelines that allow for updating on recent progress, let me know!

    - If you really think this post would benefit from being fleshed out or cleaned up, let me know (though I’m pretty averse to this because I don’t think it’s worth the time)!

    - If you think one should put less than 20% of their timeline thinking weight on recent progress, I would be interested in hearing why.

    - If you think I’m seriously messing up somewhere, let me know!

    - I’m not very interested in debating the definition of AGI unless you think it really matters for informal thoughts like these.