I thought it would be helpful to post about my timelines and what the timelines of people in my professional circles (Redwood, METR, etc) tend to be.
Concretely, consider the outcome of: AI 10x’ing labor for AI R&D[1], measured by internal comments by credible people at labs that AI is 90% of their (quality adjusted) useful work force (as in, as good as having your human employees run 10x faster).
Here are my predictions for this outcome:
25th percentile: 2 years (Jan 2027)
50th percentile: 5 years (Jan 2030)
The views of other people (Buck, Beth Barnes, Nate Thomas, etc) are similar.
I expect that outcomes like “AIs are capable enough to automate virtually all remote workers” and “the AIs are capable enough that immediate AI takeover is very plausible (in the absence of countermeasures)” come shortly after (median 1.5 years and 2 years after respectively under my views).
Only including speedups due to R&D, not including mechanisms like synthetic data generation.
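For concreteness, here is a minimal sketch of the calendar dates these numbers imply, under the simplifying (and not strictly correct) assumption that the conditional median offsets just add onto the milestone dates:

```python
from datetime import date

# Rough sketch of the implied dates. Treating the median offsets as if they
# simply add onto the milestone dates is only an approximation of how the
# underlying distributions compose.
milestones_10x = {"25th percentile": 2027.0, "50th percentile": 2030.0}  # Jan 2027, Jan 2030
offsets_years = {
    "AIs can automate virtually all remote workers": 1.5,
    "immediate AI takeover is very plausible (absent countermeasures)": 2.0,
}

def fmt(year_frac: float) -> str:
    """Format a fractional year as an approximate 'Mon YYYY'."""
    year = int(year_frac)
    month = int(round((year_frac - year) * 12)) + 1
    if month > 12:
        year, month = year + 1, month - 12
    return f"{date(year, month, 1):%b %Y}"

for pct, base in milestones_10x.items():
    print(f"{pct} for 10x AI R&D uplift: {fmt(base)}")
    for outcome, dt in offsets_years.items():
        print(f"  +{dt} yr -> {outcome}: {fmt(base + dt)}")
```

Under these assumptions, the 50th-percentile chain runs Jan 2030, then roughly Jul 2031, then roughly Jan 2032.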
My timelines are now roughly similar on the object level (maybe a year slower for 25th and 1-2 years slower for 50th), and procedurally I also now defer a lot to Redwood and METR engineers. More discussion here: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines?commentId=hnrfbFCP7Hu6N6Lsp
@ryan_greenblatt can you say more about what you expect to happen in the period between “AI 10xes AI R&D” and “AI takeover is very plausible”?
I’m particularly interested in getting a sense of what sorts of things will be visible to the USG and the public during this period. Would be curious for your takes on how much of this stays relatively private/internal (e.g., only a handful of well-connected SF people know how good the systems are) vs. obvious/public/visible (e.g., the majority of the media-consuming American public is aware of the fact that AI research has been mostly automated) or somewhere in-between (e.g., most DC tech policy staffers know this but most non-tech people are not aware.)
I don’t feel very well informed and I haven’t thought about it that much, but in short timelines (e.g. my 25th percentile), I expect that we know what’s going on roughly within 6 months of it happening, but this isn’t salient to the broader world. So, maybe the DC tech policy staffers know that the AI people think the situation is crazy, but maybe this isn’t very salient to them. A 6-month delay could be pretty fatal even for us, as things might progress very rapidly.
Note that the production function of the 10x really matters. If it’s “yeah, we get to net-10x if we have all our staff working alongside it,” it’s much more detectable than, “well, if we only let like 5 carefully-vetted staff in a SCIF know about it, we only get to 8.5x speedup”.
(It’s hard to prove that the results are from the speedup instead of just, like, “One day, Dario woke up from a dream with The Next Architecture in his head”)
I don’t grok the “% of quality adjusted work force” metric. I grok the “as good as having your human employees run 10x faster” metric but it doesn’t seem equivalent to me, so I recommend dropping the former and just using the latter.
Fair, I really just mean “as good as having your human employees run 10x faster”. I said “% of quality adjusted work force” because this was the original way this was stated when a quick poll was done, but the ultimate operationalization was in terms of 10x faster. (And this is what I was thinking.)
Basic clarifying question: does this imply some sort of under-the-hood diminishing returns curve, such that the lab pays for that labor until it reaches a net 10x improvement, but can’t squeeze out much more?
And do you expect that to be a roughly consistent multiplicative factor, independent of lab size? (I mean, I’m not sure lab size actually matters that much; to be fair, it seems that Anthropic keeps pace with OpenAI despite being smaller-ish.)
Yeah, for it to reach exactly 10x as good, the situation would presumably be that this was the optimum point given diminishing returns to spending more on AI inference compute. (It might be that the returns curve looks very punishing. For instance, many people get a relatively large amount of value from extremely cheap queries to 3.5 Sonnet on claude.ai, and the inference cost of this is very small, but greatly increasing the cost (e.g. o1-pro) often isn’t any better because 3.5 Sonnet already gave an almost perfect answer.)
I don’t have a strong view about AI acceleration being a roughly constant multiplicative factor independent of the number of employees. Uplift just feels like a reasonably simple operationalization.
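The “punishing returns curve” point above can be made concrete with a toy saturating curve; the functional form and all the numbers here are invented purely to illustrate the shape, not estimates:

```python
# Hypothetical toy model: labor speedup as a function of inference spend per
# researcher (arbitrary units). The saturating form and constants are
# invented for illustration, not measurements.
def speedup(spend: float, max_gain: float = 10.0, half_sat: float = 3.0) -> float:
    return 1.0 + max_gain * spend / (spend + half_sat)

for spend in [0.3, 1, 3, 10, 30, 100]:
    print(f"spend={spend:>5}: ~{speedup(spend):.1f}x")
# Gains come quickly at first (cheap queries already help a lot), then flatten
# near 11x: past some point, spending much more on inference barely helps.
```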
I’ve updated towards a bit longer based on some recent model releases and further contemplation.
I’d now say:
25th percentile: Oct 2027
50th percentile: Jan 2031
How much faster do you think we are already? I would say 2x.
I’d guess that xAI, Anthropic, and GDM are more like 5-20% faster all around (with much greater acceleration on some subtasks). It seems plausible to me that the acceleration at OpenAI is already much greater than this (e.g. more like 1.5x or 2x), or will be after some adaptation due to OpenAI having substantially better internal agents than what they’ve released. (I think this due to updates from o3 and general vibes.)
I was saying 2x because I’ve memorised the results from this study. Do we have better numbers today? R&D is harder, so this is an upper bound. However, this was from one year ago, so perhaps the factors cancel each other out?
This case seems extremely cherry-picked for settings where uplift is especially high. (Note that this is in Copilot’s interest.) By now, this task could probably be solved autonomously by an AI in something like 10 minutes with good scaffolding.
I think you have to consider the full, diverse range of tasks to get a reasonable sense, or at least consider harder tasks. Something like RE-bench seems much closer, but I still expect uplift on RE-bench to probably (but not certainly!) considerably overstate real-world speedup.
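One way to see why uplift measured on favorable tasks overstates end-to-end speedup: if the AI only strongly accelerates a fraction of the work, the overall speedup is bounded Amdahl-style. A minimal sketch with made-up numbers:

```python
# Amdahl-style bound: if a fraction f of the work gets an s-x speedup and the
# rest is unchanged, the end-to-end speedup is 1 / ((1 - f) + f / s).
# The numbers below are made up purely to illustrate the effect.
def overall_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

print(f"{overall_speedup(f=0.3, s=3.0):.2f}x")   # ~1.25x from 3x on 30% of the work
print(f"{overall_speedup(f=0.5, s=10.0):.2f}x")  # ~1.82x even from 10x on half the work
```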
Yeah, fair enough. I think someone should try to do a more representative experiment and we could then monitor this metric.
btw, something that bothers me a little bit with this metric is the fact that a very simple AI that just asks me periodically “Hey, do you endorse what you are doing right now? Are you time boxing? Are you following your plan?” makes me (I think) significantly more strategic and productive. Similar to if I hired 5 people to sit behind me and keep me productive for a month. But this is maybe off topic.
Yes, but I don’t see a clear reason why people (working in AI R&D) will in practice get this productivity boost (or other very low-hanging improvements) if they don’t get around to getting the analogous boost from hiring humans.
This is intended to compare to 2023/AI-unassisted humans, correct? Or is there some other way of making this comparison you have in mind?
Yes, “Relative to only having access to AI systems publicly available in January 2023.”
More generally, I define everything more precisely in the post linked in my comment on “AI 10x’ing labor for AI R&D”.
Thanks for this—I’m in a more peripheral part of the industry (consumer/industrial LLM usage, not directly at an AI lab), and my timelines are somewhat longer (5 years for 50% chance), but I may be using a different criterion for “automate virtually all remote workers”. It’ll be a fair bit of time (in AI frame—a year or ten) between “labs show generality sufficient to automate most remote work” and “most remote work is actually performed by AI”.
A key dynamic is that I think massive acceleration in AI is likely after the point when AIs can accelerate labor working on AI R&D. (Due to all of: the direct effects of accelerating AI software progress, this acceleration rolling out to hardware R&D and scaling up chip production, and potentially greatly increased investment.) See also here and here.
So, you might very quickly (1-2 years) go from “the AIs are great, fast, and cheap software engineers speeding up AI R&D” to “wildly superhuman AI that can achieve massive technical accomplishments”.
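A deliberately crude toy model of that dynamic: if the R&D speedup itself grows with accumulated software progress, progress compounds on itself. All parameters here are invented for illustration; this is not a forecast:

```python
# Deliberately simplistic toy feedback loop: 'progress' is cumulative effective
# research (in baseline human-effort units), and the speedup multiplier is
# assumed to grow linearly with progress. All parameters are invented.
def simulate(years: float = 3.0, steps_per_year: int = 100,
             base_speedup: float = 10.0, feedback_gain: float = 0.1) -> None:
    dt = 1.0 / steps_per_year
    progress = 0.0
    for step in range(1, int(years * steps_per_year) + 1):
        speedup = base_speedup * (1.0 + feedback_gain * progress)
        progress += speedup * dt  # effective research done in this time step
        if step % (steps_per_year // 2) == 0:  # report twice per simulated year
            print(f"t={step * dt:3.1f}y  speedup ~ {speedup:7.1f}x")

simulate()
```

With these made-up parameters, the initial 10x turns into a few hundred x within about three years; the qualitative point is just that feedback makes the later gains arrive much faster than the earlier ones.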
Fully agreed. And the trickle-down from AI-for-AI-R&D to AI-for-tool-R&D to AI-for-managers-to-replace-workers (and -replace-middle-managers) is still likely to be a bit extended. And that adoption path does have to be traversed. Just like with self-driving cars, the bar for adoption isn’t “better than the median human” or even “better than the best affordable human”, but “enough better that the decision-makers can’t find a reason to delay”.