Thanks for writing this up, glad to see the engagement! I’ve only skimmed and have not run this by any other AI 2027 authors, but a few thoughts on particular sections:
My predictions for AI by the end of 2027
I agree with most but not all of these in the median case, AI 2027 was roughly my 80th percentile aggressiveness prediction at the time.
Edited to add: I feel like I should list the ones that I have <50% on explicitly:
AI still can’t tell novel funny jokes, write clever prose, generate great business ideas, invent new in-demand products, or generate important scientific breakthroughs, except by accident.
I disagree re: novel funny jokes, seems plausible that this bar has already been passed. I agree with the rest except maybe clever prose, depending on the operationalization.
LLMs are broadly acknowledged to be plateauing, and there is a broader discussion about what kind of AI will have to replace it.
Disagree but not super confident.
Most breakthroughs in AI are not a result of directly increasing the general intelligence/”IQ” of the model, e.g. advances in memory, reasoning or agency. AI can stay on task much longer than before without supervision, especially for well-specified, simple tasks. Especially since AI coding platforms will have gotten better at tool use and allowing AI to manually test the thing they’re working on. By the end of 2027, AI can beat a wide variety of video games it hasn’t played before.
I disagree with the first clause, but I’m not sure what you mean because advances in reasoning and agency seem to me like examples of increases in general intelligence. Especially staying on task for longer without supervision. Are you saying that these reasoning and agency advances will mostly come from scaffolding rather than the underlying model getting smarter? That I disagree with.
There is more public discussion on e.g. Hacker News about AI code rot and the downsides of using AI. People have been burned by relying too much on AI. But I think non-coders running businesses will still be hyped about AI in 2027.
Disagree on the first two sentences.
AI still can’t drive a damned car well enough that if I bought a car I wouldn’t have to.
I don’t follow self-driving stuff much, but this might depend on location? Seems like good self-driving cars are getting rolled out in limited areas at the moment.
As you touch on later in your post, it’s plausible that we made a mistake by focusing on 2027 in particular:
But I do worry about what happens in 2028, when everyone realizes none of the doomsday stuff predicted in 2025 actually came true, or even came close. Then the AI alignment project as a whole may risk being taken as seriously as the 2012 apocalypse theory was in 2013. The last thing you want is to be seen as crackpots.
I think this is a very reasonable concern, and we probably should have done better in our initial release at making our uncertainty about timelines clear (and/or taking the time to rewrite and push back to a later time frame, e.g. once Daniel’s median changed to 2028). We are hoping to do better on this in future releases, including via just having scenarios be further out, and perhaps better communicating our timelines distributions.
Also:
Listening to several of the authors discuss the AI 2027 predictions after they were published leads me to believe they don’t intuitively believe their own estimates.
What do you mean by this? My guess is that it’s related to the communication issues on timelines?
The Takeoff Forecast is Based on Guesswork
Agree.
The Presentation was Misleading
Nothing wrong with guesswork, of course, if it’s all you’ve got! But I would have felt a lot better if the front page of the document had said “AI 2027: Our best guess about what AI progress might look like, formulated by using math to combine our arbitrary intuitions about what might happen.”
But instead it claims to be based on “trend extrapolations, wargames, expert feedback, experience at OpenAI, and previous forecasting successes”, and links to 193 pages of data/theory/evidence.
They never outright stated it wasn’t based on vibes, of course, and if you dig into the document, that’s what you find out.
I very much understand this take and understand where you’re coming from because it’s a complaint I’ve had regarding some previous timelines/takeoff forecasts.
Probably some of our disagreement is tied up with the object-level disagreements about the usefulness of doing this sort of forecasting; I personally think that although the timelines and takeoff forecasts clearly involved a ton of guesswork, they are still some of the best forecasts out there, and we need to base our timelines and takeoff forecasts on something in the absence of good data.
But still, since we both agree that the forecasts rely on lots of guesswork, even if we disagree on their usefulness, we might be able to have some common ground when discussing whether the presentation was misleading in this respect. I’ll share a few thoughts from my perspective below:
I think it’s a very tricky problem to communicate that we think that AI 2027 and its associated background research is some of the best stuff out there, but is still relying on tons of guesswork because there’s simply not enough empirical data to forecast when AGI will arrive, how fast takeoff will be, and what effects it will have precisely. It’s very plausible that we messed up in some ways, including in the direction that you posit.
Keep in mind that we have to optimize for a bunch of different audiences. For each direction (i.e. taking the forecast too seriously vs. not seriously enough), I’d guess that many people came away with conclusions too far in that direction, from my perspective. This also means that some others have advertised our work in a way that seems to me like overselling, though others have IMO undersold it.
As you say, we tried to take care to not overclaim regarding the forecast, in terms of the level of vibes it was based on. We also explicitly disclaimed our uncertainty in several places, e.g. in the expandables “Why our uncertainty increases substantially beyond 2026” and “Our uncertainty continues to increase.” as well as “Why is it valuable?” right below the foreword.
Should we have had something stronger in the foreword or otherwise more prominent on the frontpage? Yeah, perhaps. We iterated on the language a bunch to try to make it convey all of (a) that we put quite a lot of work into it, (b) that we think it’s state-of-the-art or close on most dimensions and represents substantial intellectual progress, but also (c) giving the right impression about our uncertainty level and (d) not overclaiming regarding the methodology. But we might have messed up these tradeoffs.
You proposed “AI 2027: Our best guess about what AI progress might look like, formulated by using math to combine our arbitrary intuitions about what might happen.” This seems pretty reasonable to me except that, as you might guess, I take issue with the connotation of arbitrary. In particular, I think there’s reason to trust our intuitions regarding guesswork: we’ve put more thinking time into this sort of thing than all but a few people in the world, our guesswork was also sometimes informed by surveys (which were still very non-robust, to be clear, but I think an improvement on previous work in terms of connecting surveys to takeoff estimates), and we have a track record to at least some extent. So I agree with arbitrary in the sense that we can’t ground out our intuitions in solid data, but my guess is that it gives the wrong connotation about how much weight the guesswork should be given relative to other forms of evidence.
I’d also not emphasize math if we’re discussing the scenario as opposed to timelines or takeoff speeds in particular.
My best guess is that for the timelines and takeoff forecasts, we should have had a stronger disclaimer or otherwise made more clear in the summary that they are based on lots of guesswork. I also agree that the summaries at the top had pretty substantial room for improvement.
I’m curious what you would think of something like this disclaimer in the timelines forecast summary (and a corresponding one in takeoff): Disclaimer: This forecast relies substantially on intuitive judgment, and involves high levels of uncertainty. Unfortunately, we believe that incorporating intuitive judgment is necessary to forecast timelines to highly advanced AIs, since there simply isn’t enough evidence to extrapolate conclusively.
I’ve been considering adding something like this but haven’t quite gotten to it for various reasons; potentially I should prioritize it more highly.
We’re also working on updates to these models and will aim to do better at communicating in the future, and will take suggestions into account!
I think this might have happened because it’s clear to us that we can’t make these sorts of forecasts without tons of guesswork, and we didn’t have much slack in terms of the time spent thinking about how these supplements would read to others; I perhaps made a similar mistake to one that I have previously criticized others for.
(I had edited to add this paragraph in, but I’m going to actually strike it out for now because I’m not sure I’m doing a good job accurately representing what happened and it seems important to do so precisely, but I’ll still leave it up because I don’t want to feel like I’m censoring something that I already had in a version of the comment.)
Potentially important context is that our median expectation was that AI 2027 would do much worse than it did, so we were mostly spending time trying to increase the expected readership (while of course following other constraints like properly disclaiming uncertainty). I think we potentially should have spent a larger fraction of our time thinking “if this got a ton of readership, then what would happen?” To be clear, we did spend time thinking about this, but it might be important context that we did not expect AI 2027 to get so many readers, so a lot of our headspace was around increasing readership.
Linking to some other comments I’ve written that are relevant to this: here, here
Thank you for taking the time to write such a detailed response.
My main critique of AI 2027 is not about communication, but the estimates themselves (2027 is an insane median estimate for AI doom) and that I feel you’re overconfident about the quality/reliability of the forecasts. (And I am glad that you and Daniel have both backed off a bit from the original 2027 estimate.)
What do you mean by this? My guess is that it’s related to the communication issues on timelines?
Probably this is related to communication issues on timelines, yes. Also, I think if I genuinely believed everyone I knew and loved was going to die in ~2 years, I would probably be acting a certain way that I don’t sense from the authors of the AI 2027 document. But I don’t want to get too much into mind reading.
With respect to the communication issue, I think the AI 2027 document did include enough disclaimers about the authors’ uncertainty, and more disclaimers wouldn’t help. I think the problem is that the document structurally contradicts those disclaimers, by seeming really academic and precise. Adding disclaimers to the research sections would also not be valuable simply because most people won’t get that far.
Including a written scenario is something I can understand why you chose to do, but it also seems like a mistake for the reasons I mentioned in my post. It makes you sound way more confident than we both agree you actually are. And a specific scenario is also more likely to be wrong than a general forecast.
You have said things like:
“One reason I’m hesitant to add [disclaimers] is that I think it might update non-rationalists too much toward thinking it’s useless, when in fact I think it’s pretty informative.”
“The graphs are the result of an actual model that I think is reasonable to give substantial weight to in one’s timelines estimates.”
“In our initial tweet, Daniel said it was a ‘deeply researched’ scenario forecast. This still seems accurate to me.”
“we put quite a lot of work into it”
“it’s state-of-the-art or close on most dimensions and represents substantial intellectual progress”
“In particular, I think there’s reason to trust our intuitions”
As I said in my post, “The whole AI 2027 document just seems so fancy and robust. That’s what I don’t like. It gives a much more robust appearance than this blog post, does it not? But is it any better? I claim no.”
I don’t think your guesses are better than mine because of the number of man-hours you put into justifying them, nor because the people who worked on the estimates are important, well-regarded people who worked at OpenAI or have a better track record, nor because the estimates involved surveys, wargames, and mathematics.
I do not believe your guesses are particularly informative, nor do I think that about my own guesses. We’re all just guessing. Nor do I agree with calling them forecasts at all. I don’t think they’re reliable enough that anybody should be trusting them over their own intuition. In the end, neither of us can prove what we believe to a high degree of confidence. The only thing that will matter is who’s right, and none of the accoutrements of fancy statistics, hours spent researching, past forecasting successes, and so on will matter.
Putting too much work into what are essentially guesses is also in itself a kind of communication that this is Serious Academic Work—a kind of evidence or proof that people should take very seriously. Which it can’t be, since you and I agree that “there’s simply not enough empirical data to forecast when AGI will arrive”. If that’s true, then why all the forecasting?
(All my criticism is about the Timelines/Takeoff Forecasting, since these are things you can’t really forecast at this time. I am glad the Compute Forecast exists, and I didn’t read the AI Goals and Security Forecasts.)
Okay, it sounds like our disagreement basically boils down to the value of the forecasts as well as the value of the scenario format (does that seem right?), which I don’t think is something we’ll come to agreement on.
Thanks again for writing this up! I hope you’re right about timelines being much longer and 2027 being insane (as I mentioned, it’s faster than my median has ever been, but I think it’s plausible enough to take seriously).
edit: I’d also be curious for you to specify what you mean by academic? The scenario itself seems like a very unusual format for academia. I think it would have seemed more serious academic-y if we had ditched the scenario format.
Perhaps we will find some agreement come Christmastime 2027. Until then, thanks for your time!
edit: Responding to your edit, by seeming academic, I meant things like seeming “detailed and evidence-based”, “involving citations and footnotes”, “involving robust statistics”, “resulting in high-confidence conclusions”, and stuff like that. Even the typography and multiple authors make it seem Very Serious. I agree that the scenario part seemed less academic than the research pages.