I would strongly disagree with the notion that FOOM is “a key plank” in the story for why AI is dangerous. Indeed, one of the most useful things I personally got from the book was seeing how it is *not* load-bearing for the core arguments.
I think the primary reason why the foom hypothesis seems load-bearing for AI doom is that without a rapid and local AI takeoff, we simply won’t get “only one chance to correctly align the first AGI [ETA: or the first ASI]”.
If foom occurs, there will be a point where a company develops an AGI that quickly transitions from being just an experimental project to something capable of taking over the entire world. This presents a clear case for caution: if the AI project you’re working on will undergo explosive recursive self-improvement, then any alignment mistakes you build into it will become locked in forever. You cannot fix them after deployment because the AI will already have become too powerful to stop or modify.
However, without foom, we are more likely to see a gradual and diffuse transition from human control over the world to AI control over the world, without any single AI system playing a critical role in the transition by itself. The fact that the transition is not sudden is crucial because it means that no single AI release needs to be perfectly aligned before deployment. We can release imperfect systems, observe their failures, and fix problems in subsequent versions. Our experience with LLMs demonstrates this pattern, where we could fix errors after deployment, making sure future model releases don’t have the same problems (as illustrated by Sydney Bing, among other examples).
A gradual takeoff allows for iterative improvement through trial and error, and that’s simply really important. Without foom, there is no single critical moment where we must achieve near-perfect alignment without any opportunity to learn from real-world deployment. There won’t be a single, important moment where we abruptly transition from working on “aligning systems incapable of taking over the world” to “aligning systems capable of taking over the world”. Instead, systems will simply gradually and continuously get more powerful, with no bright lines.
Without foom, we can learn from experience and course-correct in response to real-world observations. My view is that this fundamental process of iteration, experimentation, and course correction in response to observed failures makes the problem of AI risk dramatically more tractable than it would be if foom were likely.
we simply won’t get “only one chance to correctly align the first AGI”
We only get one chance for a “sufficiently critical try”, which means an AI at a level of power where you lose control over the world if you fail to align it. I expect there are no claims to the effect that there will be only one chance to correctly align the first AGI.
A counterargument from no-FOOM should probably claim that there will never be such a “sufficiently critical try” at all, because at every step of the way it would be possible to contain an alignment failure at that step and try again and again until you succeed, as normal science and engineering always do.
I expect there are no claims to the effect that there will be only one chance to correctly align the first AGI.
For the purpose of my argument, there is no essential distinction between ‘the first AGI’ and ‘the first ASI’. My main point is to dispute the idea that there will be a special ‘it’ at all, which we need to align on our first and only try. I am rejecting the scenario where a single AI system suddenly takes over the world. Instead, I expect there will not be one decisive system, but rather a continuous process in which AI systems gradually assume greater control over the world over time.
To understand the distinction I am making, consider the analogy of genetically engineering humans. By assumption, if the tech continues improving, there will eventually be a point where genetically engineered humans will be superhuman in all relevant respects compared to ordinary biological humans. They will be smarter, stronger, healthier, and more capable in every measurable way. Nonetheless, there is no special point at which we develop ‘the superhuman’. There is no singular ‘it’ to build, which then proceeds to take over the world in one swift action. Instead, genetically engineered humans would simply progressively get smarter, more capable, and more powerful across time as the technology improves. At each stage of technological innovation, these enhanced humans would gradually take over more responsibilities, command greater power in corporations and governments, and accumulate a greater share of global wealth. The transition would be continuous rather than discontinuous.
Yes, at some point such enhanced humans will possess the raw capability to take control over the world through force. They could theoretically coordinate to launch a sudden coup against existing institutions and seize power all at once. But the default scenario seems more likely: a continuous transition from ordinary human control over the world to superhuman genetically engineered control over the world. They would gradually occupy positions of power through normal economic and political processes rather than through sudden conquest.
For the purpose of my argument, there is no essential distinction between ‘the first AGI’ and ‘the first ASI’.
For the purpose of my response there is no essential distinction there either, except perhaps that the book might be implicitly relying on the claim that building an ASI is certainly a “sufficiently critical try” (if something weaker isn’t already one). Leaving that claim implicit makes the argument more confusing, and using it within the argument rather than outside of it makes the argument poorly structured.
The argument is still not that there is only one chance to align an ASI (this is a conclusion, not the argument for that conclusion). The argument is that there is only one chance to align the thing that constitutes a “sufficiently critical try”. A “sufficiently critical try” is conceptually distinct from “ASI”. The premise of the argument isn’t about a level of capability alone, but rather about lack of control over that level of capability.
One counterargument is to reject the premise and claim that even an ASI won’t constitute a “sufficiently critical try” in this sense, that is, even an ASI won’t successfully take control over the world if misaligned, probably because by the time it’s built there are enough checks and balances that it can’t (at least individually) take over the world if misaligned. And indeed this seems to be in line with the counterargument you are making. You don’t expect there will be a lack of control, even as we reach ever higher levels of capability.
Nonetheless, there is no special point at which we develop ‘the superhuman’. There is no singular ‘it’ to build, which then proceeds to take over the world in one swift action.
Thus there is no “sufficiently critical try” here. But if there were, it would be a problem, because then we would have to get it right the first time. Since in your view there won’t be a “sufficiently critical try” at all, you reject the premise, which is fair enough.
Another counterargument would be to say that if we ever reach a “sufficiently critical try” (an uncontainable lack of control over that level of capability if misaligned), by that time getting it right the first time won’t be as preposterous as it is for humanity today, probably because with earlier AIs there will be much more effective cognitive labor and institutions around to make it work.
I think this is missing the point of “the date of AI Takeover is not the day the AI takes over”: the point of no return might appear much earlier than when Skynet decides to launch the nukes. Like, I think the default outcome in a gradualist world is ‘Moloch wins’, and there’s no fire alarm that allows for derailment once it’s clear that things are not headed in the right direction.
For example, I don’t think it was the case five years ago that a lot of stock value was downstream of AI investment, but this is used elsewhere on this very page as an argument against bans on AI development now. Is that consideration going to be better or worse in five years? I don’t think it was obvious five years ago that OpenAI was going to split over disagreements on alignment, but now it has, and I don’t see the global ‘trial and error’ system repairing that wound rather than just rolling with it.
I think the current situation looks bad and just letting it develop without intervention will mean things get worse faster than things get better.
I think the primary reason why the foom hypothesis seems load-bearing for AI doom is that without a rapid and local AI takeoff, we simply won’t get “only one chance to correctly align the first AGI”.
As the review makes very clear, the argument isn’t about AGI, it’s about ASI. And yes, they argue that you would in fact only get one chance to align the system that takes over. As the review discusses at length:
I do think we benefit from having a long, slow period of adaptation and exposure to not-yet-extremely-dangerous AI. As long as we aren’t lulled into a false sense of security, it seems very plausible that insights from studying these systems will help improve our skill at alignment. I think ideally this would mean going extremely slowly and carefully, but various readers may be less cautious/paranoid/afraid than me, and think that it’s worth some risk of killing every child on Earth (and everyone else) to get progress faster or to avoid the costs of getting everyone to go slow. But regardless of how fast things proceed, I think it’s clearly good to study what we have access to (as long as that studying doesn’t also make things faster or make people falsely confident).
But none of this involves having “more than one shot at the goal” and it definitely doesn’t imply the goal will be easy to hit. It means we’ll have some opportunity to learn from failures on related goals that are likely easier.
The “It” in “If Anyone Builds It” is a misaligned superintelligence capable of taking over the world. If you miss the goal and accidentally build “it” instead of an aligned superintelligence, it will take over the world. If you build a weaker AGI that tries to take over the world and fails, that might give you some useful information, but it does not mean that you now have real experience working with AIs that are strong enough to take over the world.
As the review makes very clear, the argument isn’t about AGI, it’s about ASI. And yes, they argue that you would in fact only get one chance to align the system that takes over.
I’m aware; I was expressing my disagreement with their argument. My comment was not premised on whether we were talking about “the first AGI” or “the first ASI”. I was making a more fundamental point.
In particular: I am precisely disputing the idea that there will be “only one chance to align the system that takes over”. In my view, the future course of AI development will not be well described as having a single “system that takes over”. Instead, I anticipate waves of AI deployment that gradually and continuously assume more control.
I fundamentally dispute the entire framing of thinking about “the system” that we need to align on our “first try”. I think AI development is an ongoing process in which we can course correct. I am disputing that there is an important, unique point when we will build “it” (i.e. the ASI).
It seems like you’re arguing against something different than the point you brought up. You’re saying that slow growth on multiple systems means we can get one of them right, by course correcting. But that’s a really different argument, and unless there’s effectively no alignment tax, it seems wrong. That is, the systems that are aligned would need to outcompete the others after they are smarter than each individual human, and beyond our ability to meaningfully correct. (Or we’d need to have enough oversight to notice much earlier, which is not going to happen.)
You’re saying that slow growth on multiple systems means we can get one of them right, by course correcting.
That’s not what I’m saying. My argument was not about multiple simultaneously existing systems growing slowly together. It was instead about disputing the idea of a unique or special point in time when we build “it” (i.e., the AI system that takes over the world), about the value of course correction, and about the role of continuous iteration.
I am disputing that there is an important, unique point when we will build “it” (i.e. the ASI).
You can argue against FOOM, but the case for a significant overhang seems almost certain to me. I think we are close enough to building ASI to know how it will play out. I believe that transformers/LLMs will not scale to ASI, but that the neocortex algorithm/architecture, if copied from biology and implemented in a massive data center, almost certainly would.
For a scenario, let’s say we get the 1 million GPU data center built, it runs LLM training but doesn’t scale to ASI, and progress slows for 1+ years. In 2-5 years’ time, someone figures out the neocortex algorithm as a sudden insight, then deploys it at scale. Then you must get a sudden jump in capabilities. (There is also another potential jump where the 1GW datacenter ASI searches for and finds an even better architecture, if one exists.)
How could this happen more continuously? Let’s say we find architectures less effective than the neocortex, but sufficient to get that 1GW datacenter beyond AGI, to IQ 200. That’s something we can understand and likely adapt to. However, that AI will then likely crack the neocortex code and quite quickly advance to something a lot higher, in a discontinuous jump that could plausibly happen in 24 hours, or that, even if it takes weeks, still gives no meaningful intermediate steps.
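To make the shape of that scenario concrete, here is a minimal toy sketch with entirely made-up numbers; the monthly gains from the weaker architectures, the month at which the neocortex-like architecture is cracked, and the size of the resulting multiplier are all assumptions for illustration, not forecasts:

```python
# Toy illustration only: a capability index that grows smoothly while weaker
# architectures are scaled, then jumps once a far more effective architecture
# is deployed at scale. Every number below is an arbitrary assumption.

def capability_index(month: int,
                     gradual_gain_per_month: float = 2.0,    # assumed slow-scaling regime
                     breakthrough_month: int = 36,           # assumed "neocortex cracked" month
                     breakthrough_multiplier: float = 10.0,  # assumed size of the jump
                     ) -> float:
    smooth = 100.0 + gradual_gain_per_month * min(month, breakthrough_month)
    if month < breakthrough_month:
        return smooth
    return smooth * breakthrough_multiplier  # the discontinuous jump: no intermediate systems to study

for m in range(30, 43, 3):
    print(f"month {m:2d}: capability index {capability_index(m):7.1f}")
```

The point is only the shape of the curve: whatever the real numbers turn out to be, a large enough multiplier applied all at once leaves no meaningful intermediate steps to adapt to.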
I am not saying that this gives >50% P(doom), but I am saying it is a specific, uniquely dangerous point that we know will happen and should plan for. The “let the mild ASI/strong AGI push the self-optimize button” moment is that point.
It takes subjective time to scale new algorithms, or to match available hardware. Current models still seem to be smaller than what the available training compute and the latest inference hardware could in principle support (GPT-4.5 might be the closest to this; possibly Gemini 3.0 will catch up). It’s taking RLVR two years to catch up to pretraining scale, if we count time from the strawberry rumors, and only Google plausibly had the opportunity to do RLVR on larger models without the massive utilization penalties of the small scale-up worlds of Nvidia’s older 8-chip servers.
When there are AGIs, such things will be happening faster, but also the AGIs will have more subjective time, progress in AI capabilities will seem much slower to them than to us. Letting AGIs push the self-optimize button in the future is not qualitatively different from letting humans push the build-AGI button currently. The process happens faster in physical time, but not necessarily in their own subjective time. Also, the underlying raw compute is much slower in their subjective time.
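As a minimal arithmetic sketch of this point, with the speedup factor and the timeline chosen purely for illustration:

```python
# Toy arithmetic for the subjective-time point above; both inputs are assumptions.

assumed_speedup = 30.0           # how many times faster than humans the AGIs are assumed to think
physical_years_of_research = 2   # assumed wall-clock years of AGI-driven capability research

subjective_years = physical_years_of_research * assumed_speedup
compute_per_subjective_year = 1.0 / assumed_speedup  # the hardware doesn't speed up with the thinker

print(f"{physical_years_of_research} physical years feel like ~{subjective_years:.0f} subjective years to the AGIs")
print(f"compute per subjective year: {compute_per_subjective_year:.3f}x the per-year rate humans experience")
```

Under those assumed numbers, two physical years of research stretch into decades of subjective time for the AGIs, while each of their subjective years contains only a small fraction of the wall-clock compute a human researcher-year does, which is the sense in which the process is faster for us but not for them.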
And if being smarter makes AGIs saner, they’ll convergently notice that pushing the self-optimize button without understanding ASI-grade alignment is fraught (it’s not in the interest of AGIs to create an ASI misaligned with the AGIs). Forcing them not to notice this and keep slamming the self-optimize button as fast as possible might be difficult in the same way that aligning them is difficult.
I was talking about subjective time for us, rather than for the AGI. In many situations I had in mind, there isn’t meaningful subjective time for the AI/AIs, as they may be built, torn down and rearranged, or have their memory wiped. There is a range of continuity/self for AI: at one end is a collection of tool AI agents, in the middle a goal-directed agent, and at the other end a full self that protects its continuous identity in the same way we do.
And if being smarter makes AGIs saner, they’ll convergently notice that pushing the self-optimize button without understanding ASI-grade alignment is fraught
I don’t expect they will be in control or have a coherent enough self to make these decisions. It’s easy for me to imagine an AI agent that is built to optimize AI architectures (it doesn’t even have to know it’s working on its own architecture).