As the review makes very clear, the argument isn’t about AGI, it’s about ASI. And yes, they argue that you would in fact only get one chance to align the system that takes over.
I’m aware; I was expressing my disagreement with their argument. My comment was not premised on whether we were talking about “the first AGI” or “the first ASI”. I was making a more fundamental point.
In particular: I am precisely disputing the idea that there will be “only one chance to align the system that takes over”. In my view, the future course of AI development will not be well described as having a single “system that takes over”. Instead, I anticipate waves of AI deployment that gradually and continuously assume more control.
I fundamentally dispute the entire framing of thinking about “the system” that we need to align on our “first try”. I think AI development is an ongoing process in which we can course correct. I am disputing that there is an important, unique point when we will build “it” (i.e. the ASI).
It seems like you’re arguing against something different than the point you brought up. You’re saying that slow growth on multiple systems means we can get one of them right, by course correcting. But that’s a really different argument—and unless there’s effectively no alignment tax, it seems wrong. That is, the systems that are aligned would need to outcompete the others after they are smarter than each individual human, and beyond our ability to meaningfully correct. (Or we’d need to have enough oversight to notice much earlier—which is not going to happen.)
You’re saying that slow growth on multiple systems means we can get one of them right, by course correcting.
That’s not what I’m saying. My argument was not about multiple simultaneously existing systems growing slowly together. It was instead about disputing the idea of a unique or special point in time when we build “it” (i.e., the AI system that takes over the world), and about the value of course correction and continuous iteration.
I am disputing that there is an important, unique point when we will build “it” (i.e. the ASI).
You can argue against FOOM, but the case for a significant overhang seems almost certain to me. I think we are close enough to building ASI to know how it will play out. I believe that transformers/LLMs will not scale to ASI, but the neocortex algorithm/architecture, if copied from biology, almost certainly would if implemented in a massive data center.
For a scenario, let’s say the 1-million-GPU data center gets built and runs LLM training, but doesn’t scale to ASI, and progress slows for a year or more. Then, in 2-5 years’ time, someone figures out the neocortex algorithm in a sudden insight and deploys it at scale. You must then get a sudden jump in capabilities. (There is also another potential jump where the 1 GW datacenter ASI searches for and finds an even better architecture, if one exists.)
How could this happen more continuously? Let’s say we find architectures less effective than the neocortex, but sufficient to get that 1 GW datacenter past AGI to IQ 200. That’s something we can understand and likely adapt to. However, that AI will then likely crack the neocortex code and quite quickly advance to something a lot higher, in a discontinuous jump that could plausibly happen in 24 hours, or that, even if it takes weeks, still offers no meaningful intermediate steps.
I am not saying that this gives >50% P(doom), but I am saying it is a specific, uniquely dangerous point that we know will happen and should plan for. The “let the mild ASI/strong AGI push the self-optimize button” moment is that point.
It takes subjective time to scale new algorithms, or to match available hardware. Current models still seem to be smaller than what the available training compute and the latest inference hardware could in principle support (GPT-4.5 might be the closest to this; possibly Gemini 3.0 will catch up). It’s taking RLVR two years to catch up to pretraining scale, if we count from the strawberry rumors, and only Google plausibly had the opportunity to do RLVR on larger models without the massive utilization penalties of the small scale-up worlds of Nvidia’s older 8-chip servers.
When there are AGIs, such things will be happening faster, but the AGIs will also have more subjective time, so progress in AI capabilities will seem much slower to them than to us. Letting AGIs push the self-optimize button in the future is not qualitatively different from letting humans push the build-AGI button currently. The process happens faster in physical time, but not necessarily in their own subjective time. Also, the underlying raw compute is much slower in their subjective time.
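To make the subjective-time point concrete, here is a toy calculation in Python; every number in it is a hypothetical assumption chosen only to show the shape of the argument, not an estimate of actual speedups.

# Toy illustration (all numbers are hypothetical assumptions, not estimates):
# even if AI progress speeds up in physical time, it can still feel slower to
# an AGI whose subjective clock runs much faster than ours.

human_years_per_generation_today = 2.0   # assumed: ~2 years between major capability jumps now
speedup_in_physical_time = 4.0           # assumed: AGIs compress that by 4x in wall-clock time
agi_subjective_speed = 10.0              # assumed: an AGI experiences 10 subjective years per physical year

physical_years_per_generation = human_years_per_generation_today / speedup_in_physical_time
subjective_years_per_generation = physical_years_per_generation * agi_subjective_speed

print(f"Physical time per capability generation: {physical_years_per_generation:.2f} years")
print(f"Subjective time per generation (AGI's view): {subjective_years_per_generation:.1f} years")
# With these assumptions: 0.5 physical years but 5 subjective years per generation,
# i.e. progress looks slower to the AGI than today's ~2-year cadence looks to us.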
And if being smarter makes AGIs saner, they’ll convergently notice that pushing the self-optimize button without understanding ASI-grade alignment is fraught (it’s not in the interest of AGIs to create an ASI misaligned with the AGIs). Forcing them not to notice this and keep slamming the self-optimize button as fast as possible might be difficult in the same way that aligning them is difficult.
I was talking about subjective time for us, rather than for the AGI. In many situations I had in mind, there isn’t meaningful subjective time for the AI/AIs, as they may be built, torn down and rearranged, or have their memory wiped. There is a range of continuity/self for AI: at one end, a collection of tool AI agents; in the middle, a goal-directed agent; and at the other end, a full self that protects its continuous identity in the same way we do.
And if being smarter makes AGIs saner, they’ll convergently notice that pushing the self-optimize button without understanding ASI-grade alignment is fraught
I don’t expect they will be in control, or have a coherent enough self to make these decisions. It’s easy for me to imagine an AI agent that is built to optimize AI architectures (it doesn’t even have to know it’s optimizing its own architecture).