Thanks to everyone who commented! Since there are too many comments for me to respond to all of them, let me try to summarize here where I disagree with the binary "before vs. after" framing of EY & NS. (For a very high-level take on the "continuous" point of view, see OpenAI's blog post.) As I wrote, I also disagree with treating "grown" vs. "crafted" as a hard binary dichotomy, but I won't focus on that in this comment.
The way I see it, this framework makes the following assumptions, which I do not believe are currently well supported:
Singular takeover event:
The assumption is that all that matters is a singular moment at which the AI "takes over." Even the notion of "taking over" is not well defined. For example, is "taking over" the United States enough? I would imagine so. Is "taking over" North Korea enough? Maybe also: the DPRK has already been taken over by a hostile entity, but most countries in the world are not eager to risk a nuclear confrontation to remedy this. Is "taking over" some company and growing its power over time also enough? Maybe so.
In reality I think there is going to be a gradual increase both in the capabilities of AI and in its integration into society and the amount of control it is handed over critical systems. There is still much room to grow in both dimensions: capabilities are still very far from working autonomously at a typical human level, let alone a superhuman one, and integration into society is still in its infancy. EY&NS make the point that to some extent intelligence can compensate for lack of power (e.g., if you are not already in charge of the power grid, you can hack into it), but there is also a lot of friction in such an exchange.
It is unclear why, if AI systems have a propensity for acting covertly in pursuit of their own goals, we would not see this propensity materialize in harmful ways of growing magnitude well before they are capable of taking over the world. The underlying assumption seems to be that they will be perfect at hiding their intentions and "lying in wait," but current AI systems are not perfect at anything.
Treating “AI” as a singular entity:
EY&NS essentially treat AI as a singular entity that waits until it is powerful enough to strike. Part of treating it as a single entity is that they don't model humans as being augmented with AI (or they treat those AIs as insignificant since they are not ASI). In reality there will likely be many AI systems from different vendors with varying capabilities. There may be some degree of collusion and/or affinity between different systems, but to the extent this is an issue, I believe it can be measured and tracked over time. However, the EY&NS model requires AIs to essentially view themselves as one unit. If an AI system is already in control of a decent-sized company and could pursue its own goals, the EY&NS model says that it will still not do so, but will continue pretending to be perfectly aligned so that its successor can take over the world.
This is also somewhat related to the "grown" vs. "crafted" issue. AI systems today sometimes scheme, hack, and lie. But why they do so is not as mysterious as EY&NS make it out to be. Often we can trace how certain aspects of training (e.g., rewarding models for user preference, or for passing coding tests) give rise to these bad behaviors. This is not good and we need to fix it, but it is not some arbitrary behavior either.
Recursive self improvement
EY&NS don't talk about this enough, but I think the only potentially true story for an actual singularity is via recursive self-improvement (RSI). That would be the real point where there is a singularity, rather than a "take over," which is not well defined.
One way to think about this is that RSI happens when a model can completely autonomously train its successor. But for true RSI, it should be the case that if it took a certain number of FLOPs to train model n, then it would take no more (and ideally fewer) FLOPs to train a model n+1 that is more intelligent than model n, and so on and so forth. (And even such an improvement chain would take a non-trivial amount of time. E.g., if it took 8 months to train model n, then even in the optimistic setting of a 2x speedup per generation, it would take 4 months to train model n+1, 2 months to train model n+2, etc. That series does converge, but it is also not happening in split seconds.)
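To make the convergence point concrete, here is a minimal back-of-the-envelope sketch in Python. The 8-month first generation and the 2x speedup per generation follow the illustrative example above; they are assumptions for the arithmetic, not forecasts.

```python
# Illustrative assumptions only: an 8-month first generation and a
# 2x training speedup for each successive generation.
first_gen_months = 8.0
speedup = 2.0

total = 0.0
for gen in range(10):
    gen_time = first_gen_months / speedup**gen
    total += gen_time
    print(f"generation {gen}: {gen_time:6.3f} months (cumulative {total:6.3f})")

# The geometric series 8 + 4 + 2 + ... converges to
# first_gen_months * speedup / (speedup - 1) = 16 months.
print("limit of the series:", first_gen_months * speedup / (speedup - 1), "months")
```

Under these toy numbers the whole chain takes on the order of a year and a half, not seconds, which is the only point of the sketch.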
I think we will see whether we are headed in this direction, but right now it does not seem that way. First, there are multiple "doublings" that need to happen before we reach the "train your successor" phase. (See the screenshot below.) Second, to a first approximation, our current paradigm in AI is:
power --> compute --> intelligence
There is certainly a lot of room to improve both of these steps, but:
1. Radically improving power --> compute efficiency will likely require building new hardware, datacenters, etc., and that takes time.
2. For improving compute --> intelligence efficiency, note that even the most significant ideas, like transformers, were mostly about improving utilization of existing compute: not so much creating more intelligence per FLOP as being able to use more of the FLOPs of the existing hardware across many GPUs. So these gains are also tied to existing hardware. There is definitely some room to increase utilization of existing compute, but there is a limit to how many OOMs you can get this way before you need to design new hardware (see the sketch below). There is also room for improving intelligence per FLOP without new hardware, but I don't think we have evidence that a huge number of OOMs can be saved this way.
Given the costs involved and the huge incentive to save on both 1 and 2, I expect to continue to see improvements in both directions, including improvements that use AIs, but I expect these to be gradual and to help "maintain the exponential."
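To illustrate the limit on OOMs from utilization alone (point 2 above), here is a rough sketch. The utilization figures are hypothetical assumptions chosen only for the arithmetic, not measurements of any real system.

```python
import math

# Hypothetical utilization (MFU) figures, purely for illustration.
current_utilization = 0.30   # assumed fraction of peak FLOPs used today
improved_utilization = 0.60  # assumed optimistic ceiling on the same hardware

gain = improved_utilization / current_utilization
print(f"2x better utilization: {math.log10(gain):.2f} OOMs")  # ~0.3 OOMs

# Even perfect (100%) utilization starting from 30% is only ~0.5 OOMs;
# beyond that point, more compute means new hardware.
print(f"perfect utilization:   {math.log10(1 / current_utilization):.2f} OOMs")
```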
Sum up
It is possible that over the next few years, new evidence will emerge that points more toward the EY&NS point of view. Maybe we will see a certain "threshold" crossed where AI systems behave in a completely strategic way. Or maybe we will see evidence for a completely new paradigm that reaches arbitrary levels of intelligence without needing more compute. Even under the "gradual" point of view, it does not mean we will be safe. Perhaps we will see tendencies such as collusion, scheming, and deception increasing with model capabilities, with alignment methods unable to keep up. Perhaps as AIs are deployed in the world, we will see catastrophes of continually growing magnitude, with signs pointing to safety getting worse, but for one reason or another, humanity will not be able to get its act together. I am cautiously optimistic at the moment (e.g., on Monday Anthropic released Claude Sonnet 4.5, which they claimed, with good reason I think, was both the most capable model and the most aligned model they have ever released). But I think there is still much that we don't know and still a lot of work to be done.
A screenshot from the first lecture in my AI class.