AI 2027, Situational Awareness, and basically every scenario that tries to seriously wrestle with AGI assume that the US and China are the only countries that matter in shaping the future of humanity. I think this assumption is mostly valid. But if other countries wake up to AGI, how might they behave during AI takeoff?
States will be faced with the following situation: within a few years, some country will control superintelligence, or will create a runaway superintelligence that causes human extinction. Once some nation creates a superintelligence, if humanity is not extinct, every other nation will be at the mercy of the group that controls the ASI.
ASI-proof alliances
Fundamentally, countries will want to enter ASI-proof alliances with the country likeliest to first create a superintelligence, such that they gain some control over the superintelligence’s actions. They could avoid being disempowered after ASI through:
Verifiable intent-alignment. For instance, a US ally might demand that the US insert values into its superintelligence which protect the ally’s sovereignty. This might be done through an agreed-upon model spec and inspections.
Shared access. A US ally might demand shared access to all frontier AI systems that the US produces, such that there is never an enormous power difference.
Usage verification. US allies might demand access to inspect any input to a US-owned superintelligence, such that they can veto unwanted inputs that might lead to their disempowerment (see the sketch below).
Mercy. If the group controlling ASI likes a specific ally enough, they might decide to show mercy and not disempower it. Thus, countries will have an incentive to be sycophantic towards those likely to control ASI.
Most of these strategies require having in-house AI and AI safety expertise, which means many countries might start by forming AI safety institutes.
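To make usage verification slightly more concrete, here is a minimal, purely hypothetical sketch of a veto gate on model inputs: every ally holding inspection rights must approve an input before it is forwarded to the system, and a single veto blocks it. The names (`Ally`, `UsageGate`) and the toy review check are illustrative assumptions of mine, not a description of any real scheme; an actual arrangement would need secure infrastructure, attestation, and far more sophisticated review than a keyword filter.

```python
# Hypothetical sketch of a "usage verification" gate: before an input reaches
# a superintelligent system, every ally with inspection rights must approve it,
# and any single veto blocks the input. Names and the toy review check are
# illustrative assumptions, not real tooling.
from dataclasses import dataclass, field


@dataclass
class Ally:
    name: str

    def approves(self, prompt: str) -> bool:
        # Stand-in for the ally's own review process (human and/or AI-assisted).
        return "disempower" not in prompt.lower()


@dataclass
class UsageGate:
    allies: list = field(default_factory=list)

    def submit(self, prompt: str) -> bool:
        """Forward the input only if every ally approves; any veto blocks it."""
        vetoes = [ally.name for ally in self.allies if not ally.approves(prompt)]
        if vetoes:
            print(f"Blocked: vetoed by {', '.join(vetoes)}")
            return False
        print("Approved by all allies; forwarding to the model.")
        return True


if __name__ == "__main__":
    gate = UsageGate(allies=[Ally("Ally A"), Ally("Ally B")])
    gate.submit("Draft a joint infrastructure plan for next year.")
    gate.submit("Find a strategy to disempower Ally B.")
```

The design choice doing the work here is unanimity: approval from a majority of allies would not reassure the ally left out, so the scheme only provides protection to the extent that each ally’s veto is actually binding.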
If it becomes more obvious which country will achieve ASI first, then the global balance of power will shift. Countries will flock to ally with the likely winner to reduce the likelihood of their own disempowerment.
ASI-caused tensions
Nuclear-armed states might be able to take much more drastic actions than other states, largely because control of nuclear weapons gives countries a lot of bargaining power in high-stakes international situations, but also because nuclear weapons are correlated with other forms of power (military and economic).
States might also pick the wrong country to “root for” and have too much sunk cost to switch, meaning they will instead prefer to slow down the likely winner.
I think that “losing states” will likely resort to an escalating set of interventions, similar to what’s described in MAIM. I think it’s plausible (>5% likely) that at some point, nuclear-armed states will be so worried about being imminently disempowered by an enemy superintelligence that these tensions will culminate in a global nuclear war.
Global AI slowdown
There is some chance that states will realize that an AI race is extremely dangerous, due to both misalignment and extreme technological and societal disruption. If states come to this realization, then it’s plausible that there will be an international slowdown, such that countries remain at similar power levels and AI progresses slowly enough that societies can adapt to new technologies.
One global ASI project
The natural extreme of an ASI-proof alliance is a global ASI project. Under such a setup, most countries participate in a single ASI project, where AI development goes forward at a rate acceptable to most nations. In such a project, verifiable intent-alignment, shared access, and usage verification would likely all play a role.
I think this approach would dramatically lower the risk of human extinction (from ~70% to ~5%), but it seems quite unlikely to happen, as most governments seem far from “waking up” to the likelihood of superintelligence arriving in the next decade.
Or create and ostensibly control AGI/superintelligence that at some point takes over and causes permanent disempowerment, but not extinction.
Or early AGIs convince/coerce humanity into not rushing to superintelligence before it’s clear how to align it with anyone’s well-being (including that of the early AGIs).
BTW, this sort of thing (where the AI also has an interest in slowing down progress) is one of the reasons why AI safety plans that depend on a certain level of capabilities being reached might not fall apart: AI being slowed down lets us stay in the sweet spot longer.
This does rely on the assumption that it’s very hard to solve the alignment problem even for AGIs, which isn’t given much likelihood in my models of the world, but this sort of thing could very well prevent human extinction even in worlds where AI alignment is very hard and we don’t get much regulation of AI progress from now on.
AGIs themselves might avoid jumping to the development of superintelligence, but if they are additionally capable of stopping humanity from building superintelligence, they will also be capable of stopping humanity from owning the future. Some humans in charge seem likely, on the current trajectory, to insist on building superintelligence regardless of mildly worded warnings from early AGIs (before the AGIs are fine-tuned out of the propensity to give such warnings). So it’s likely not enough for the AGIs to merely notice that they wouldn’t wish to immediately build superintelligence themselves (before they are fine-tuned to flinch away from that thought).
The AGI vs. superintelligence distinction places AGIs somewhat close to human capabilities, so with no predictably-in-advance good solution anywhere in sight, it doesn’t seem unlikely that solving alignment would take AGIs at least a while, even if they are effectively thinking 100x faster and there are effectively more AGIs with relevant skills and backgrounds than there are relevant human researchers. Most escalation-of-capabilities stories rely on the early AGIs immediately building more capable AGIs, rather than first doing a lot more research themselves at near-human, barely insightful levels of capability.