Why I Think Pause is Impossible
Note: I deeply believe in trying to figure out how to make AI go well for humanity. If you’ve read other things I’ve written you’ll notice I am too dumb to figure out solutions[1], but on occasion I think I see gaps in proposed solutions from others. I am not writing this on the basis of some weirdo motivation other than to try to encourage debate and rigorous thinking on the very popular idea of pausing AI development because I think it has major vulnerabilities.
Why Pause?
The idea of a global pause feels natural: we can see that AI is changing everything, and no one can deny that there is a chance things could go very poorly, so wouldn’t it be good to pause to give us more time to figure things out? Depending on how you view the stakes, the idea could even be viewed as a moral obligation. Unfortunately, I think pause is almost certainly an impossibility due to the structure of the game being played. Pausing ends up as a robustly dominated strategy that no rational self-preserving actor can choose regardless of how dangerous they think superintelligence might be.
For pause to work, four things would need to be true, none of which are: the game would need to continue indefinitely, compliance would need to be verifiable, pausing would need to be decision-theoretically rational, and uncertainty would need to favor caution. I will work through each of these in turn.
The Game Does Not Continue Indefinitely
Every call for a coordinated pause, whether a voluntary moratorium, a binding treaty, or a formal non-proliferation agreement[2], implicitly relies on the Folk Theorem of repeated games. The concept is simple: cooperation is maintained because the discounted value of continued future play exceeds the temptation to cheat. This is the basic idea that explains why we didn’t all die in a nuclear apocalypse during the Cold War and, fingers crossed if you are reading this, still haven’t. Everyone understands that this game doesn’t end.
The race to superintelligence is different because it has an “absorbing” state, a state that once entered ends the game forever. In this case, the game could end forever because the first to superintelligence could secure omnipotent, unipolar control. However, we don’t actually need a condition this strict; we only need a weaker threshold property: that there exists some capability level θ* past which the leader’s advantage compounds. Any actor with AI like this can improve faster, beat their competitors more thoroughly, etc. in a way that trailing actors can’t keep up with, eventually causing the trailing actors to discontinuously shift from “compete” to “capitulate”. When that happens, the game is over. This violates the Folk Theorem, which requires the cumulative probability of the game continuing forever to be strictly positive.[3]
This is immensely problematic because in any given period a rational actor can reach θ* by choosing to defect, and the one-time prize of crossing the threshold first can outweigh any discounted stream of cooperative payoffs.
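Footnote 3 states the condition formally; here is a toy numerical sketch of it (the 2% per-period figure is my own illustrative number, not from the essay). If the per-period probability that someone reaches the absorbing state is bounded away from zero, the probability that the game survives long enough for the Folk Theorem to bite decays geometrically toward zero.

```python
# Toy illustration of footnote 3: the probability the race "never ends" is the
# infinite product of per-period survival probabilities (1 - p_t).  With p_t
# bounded away from zero, that product decays geometrically toward 0.

def survival_probability(p_t: float, periods: int) -> float:
    """P(no one reaches the absorbing state in the first `periods` rounds),
    assuming a constant per-period breakout probability p_t."""
    prob = 1.0
    for _ in range(periods):
        prob *= (1.0 - p_t)
    return prob

# Even a modest 2% per-period breakout chance makes indefinite play
# vanishingly likely over policy-relevant horizons.
for horizon in (10, 50, 200):
    print(horizon, survival_probability(0.02, horizon))
```

The Folk Theorem's cooperation condition needs this survival probability to stay strictly positive as the horizon grows; with any constant positive p_t it does not.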
Compliance is Not Verifiable
Think about this: merely having a nuclear weapon doesn’t win the game; using it does. And if an agent uses a nuclear bomb, all the other agents would know in minutes and launch retaliatory strikes. In other words, no one can secretly win the war. This is different from the path to superintelligence because crossing θ* produces no observable signal: the decisive step can happen silently inside a datacenter.
As I mentioned in my essay on canonical probabilities[4], general purpose software natively contains “splice programs”: research programs that look exactly like normal narrow AI development or safety research until the moment they cross θ*, by which point it is too late for anyone to respond.
Pausing is Not Decision-Theoretically Rational
Let’s backtrack for a moment and assume the two points above were wrong (the game continued indefinitely and pausing was verifiable). Even then, pause is robustly dominated. If you erroneously use infinite payoffs[5] in your payoff matrix, it’s easy to get stuck in Pascal’s Mugging situations, so I don’t want to use anything with infinity here.
The probabilities that would allow an agent to figure out if pause is rational are stuck in an uncertain range (an “identification region”). Because the agent has no way of knowing the exact chances, they must assume the worst and make the best decision they can (Minimax Regret). To show this, let’s define prior-free identification regions for the key parameters:
- p: partially identified probability an adversary secretly defects.
- q: partially identified probability defection successfully yields θ*.
- V: value of unipolar ASI (astronomically large, but strictly finite; no need for it to be ∞).
- L: finite loss from getting caught defecting (sanctions, kinetic strikes, etc.).
- S: status quo value.
Now we will make only one overall assumption: that we cannot canonically exclude the possibility of secret defection (p > 0), that we can’t exclude the possibility of success (q > 0), and that the unipolar threshold advantage is more valuable than the penalty for getting caught defecting plus the status quo (V ≫ L + S).
Next we apply minimax regret, where the regret of an action is the difference between the payoff you actually received and the payoff you would have received had you chosen optimally:
The maximum regret of pausing is that an adversary defects and achieves θ*, which forces you into unrecoverable strategic subordination. At points in the identification region that cannot be excluded, this expected regret is at least on the order of pqV. The maximum regret of defecting is being caught before you reach θ* and suffering L, which is a finite, canonically bounded geopolitical penalty.

Because evidential screening ensures the identification region always contains points where pqV vastly exceeds L, the minimax-regret choice is to defect; pausing is robustly dominated.
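To make the comparison concrete, here is a minimal numerical sketch. All values below are hypothetical placeholders of my own choosing (the essay deliberately leaves p and q only partially identified, so we evaluate regret at the worst corner of assumed identification intervals):

```python
# Toy minimax-regret comparison of "pause" vs. "defect".  Every number here is
# an illustrative assumption, not a claim about real-world magnitudes.

V = 1e9   # prize for reaching the unipolar threshold first (large but finite)
L = 1e3   # bounded geopolitical penalty for being caught defecting
S = 1.0   # status-quo payoff

p_region = (0.0, 0.3)  # identification region: adversary's defection probability
q_region = (0.0, 0.5)  # identification region: probability defection succeeds

# Worst-case regret of pausing: the adversary defects and succeeds at the
# corner of the region we cannot rule out, costing roughly p * q * V.
max_regret_pause = max(p * q * V for p in p_region for q in q_region)

# Worst-case regret of defecting: caught short of the threshold, suffering the
# bounded penalty L plus the forgone status quo S.
max_regret_defect = L + S

# Minimax regret selects the action with the smaller worst-case regret.
best = "defect" if max_regret_defect < max_regret_pause else "pause"
print(max_regret_pause, max_regret_defect, best)
```

The point of the sketch is structural, not numerical: as long as the identification region cannot exclude positive p and q, the worst-case regret of pausing scales with the unbounded-in-practice V while the worst-case regret of defecting stays pinned at the finite L + S.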
Uncertainty Does Not Favor Caution
This section feels bad to write because it is very counter-intuitive. Most of the time when you are uncertain about risks, it makes all the sense in the world to be cautious and pause. If you are in the mountains and want to cross a slope but are not sure if it is going to avalanche, it’s totally reasonable to not cross it or wait and get more data.
But in the AI development game, irreducible uncertainty is anti-cautious. What I mean is that for pause to be a rational strategy, a pausing actor must have canonical confidence that their adversary’s defection probability is very close to zero. But the non-canonical framework shows that an actor literally can’t pin down this probability on any policy-relevant timeline. So pausing isn’t safe in any traditional sense; it asks nation-states to make a sovereignty-level bet on a parameter that is provably unknowable.
Even worse, this has a sort of ratchet effect: once a nation tastes the forbidden sweetness of defecting, each increment of progress lowers the remaining cost to get to θ*, making every subsequent decision to keep defecting easier than the last.
Conclusion
In an attempt to offer a constructive solution, one counterintuitive idea I had (one that I only hold loosely) is that more heavily entangling AI development among nations, rather than trying to segregate it under a racing development paradigm, may be beneficial. In other words, if the US opened the floodgates to Chinese firms and they jointly worked together, then it is the world that approaches θ* collectively, rather than any single actor crossing it alone.
In conclusion, strategies ported from the 20th century are appealing because they are familiar and have served humanity well. It would be awesome if superintelligence had the same qualities that made the last century’s crises so tractable. It doesn’t. It would be better for humanity if we stopped pretending it did.
- ^
I tried to think of one solution at the conclusion, but it may be a really bad idea.
- ^
I wrote a more narrow critique on this topic here: The Jackpot Jinx (or why “Superintelligence Strategy” is wrong).
- ^
The cumulative probability that the game continues forever, ∏_t (1 − p_t), must be strictly positive. p_t is the probability that someone reaches the absorbing state in period t. The infinite product is positive iff Σ_t p_t < ∞. As global compute scales, more actors enter the race, and AI becomes better, p_t is not decaying; rather, it is likely bounded away from 0 or increasing, so the sum diverges and the product converges to zero. Importantly, unlike the exogenous termination risks that the Folk Theorem can accommodate, reaching θ* is an endogenous choice that rewards the actor who makes it, meaning the game wouldn’t end randomly; it ends because a player chose to win.
- ^
Unprecedented Catastrophes Have Non-Canonical Probabilities.
- ^
As I stupidly did in my previous paper The Jackpot Jinx.
- ^
Note: this could be contestable if L is a nuclear war or something terrible.
There is a risk that AIs are the winners, and no human countries or human individuals win (this could take the form of extinction or permanent disempowerment). And this risk plausibly changes, reduces over time (rather than the uncertainty about a fixed risk getting resolved), as better understanding of how to manage the technical and social problems becomes available. What a ban/pause buys humanity is this improvement in the chances that there are any winners for humanity at all, once sufficiently capable AIs are eventually built.
(This can make unilaterally defecting undesirable: the defector loses too, getting a worse outcome than if they hadn’t defected, so there is no temptation to cheat sooner rather than later. And there is a simple if painful way of ensuring verification: nobody builds fabs that are too advanced, or giant datacenters. There are less painful ways to ensure verification that maintain a lot of the effect.)
An analysis that ignores this factor could still be useful: it explores the possibility of agreement on policy with people who categorically disbelieve literal extinction or permanent disempowerment (where AIs end up controlling almost all resources and the future of humanity is left with scraps, with no possibility of ever changing this), or who disbelieve that these risks can get better if humanity takes time and doesn’t build powerful AIs immediately, as soon as technologically possible.
Also, as more understanding accumulates and the world gets closer to building sufficiently capable AIs safely, the same issues would emerge as in a world that doesn’t significantly risk extinction or permanent disempowerment for all of humanity (or where this risk is constant and isn’t reduced by a longer or better-coordinated ban/pause). So safely coordinating a ban/pause towards its end is also an important class of problems, even in a ban/pause world that solves most of the other problems.
Thanks for the comment, I think you have some great points. I am going to try to explain my responses, even though I suspect they may be as unpopular as the post already is. One quick thing at the outset: my essay is not based on disbelieving extinction or terrible outcomes risk, my claim is pause is dominated even if you take that risk very seriously (which I do.)
My model does assume that V > 0, and if the winner might suffer catastrophic devastation then V is a lottery rather than a clear reward. That said, I think adding alignment risks makes the argument harder to address. The verification problem is the same because you can’t tell whether a rival’s pause deploys compute to do alignment or to defect. Restricting fabs or datacenters is definitely stronger than software monitoring (it’s much easier to check), but θ∗ is a capability threshold rather than a compute threshold. This seems unpredictable to estimate as well; I wonder if a verification plan built around compute ceilings is just betting that compute-to-capability holds (I don’t mean to argue against the bitter lesson, just that there are presumably software breakthroughs that could and have occurred).
I also think your point on risk changing over time vs. the uncertainty about a fixed risk getting resolved is important. My perspective is that even granting that the risk decreases with time, it is still non-canonical in the sense that I discuss in my other essay Unprecedented Catastrophes Have Non-Canonical Probabilities. The splice model means that data comes in that looks equally reasonable for “evidence of misalignment” and “evidence of alignment”. I don’t think pause buys time to resolve this in a convergence to safety, I think it buys more time for genuine disagreement about whether the pause is working.
My major concern is that pause structurally encourages defectors who are adversely selected for recklessness. The actors who defect are presumably the ones least likely to want to build safe superintelligence, because they believe they should cut every corner to pass the threshold first.
Overall, thank you again for the thoughtful comment. For me the biggest question is whether alignment progress is measurable and canonically verifiable.
(Also thanks to those who read and are giving me any feedback. I know this is contrarian, but I am always hopeful of receiving a fair shake here!)
Good post, I’ve thought a lot along similar lines. Except I lean further left than most of LW, and don’t trust governments and corporations at all, so my preferred solution would be entangling AI training among people, across borders and class lines as much as possible.
It does lead to some surprising conclusions. For example, someone who works on alignment at BigCo is making the world a worse place (because BigCo’s owners will just use the alignment work to race harder, as happened with RLHF). While someone who works on capability but in an open, GPL-like way is making the world a better place (by removing that capability from the race, making the race less winner-take-all). Counterintuitive, but I think I stand by it.
Thank you so much for the support and read through! Truly made my day.
Have you written about your thoughts on entangling AI among people broadly? I’d love to read more about that idea to think more about it.