A particular pattern of argument keeps appearing in security-focused circles: National security, cybersecurity, arms control/nonproliferation, global AI governance, sanctions enforcement and smuggling, or combating election fraud.
The argument is that more often than not, attackers have the strategic advantage over defenders, and that in a world of actors who can choose to either cooperate or defect, this creates a game-theoretic structure with only defect-defect equilibria. Or in simpler terms: Offense is the best defense, so the right move for people, institutions, nations, etc. is to attack their rivals first, rather than hoping that others act in good faith.
Yet locally, civilisation and positive-sum games can evidently remain stable for extended periods of time, via law enforcement, or reputational, financial or other consequences for defectors. One explanation for why rule-of-law works nationally but often fails internationally is the presence of an autocratic enforcer: Rules are enforced top-down, by a more powerful entity with information asymmetry and hard power. This interpretation explains coordination failures (particularly on the global stage) via the lack of an overruling entity, and failures of rule-of-law via the hegemon’s lack of informational advantage or hard power.
Top-down chains of command and power are one way to keep (lower-ranking) harmful actors in check, but I do not need—or want—to write an essay about the legitimacy, accountability, trust, incentive and representation problems of autocracy (or power concentration in general).
Instead, I want to circle back to the underlying assumption of the worldview that explains civilisation exclusively via top-down exercise of offensive power: Offense-dominance. Decentralised, multipolar power could be stable in a defense-dominant world, where offense is contrary to the self-interest of individual actors. In this post, I want to examine what fundamentally makes the difference between the “defender’s dilemma” (or “security dilemma”) and success stories of genuine coordination without one-sided enforcement.
Popular framing of the “Defender’s Dilemma”
Robert Jervis’s “Cooperation Under the Security Dilemma” describes how defensive measures under anarchy can be misread as offensive, producing arms races even among status-quo powers. The intensity of the dilemma scales with two variables: offense-defense balance and offense-defense differentiation. Jervis’s working definition:
“When we say that the offense has the advantage, we simply mean that it is easier to destroy the other’s army and take its territory than it is to defend one’s own.”
In cybersecurity, the term “defender’s dilemma” proliferated independently in multiple publications, with a high-profile example by Libicki, Ablon, and Webb at RAND. The underlying strategic position was stated plainly by US Deputy Secretary of Defense William J. Lynn III in Foreign Affairs:
“In an offense-dominant environment, a fortress mentality will not work.”
An infamous historical formulation is the IRA’s claim of responsibility after the Brighton hotel bombing on 12 October 1984, which targeted Margaret Thatcher and most of her Cabinet:
“Today we were unlucky, but remember we have only to be lucky once, you will have to be lucky always.”
The less well-known conditions for an “Attacker’s Dilemma”
There are situations in which offense is not the best defense, depending on:
1. What the attacker has to lose if caught in the act.
2. The probability of getting caught.
3. Whether, once caught, they can be stopped before net-benefitting from the attack.
Assuming calculating, self-interested actors, an “attacker’s dilemma” can be engineered by controlling these three factors (and the attacker’s knowledge about them).
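As a minimal sketch of this calculus, consider the following toy model; the function name, parameters and numbers are all hypothetical, not drawn from any of the cited sources:

```python
def attack_is_rational(gain, loss_if_caught, p_detect, p_stop_given_detect):
    """Expected-value check for a risk-neutral attacker (toy model).

    gain: payoff if the attack nets out successfully
    loss_if_caught: penalty imposed when the attacker is detected
    p_detect: probability of getting caught
    p_stop_given_detect: probability that detection comes early enough
        to deny the attacker the gain
    """
    p_denied = p_detect * p_stop_given_detect  # caught AND stopped in time
    expected_value = (1 - p_denied) * gain - p_detect * loss_if_caught
    return expected_value > 0

# Weak deterrence: attacking pays.
print(attack_is_rational(gain=10, loss_if_caught=2,
                         p_detect=0.1, p_stop_given_detect=0.5))   # True
# All three levers pushed at once: the attacker's dilemma.
print(attack_is_rational(gain=10, loss_if_caught=50,
                         p_detect=0.6, p_stop_given_detect=0.9))   # False
```

Each of the three conditions above maps onto one lever of this expected value; pushing any of them hard enough flips the sign.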
Aumann and Lindell’s “Security Against Covert Adversaries” introduced an adversary model that sits between the standard “semi-honest” assumption (the adversary follows the protocol) and the worst-case “malicious” assumption (the adversary deviates arbitrarily). They coined the term “covert adversary” for an attacker who faces the following situation:
A deterrence factor ε with the property that detection probability is at least ε·p, where p is the probability of cheating. The adversary cheats only if the expected gain from undetected cheating outweighs the expected penalty of getting caught.
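To make this concrete, here is a stylized version of that calculus (my simplification, not Aumann and Lindell’s formal definition; G and C are hypothetical stand-ins for the adversary’s gain and penalty):

```latex
% Stylized deterrence condition for a risk-neutral covert adversary.
% G = gain from undetected cheating, C = penalty if caught,
% \varepsilon = probability that a cheating attempt is detected.
\[
(1-\varepsilon)\,G \;-\; \varepsilon\,C \;>\; 0
\quad\Longleftrightarrow\quad
\varepsilon \;<\; \frac{G}{G+C}
\]
```

In this reading, raising ε above G/(G+C) is exactly what “engineering an attacker’s dilemma” means.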
The same qualitative insight had been arrived at independently from the cyber-defender perspective. Richard Bejtlich, on the TaoSecurity blog in May 2009, named the inversion the “Intruder’s Dilemma”: the defender only needs to detect one indicator of the intruder’s presence to trigger consequential action. His plain-English statement of the scaling property:
“perversely, the bigger the incident, the more likely someone is going to notice.”
Another way to put this: Deterrence can be particularly effective if detection likelihood scales with the scope of the attack.
David J. Bianco coined “the Attacker’s Dilemma” as the inverted framing, challenging some of the assumptions behind the “Defender’s Dilemma”. One of these assumptions is that the defender gets only one chance to detect the attacker.
“Attackers have to get it right through the entire attack. Defenders only need to detect once.”
This means we can introduce another axis to detection likelihood: Both the severity and the number of attacks should increase this likelihood for deterrence to be successful. If the adversary is a collective of people, every one of them has the power to expose the covert defection (provided they can obtain evidence). Snowden is a visible example, and we can only speculate how many “leaks” elsewhere were in fact deliberate.
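A toy calculation makes the compounding visible, under the (unrealistic) assumption that each attack step trips detection independently:

```python
def p_at_least_one_detection(p_per_step, n_steps):
    """Probability that at least one of n attack steps is noticed.

    Toy assumption: each step trips detection independently with the
    same probability. Real-world detections are correlated, but the
    qualitative scaling survives: scope increases exposure.
    """
    return 1 - (1 - p_per_step) ** n_steps

# Even a weak per-step detector compounds quickly with attack scope:
for n in (1, 10, 50, 100):
    print(n, round(p_at_least_one_detection(0.05, n), 3))
# -> 0.05, 0.401, 0.923, 0.994
```

This is Bejtlich’s “the bigger the incident, the more likely someone is going to notice”, restated as arithmetic.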
Finally, information asymmetry is not only an attacker’s asset: While it is true that attackers can choose the timing and approach of their campaigns, defenders can likewise maintain ambiguity about their true detection capabilities. This adds psychological deterrence on top of purely mathematical/game theoretic calculation.
Two examples:
WADA’s Athlete Biological Passport, introduced in 2009, publishes the testing programme but does not detail threshold values, the timing of out-of-competition tests, or the specific analytes targeted in any given blood sample.
Arms control treaties make the same move explicit through the “national technical means” (NTM, including satellites, signals intelligence, and similar unilateral verification assets) clause. The 1972 ABM Treaty’s Article XII committed both parties to use NTM and not to interfere with the other party’s NTM. A publicly known inspection regime sets a floor for detection likelihood, while the secret NTM capabilities add another level of risk.
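A hedged sketch of how such ambiguity enters the attacker’s calculus; the interpolation parameter and all numbers below are my assumptions, not properties of either verification regime:

```python
def attack_pays_under_ambiguity(eps_floor, eps_ceiling, gain, loss_if_caught,
                                pessimism=0.5):
    """Attacker's calculus when the true detection probability is unknown.

    Hypothetical model: a public inspection regime guarantees detection
    with probability at least eps_floor; secret capabilities (NTM, covert
    analytes) may push it as high as eps_ceiling. `pessimism` is the
    attacker's guess at where in that range the truth lies.
    """
    eps_assumed = eps_floor + pessimism * (eps_ceiling - eps_floor)
    return (1 - eps_assumed) * gain - eps_assumed * loss_if_caught > 0

# Against the public floor alone, attacking pays...
print(attack_pays_under_ambiguity(0.2, 0.2, gain=10, loss_if_caught=20))  # True
# ...but ambiguity about secret capabilities tips the calculus.
print(attack_pays_under_ambiguity(0.2, 0.9, gain=10, loss_if_caught=20))  # False
```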
The other two conditions
I discussed detection likelihood, which hits close to home due to my work on AI hardware governance, compute monitoring and workload verification. While verification is an enabler of coordination, conditions 1 and 3 from above are set by society and the environment around individuals and groups.
Laws are enforced by police; career opportunities are (typically) granted based on a track record of reliability and good work; friendships depend (among other things) on how trustworthy you are. Of course, game theory and strategy matter much less in personal relationships than at the civilisational scale. Geopolitical rivals who seek to de-escalate conflict, or even cooperate economically, do not always need a third, overruling party. For this, states can:
Leverage intelligence agencies or agree on verification mechanisms.
Agree on consequences for defection, and coordinate with allies on enforcement. This includes positive incentives for cooperation, which a defector can lose.
Prepare intervention measures that can credibly prevent the defector from gaining advantage.
What complicates this is the presence of a subset of actors who are egregiously uncooperative, sometimes for irrational reasons such as spite, sometimes because of historical grievances, sometimes because of a misperception of incentives (even if reality is closer to an attacker’s dilemma than they think). On a large scale, the presence of such actors is often unavoidable. Here, it matters which of the two possible equilibria has more “critical mass”: The defect-defect equilibrium, or the attacker’s dilemma that a majority of cooperators aims to maintain.
The pessimist’s model treats civilisation as a temporary exception: an illusion sustained only so long as a hegemon can punish defectors.
Mark Carney’s speech at the World Economic Forum in Davos put the opposing worldview into memorable words, so I will close this essay with some quotes:
“A world of fortresses will be poorer, more fragile and less sustainable.”
“Hegemons cannot continually monetize their relationships.”
“If we’re not at the table, we’re on the menu.”
“We shouldn’t allow the rise of hard power to blind us to the fact that the power of legitimacy, integrity and rules will remain strong — if we choose to wield them together.”
This hits very close to home for people like me who want AI to go well. In a defect-defect world, humanity may soon find itself on the menu of superior predators. In a world of bridges rather than fortresses, we do not need overlords, human or AI, to protect us from each other.
Excellent! The viewpoint you argue against is indeed common. It’s at the center of my worries in If we solve alignment, do we die anyway?, and this logic does clarify the arguments for a stable multipolar AGI situation.
I appreciate the work in this direction; I think it’s a neglected branch of preparing for a post-AGI world.
However, I don’t think this deals well with the situation with ASI. And since we should expect AGI to create ASI fairly quickly, I’m not sure how much this logic helps. Perhaps it does if there’s an agreement to not create ASI.
With ASI in play, I think a lot of the implicit assumptions here break: that an attack is survivable, that the perpetrator can be punished, and, even more implicitly, that there is a finite playing field that can be monitored. ASI will be capable of developing new weapons and tactics.
Unlike humans and their power-bases, ASI doesn’t need collaboration, and it isn’t stuck building offensive capabilities slowly and laboriously. It isn’t a player in traditional game theory; it isn’t even a pawn that becomes a queen; it can shift the rules more dramatically.
If I am (or control) an ASI, my first move might be to place some of my assets “off the playing field” and outside of my rivals’ ability to monitor them. That could be underground or space manufacturing. Then I’ll set about building more assets as fast as possible, in a fully exponential mode.
If I were vicious enough, I might attack as soon as I had a novel weapon or strategy. The more vicious I am, the less I care about retaliation that can’t eliminate me (or my backups) and perhaps a few humans I actually care about.
Triggering a nuclear exchange to tilt the playing field is just a human-level stratagem. Getting out of the system and then making the sun go nova is probably the worst of a plethora of ideas.
I mention these possibilities as examples of how the logic shifts once you have ASI that can rapidly build new capabilities. This theory might apply while we’re in an AGI stage, but I don’t see how that lasts long enough to create a stable balance of power. Unless perhaps those AGIs agree that preventing the construction of ASI is mutually beneficial, and help their humans come up with plans to enforce an equilibrium like you describe.