International Law Cannot Prevent Extinction Either
The context for this post is primarily Only Law Can Prevent Extinction, but after first drafting a half-assed comment, I decided to get off my ass and write a whole-assed post.
I agree with Eliezer’s main thesis that individual violence against AI researchers is both morally wrong and strategically stupid. Where I disagree is with the claim that international law can prevent extinction. It can’t, for the following reasons.
I. International law is largely a fiction (especially when interests diverge sharply)
The analogy with nuclear weapons is a poor one. North Korea signed the nuclear non-proliferation treaty and developed nuclear weapons anyway. The treaty deterred only those who weren’t strongly motivated in the first place. And the reason the US and Russia didn’t nuke each other has nothing to do with international treaties (see point II).
In practice, powerful countries disregard international law whenever they want. A stark example of this is the Budapest Memorandum: in 1994, Ukraine surrendered all its nuclear warheads in exchange for written sovereignty guarantees from Russia, the US, and the UK. Russia annexed a part of Ukraine in 2014, and the international community expressed concern. Russia launched a full-scale invasion in 2022, and the first thing the international community did was block the bank cards of anti-war citizens fleeing Russia. No military intervention ever materialized. Putin is doing just fine.
There are no stable enforcement mechanisms to address violations of international law. This is not a world of parties engaging in good-faith negotiations. It is a world in which Putin, Xi, and Trump treat international commitments as empty gestures performed for the cameras.
II. The AI race is perceived as asymmetrical, unlike nuclear MAD
The proposed AI treaty is compared to nuclear non-proliferation, but the underlying incentive structures in these two cases differ radically.
As Eliezer noted, neither Soviet nor American leadership expected to have a good day if an actual nuclear war happened. They understood that a first strike wouldn’t prevent devastating retaliation. Carl Sagan and colleagues reinforced these fears with their nuclear winter research, arguing that even a limited nuclear exchange could trigger catastrophic global cooling. The Soviets refrained from launching a first strike not out of adherence to international treaties. They did so because they genuinely perceived the situation as lose-lose.
However, the AI race appears to have a different payoff structure, at least as most people perceive it. If you develop ASI first, you potentially win decisively, preventing retaliation altogether. This creates immense pressure to defect.
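To make the contrast concrete, here is a minimal sketch of the two perceived payoff structures. All numbers are illustrative assumptions of mine, chosen only to capture the shape of the incentives; they do not come from the treaty proposal or from anyone’s estimates.

```python
# A minimal sketch of the two perceived payoff structures described above.
# All numbers are illustrative assumptions, not estimates of anything.

# Payoff to "you" for (your move, rival's move); higher is better.

# Nuclear MAD as both sides perceived it: striking first still ends in
# devastating retaliation, and a full exchange risks nuclear winter.
mad = {
    ("restrain", "restrain"):  0,   # uneasy peace
    ("restrain", "strike"):   -9,   # you absorb a first strike
    ("strike",   "restrain"): -8,   # retaliation still devastates you
    ("strike",   "strike"):  -10,   # mutual destruction
}

# The AI race as (this post argues) most decision-makers perceive it:
# building ASI first is a decisive, retaliation-proof win.
ai_race = {
    ("comply", "comply"):   0,   # treaty holds
    ("comply", "defect"): -10,   # rival gets the decisive advantage
    ("defect", "comply"):  10,   # you get the decisive advantage
    ("defect", "defect"):   3,   # open race, you might still win
}

def best_response(payoffs, my_moves, rival_move):
    """Return the move that maximizes your payoff against a fixed rival move."""
    return max(my_moves, key=lambda m: payoffs[(m, rival_move)])

for rival in ("restrain", "strike"):
    print("MAD, rival will", rival, "-> best response:",
          best_response(mad, ("restrain", "strike"), rival))
for rival in ("comply", "defect"):
    print("AI race, rival will", rival, "-> best response:",
          best_response(ai_race, ("comply", "defect"), rival))

# Under the perceived MAD payoffs, restraint is the best response no matter
# what the rival does; under the perceived AI-race payoffs, defection is
# dominant. That gap, not the text of any treaty, is what drives behavior.
```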
III. There is virtually zero possibility of consensus on AI risk, unlike nuclear weapons
To be fair, the win-lose perception described above is probably wrong. In most cases, a misaligned ASI is catastrophic regardless of who builds it first. So the true payoff structure is likely to be lose-lose, just like nuclear war. But that doesn’t matter for treaty prospects, because behavior is driven by perceived payoffs, not actual ones.
Eliezer’s treaty seems to require that everyone becomes so terrified of ASI that only a madman would violate it. Is that realistic? I’d say a scientific consensus on this topic would be a great first step towards such a world. Eliezer writes that a few hundred computer scientists, Nobel laureates, and others have called AI an extinction risk. Unfortunately, many others have disagreed, and the public debate is nowhere near settled. The situation is more analogous to the early climate change debates than to nuclear weapons. And even after a scientific consensus on climate did eventually emerge, most countries have done little of substance in response, because their incentives to ignore the consensus outweigh the perceived risk.
Without a real consensus on the AI risk, the perceived payoff will continue to be enormous compared to the perceived risk. Trying to build a working AI ban on this foundation means a lot of wasted time and effort.
At least with climate change, evidence eventually accumulated. With AI risk, by the time the evidence arrives, it will be too late to act on it.
IV. The proposed enforcers have a demonstrated track record of not enforcing things
For a treaty to function, its enforcement has to be credible: adversaries have to believe that violations will actually trigger the stated consequences, up to and including the proposed airstrikes. If the US is seen as unwilling to follow through on stated commitments, the treaty is not going to be taken seriously[1].
Regardless of the administration or country in question[2], modern international politics seems to be characterized by a persistent pattern: maximum stated commitment, followed by a face-saving partial retreat declared a success. Viewed charitably, this pattern reflects the objective difficulty of costly enforcement against nuclear-armed adversaries, which is exactly what an AI treaty would require overcoming, repeatedly.
V. GPU control is not analogous to nuclear material control
The treaty’s proposed mechanism is control of high-end GPU clusters. This is far harder and less reliable than controlling nuclear material.
Weapons-grade uranium and plutonium are physically rare, require large, specialized industrial infrastructure, and can be monitored at a relatively small number of inspection sites: centrifuge cascades emit distinctive mechanical vibrations, and enrichment facilities have identifiable thermal and radiological signatures. And one can’t simply find a less conspicuous way to make nuclear weapons.
For AI training, GPUs are the current bottleneck, where the key word is “current”. Training frontier models on CPUs is much slower per chip, but CPU clusters are much more accessible, geographically diffuse, and unmonitorable at scale. A country like Russia (with a functioning math education system that produces new research talent every year) could plausibly distribute training across tens of thousands of ordinary servers in many locations. Unlike uranium centrifuges, small server clusters emit no detectable signatures from orbit.
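As a rough sanity check on that claim, here is a back-of-envelope sketch. Every constant in it is an order-of-magnitude assumption of mine (the size of a frontier training run, the sustained throughput of a GPU or of an ordinary server), not a figure from the treaty proposal or from this post.

```python
# Back-of-envelope for the CPU-diffusion claim above. All constants are rough
# order-of-magnitude assumptions chosen for illustration, not measured figures.

FRONTIER_RUN_FLOP = 2e25        # assumed total compute for one frontier training run
GPU_FLOP_PER_S = 5e14           # assumed sustained throughput of one high-end GPU
CPU_SERVER_FLOP_PER_S = 2e12    # assumed sustained throughput of one ordinary server
SECONDS_PER_YEAR = 365 * 24 * 3600

def devices_needed(total_flop: float, per_device_flop_s: float, years: float) -> int:
    """How many devices must run continuously to finish the run in `years` years."""
    return round(total_flop / (per_device_flop_s * years * SECONDS_PER_YEAR))

print("GPUs, 1-year run:       ", devices_needed(FRONTIER_RUN_FLOP, GPU_FLOP_PER_S, 1))
print("CPU servers, 1-year run:", devices_needed(FRONTIER_RUN_FLOP, CPU_SERVER_FLOP_PER_S, 1))
print("CPU servers, 3-year run:", devices_needed(FRONTIER_RUN_FLOP, CPU_SERVER_FLOP_PER_S, 3))

# Under these assumptions a CPU-only run needs on the order of 10^5 servers,
# fewer if stretched over more years or helped by better algorithms: slow and
# expensive, but spread across many unremarkable sites rather than one
# monitorable cluster. Interconnect bandwidth is the obvious caveat the
# sketch ignores.
```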
More importantly, algorithmic advancements have repeatedly reduced computational requirements.[3] The one guaranteed result of the proposed treaty is a flowering of creative, unmonitorable alternatives to current training methods.
VI. A flawed treaty is not better than nothing
The argument that even an imperfect treaty that buys two years is better than no treaty sounds reasonable in the abstract. Without a treaty, the AI race is unconstrained. However, the flaws in any such treaty will systematically favor power-hungry authoritarian countries, which automatically increases the odds of the worst possible outcome.
Which states usually comply with international agreements they find costly and which don’t? The Soviet bioweapons program, Biopreparat, continued in flagrant violation of the Biological Weapons Convention from 1975 until the early 1990s. The program employed tens of thousands of people and remained undetected by Western intelligence for most of that period.
One almost certain consequence of the treaty is that the most risk-aware AI lab currently working at the frontier will stop its capabilities research. Another is that a certain former KGB officer will be jumping for joy in his secret underground bunker, not believing his luck.
If a flawed treaty means that Anthropic pauses and authoritarian programs continue, then the treaty is truly worse than nothing. A quick death from a genuinely misaligned ASI, built by anyone, is a terrible outcome. But there are things much worse than death, and one can only hope that an ASI trained under the supervision of professional torturers turns out misaligned and kills us all quickly enough.
So is there a better way?
There is one kind of action that seems valuable in the absence of a good solution: extending the scope of the search effort. Our search should be wider, more desperate, and larger in scale than it currently is. For example, we could:
ask billionaires and governments to issue a huge number of grants to people willing to work on this problem so that millions of smart people could stop going to their pointless jobs and try to solve this,
encourage everyone already involved in AI alignment research to crowdsource tractable sub-problems to the wider public,
encourage Anthropic to identify promising researchers among Claude users (Claude interacts with millions of people daily and could, with appropriate design, find users who show unusual cognitive skills).
In any case, the absence of a good solution is not an argument for a bad one. We need to look harder.
[1]
[2] See also China’s final warning.
[3] Recent work on BitNet and Mixture of Experts has already demonstrated that very capable LLMs can be trained on hardware below the H100 tier.
Yeah. Though I’d go further. The US government and leading AI companies have already jointly decided to race hard and win. It’s one bloc (“Anthropic has much more in common with the Department of War than we have differences”—Dario Amodei). This bloc sees itself in the lead and has immense money and power in their sights. They ain’t stopping to sign no treaty.
That’s the big problem I see now: it’s not international. It’s a handful of people in the US who are already far ahead of everyone else in the world and want to be even further ahead. If this handful of people continue on their course, there’ll be no treaty ever, no matter what the rest of the world does.
“The US government and leading AI companies have already jointly decided to race hard and win.”
Have they ever defined what exactly this race is? What does its finishing line, or the post-race state of the world look like? What are they trying to accomplish?
They want to be the only ones with the nuke that prints money. It’s exactly as simple-minded as it seems. You can tell them all day that it’ll backfire; they’ll listen and continue.
Okay, but then what? Do they all (US companies + US government) believe that—once they have actually built AGSI—there will be a static endpoint at which all their competitors say: “Okay, you’ve invented AGSI now; we all give up”? If so: In what competition, exactly, would they be giving up? Would the rest of the world then stop conducting further AI research? Or would the OpenAI/Anthropic/Google/Musk AI then be deployed to sabotage AI research in the rest of the world? I don’t understand the scenario.
Well, obviously, then they threaten or sanction the rest of the world into not building AI.
Going point by point:
I: It was 21 years between when North Korea signed the nonproliferation treaty and their first nuclear test. And they were very motivated. Seems to me like the treaty actually did something?
The international community was limited to “expressing concern” only because Russia had nukes. For the current war, their interventions have gone far beyond blocking some bank cards. Large amounts of material support for the war don’t seem like “no military intervention” to me. Also, nobody believes the Ukraine war represents an existential threat to humanity; if they did, I think you’d see quite a lot more intervention.
II: Different payoff structures “as most people perceive it”? Most people thinking about AI and national security only see it as an issue of getting a military edge via autonomous weapons and mass surveillance. They do not actually think AI progress could lead to something that can function as a successor species. If they did, I think they would be acting very differently. Getting to the point where people believe that seems like a major precursor to any treaty. Also, even if they are AGI-pilled, decision makers may not have a tendency towards “And then we will take over the world with AI! Or kill everyone trying!” in their thinking, compared to the worst fears of rationalists.
III: A political consensus is different from a scientific consensus. National security types may have a considerably different reaction to possibilities of doom than most ML researchers.
IV: I don’t think TACO is a sufficient reason to argue that we are actually incapable of enforcing treaties. Treaties still exist, and are often still at least partially enforced. Iran sanctions happened for decades and continue to happen. But I do agree that a treaty between nuclear powers with adversarial incentives to defect is a hard problem.
V: Are you sure it’s actually replaceable by geographically diffuse CPU clusters? Isn’t part of the whole reason you have to build datacenters that you need low latency? What % of global CPUs would you need to replicate frontier training efforts? If it’s even close to 1%, that seems like a very detectable datacenter to me.
Even if it is replaceable, somehow—OK—should that be a thoughtstopper? Is there no way to gain traction on the problem?
Though I do think limiting algorithmic efficiency improvements by treaty is something we should be concerned about. Are there bottlenecks on that? Do those require large amounts of compute to obtain, or validate? Could a cultural norm against algorithmic improvement be inculcated among scientists? Again, just because something seems hard doesn’t mean it’s impossible.
There’s also the danger that one of the various other AGI efforts outside of LLMs that don’t need large amounts of resources might pay off. That seems pretty scary.
VI: I don’t think a flawed treaty only buys us two years? Given how long it would take China to catch up to us if we stopped, it buys us at least a decade. Unless a different state (Russia, India, the UAE?) is willing to buy up all our scientists, buy up a whole lot of compute, and start and sustain its own program for a decade, while no one does anything about it?
This also seems to assume that such programs would be undetectable. But the whole point of the treaty proposal is that massive quantities of compute would be traceable. I suppose if a state-sponsored AGI research program were to look for methods that were much less compute-intensive, that would be pretty scary, yes.
It also depends on what you think the relative outcomes are. What if a treaty increases the chance of s-risk from autocracies by 1%, but decreases the chance of s-risk from unaligned ASI from the current AI race by 5%?
Re: your “better way”—this basically rounds out to “massively increase technical safety research.” Yet is safety research safe? This is a whole separate issue, but most safety research tends to scale capabilities as much as or more than it scales safety. RLHF was safety research, and we got ChatGPT out of it. If we’re giving compute and money to anyone who can put the word “safety” in a grant proposal, I don’t think we’re going to actually get differentially safe safety research.
Meanwhile, policy and advocacy have gotten 10-50x less funding than technical safety research has. If you think the solutions we have available for governance are so unworkable, maybe we just have not tried hard enough?
I’m not so pessimistic. People outside of Silicon Valley have no problems at all with “If humans make machines vastly smarter than people, that’s an extinction threat”. They just don’t believe we’ll make machines vastly smarter than people. That could change quickly with mass unemployment / other big changes.
Important context for those who are yet to read it. Here is MIRI’s Draft Treaty: https://ifanyonebuildsit.com/treaty
It would be great to have a debate society that is fun to watch, facilitates expressing positions clearly and in their strongest / most relevant forms, and makes the disagreements actually face each other. (Some speculations here, but mainly just someone iterating on hosting lots of debates and building tools to help debaters find truth either together or adversarially.)
Treaties are real to the degree that they are backed up by credible threat of enforcement + deterrence for non-compliance, in the same way that ordinary national and local law is real to the degree that it is actually enforced. There are lots of laws on the books in many places that are not actually enforced consistently or at all, for various reasons, some of which are analogous to the issues with international treaties that you gesture at. And this in fact causes all sorts of problems in many cases—disorder, unfairness / injustice, degrading trust and legitimacy of the legal system and the state, etc. But it doesn’t follow that the entire legal system of any given nation is largely fiction, nor that we should stop trying to pass new laws or enforce existing ones, and it definitely doesn’t mean that we should resort to some other kind of system where the state does not have a monopoly on violence. Analogously, it would be premature and fatalistic to give up on international treaties for AGI (and it’s not clear what the alternative could possibly be—if the leaders of superpowers are not the ones ultimately deciding what to do about AGI, who is?)
Also:
“International law” is a vague term that means different things to different people; the linked post you’re criticizing doesn’t use that phrase at all (and is mostly not about the topic), and you don’t say yourself exactly what you mean by it except via example.
The perceived risk could shift sufficiently for substantive responses if mankind runs into a large enough accident without yet being disempowered or driven extinct. Whether coordination would work under such a scenario is another question. It’s also not something to count on, to hope for, and certainly not to strive for.
Solution if life worked like Little Alchemy: We stop feeding smart young adults into the “Bermuda Triangle of Talent”: “consultancy, finance, and corporate law.” AI replaces these jobs. We use half of the money that rich people would’ve paid human consultants, bankers, and lawyers to pay the AI companies and the other half to pay young adults to work on the AI problem. They convince all the AI companies to stop increasing capabilities beyond what is required to do those specialized jobs. After this succeeds, everybody who did this gains ultimate prestige (helped save humanity), fulfilling the status symbol function that guarantees the rest of their career (worth taking the $300k to $150k pay cut for a few years), and the AI companies can start getting paid the other half of their money. Everyone is happy.
(I vaguely remember seeing that some people don’t like Rutger Bregman, but I think The School for Moral Ambition is a good first shot at tackling the “reroute people away from bullshit jobs into valuable ones” problem at a high level. Somebody criticized it for trying to moralwash elites, but it’s not about the “eliteness” of the people involved, just that a major incentive for job selection that affects competent people is prestige/money.)
Who, after all, today speaks of the American sanctions against Belarus?
The fact that you chose this as an example while completely ignoring the far better illustration of it makes your point (that there are absolutely no consequences for flouting such agreements if you’re powerful enough) better than the rest of your post does.