Most “AGI ban” proposals define AGI by outcome: whatever potentially leads to human extinction
Is this actually true? Most AGI ban proposals I hear about define it in terms of training compute or GPU restrictions. E.g., in IABIED, they want to ban ownership of more than 8 top-of-the-line GPUs. My impression is also that the “AI safety community” largely agrees the EU AI Act is pretty bad (also, it doesn’t say “no human extinction”).
My impression is also that the “AI safety community” largely agrees the EU AI Act is pretty bad (also, it doesn’t say “no human extinction”).
Personally, my expectations for the AI Act were low, and I was quite pleasantly surprised when I skimmed it: it even mentions corrigibility, so at least one person involved in making it has probably read some LessWrong posts.
Thanks for pushing back! My original claim was probably too broad. What I meant is that in advocacy/outreach circles, I often see “AGI ban” proposals framed in terms of outcomes (“prevent systems that could cause extinction”) rather than operational proxies.
You’re right that some proposals take the compute-restriction route, which I gesture at in the post. Sadly, I haven’t read IABIED yet (still waiting on my copy!), but I agree that a hard GPU threshold, while imperfect and bypassable, is at least tractable in a way “ban extinction-risk AI” isn’t.
On the AI Act: it definitely wasn’t drafted with frontier/GenAI in mind, and it shows. It never mentions “extinction,” but it does give a workable definition of systemic risk: “risk specific to the high-impact capabilities of general-purpose AI models … with reasonably foreseeable negative effects on public health, safety, fundamental rights, or society as a whole, propagated at scale.” That’s the baseline Europe is working with, so my job is to do my best with that!
This post was mainly pointing to the gap: many “AGI ban” arguments invoke a future, extinction-linked technology without nailing down a workable precursor definition or proxy. That’s the piece I think we need to stress-test.
What I meant is that in advocacy/outreach circles, I often see “AGI ban” proposals framed in terms of outcomes (“prevent systems that could cause extinction”) rather than operational proxies.
In that case, shouldn’t the argument be about whether this is the right phrase to advocate for, rather than whether it’s the right thing to write into law? For example (whether you agree with the policy or not), the Federal Assault Weapons Ban banned many assault rifle “precursors”:
the manufacture, transfer, or possession of “semiautomatic assault weapons”, as defined by the Act. “Weapons banned were identified either by specific make or model (including copies or duplicates thereof, in any caliber), or by specific characteristics that slightly varied according to whether the weapon was a pistol, rifle, or shotgun” (see below). The Act also prohibited the manufacture of “large capacity ammunition feeding devices” (LCAFDs) except for sale to government, law enforcement or military, though magazines made before the effective date (“pre-ban” magazines) were legal to possess and transfer. An LCAFD was defined as “any magazine, belt, drum, feed strip, or similar device manufactured after the date [of the act] that has the capacity of, or that can be readily restored or converted to accept, more than 10 rounds of ammunition.”
All this despite the ban being popularly championed as an “assault rifle ban.” So it’s not clear that the five words asking to ban AGI should be extended to include all the details which would make a lawyer or lawmaker happy.
I think responsible advocacy has to take both dimensions seriously. I don’t have a problem with campaigns focusing on x-risk; I think they should. The issue is when a lawmaker or regulator presses for specifics and the answer stays at the level of “we want to ban anything that leads to extinction.” That kind of underspecification takes a toll when you want it translated into legal drafting.
On the example you mention: whatever its flaws, it didn’t just say “ban assault rifles.” It listed specific models, characteristics, and magazine capacities (e.g. >10 rounds) that counted as prohibited precursors. You can agree or disagree with where those lines were drawn, but the definitional work is what made the ban legible in law. I’m not sure what type of advocacy work went into this because I’m not a U.S. citizen or a U.S. lawyer, but I would think that, when prompted about what they wanted included in the legislation, they didn’t stop at “we want to ban assault rifles”…
What I’ve appreciated, for instance, about some PauseAI advocates is that they’ve been clearer with me: their goal is to halt development at the current capability level, and they’ve used language similar to how the EU GPAI Code of Practice defines it. I still don’t know how politically feasible that is (as I noted in the post), but it at least gives a concrete frame.
Yes, advocacy work doesn’t need to be as precise as the legal wording that (hopefully) follows if it succeeds. But when definitional questions are raised, I think it’s crucial to be able to point to the thresholds and precursor capabilities you actually want prohibited.
The issue is when a lawmaker or regulator presses for specifics and the answer stays at the level of “we want to ban anything that leads to extinction.” That kind of underspecification takes a toll when you want it translated into legal drafting.
Let me see if I can explain where I’m coming from, by contrasting three bans.
Nuclear non-proliferation. Someone living in Russia or the US in the 1950s or 60s might quite reasonably expect that they and their children would die in nuclear fire. There were a couple of times when we came disturbingly close. We have avoided this fate (so far) because enough powerful decision makers on all sides could visualize and understand the threat. We were also lucky that a few key steps to building a nuclear weapon require nation-state resources, and are hard to do in secret. And enough experts understood the process well enough to find those key control points. Even so, we largely failed at preventing the proliferation of nuclear weapons to small rogue states. Thankfully, we have so far avoided their further use.
The “assault weapons” ban. The policy goal behind this ban was something like “we would like to see fewer gun deaths.” One key challenge of achieving this was that the US still wanted to allow people to own, say, a nice walnut-stock hunting rifle chambered for Winchester .300 Magnum or .450 Bushmaster. We only wanted to ban the “military style” weapons. But it turns out that it’s much harder to stop an angry bull moose than a human being. Some of those family heirloom hunting rifles are effectively extremely powerful, long-range sniper rifles. And some of them are semi-automatic.
But the authors of the “assault weapons” ban still wanted to ban “weapons of war.” (Not including actual full-auto rifles, which were banned long ago.) So how do you distinguish between grandpa’s Winchester .300 Magnum that he used to hunt elk, and an AR-15? Well, it mostly comes down to two things:
How many shots can you fire and how quickly, before you need to do something more complicated than pulling the trigger? The theory is that at least some mass shooters get tackled or otherwise stopped while they’re messing around with reloading. It’s not foolproof, but it is actually a coherent policy goal that someone might want.
Does your weapon look like a scary military weapon? Hint: That link is a trick. It’s a perfectly nice rifle, but it’s a classic .22 bolt action target rifle that has trouble killing a woodchuck humanely.
So in practice, a large chunk of the assault weapon ban is largely aesthetic. And it didn’t stop the mass proliferation of “tactical” gun culture and AR-15s. As annoying as I find the median AR-15 owner at the range, the AR-15 is mainly a cultural problem. More powerful and more dangerous hunting rifles already existed, and nobody ever tried to ban those.
So at least from my perspective, the “assault weapon” ban largely failed to achieve the policy goal of “less gun violence in the US.” And that’s because a lot of the specific things it regulated didn’t connect especially well to the underlying goal.
Superintelligence. For the sake of argument, let’s assume that (1) we have a real chance of building something much smarter than any human in the next 20 years, and (2) we won’t really understand how it works, and our best “alignment” tools will be the AI equivalent of high school sex ed classes for SkyNet. To be precise, assume we can roughly communicate our ideal goals to the AI and we can get it to make the right noises when asked. But ultimately it’s an incomprehensible alien superintelligence, and it’s going to do whatever the hell it wants. Just like human teenagers.
By default, in this scenario, the AI runs rings around us, and it ends up making all the really important decisions about the future without our input. The only way to avoid this is to strictly enforce the First Rule of Demonology: “Don’t call up what you can’t put down.” But as you rightfully point out, that’s not exactly a bright-line policy.
But I don’t think the answer is to write out a complicated set of regulations like the “assault weapons” ban, many of which are unrelated to the underlying policy goal. I think the actual way that we all survive is because the leadership of the US and China feels in their gut that building poorly-understood alien superintelligences is likely to kill both them and their children. Then they’ll make the message clear: “Here’s a long list of things you shouldn’t do unless you want to wind up on the wrong end of a joint military strike. And if you discover a new way to build an alien superintelligence that we didn’t put on this list? Yeah, that’s forbidden, too.”
The real goal is to win over enough senior political and military leadership in the superpowers to actually enforce a “no building incomprehensible alien superintelligences” rule. Probably we can’t actually win that argument yet, because people don’t really believe that we’ll ever be able to build them! But when and if people realize it’s possible to build alien superintelligences (and maybe after the first terrifying near-miss), then I hope we can provide people with a mental framework explaining why this is a terrible idea.
Will this work? Who knows. But better to try to win over the key decision makers, than to just accept that our entire species’ last words will be “Hold my beer” while someone powers up an alien superintelligence that we don’t understand at all.
I actually enjoyed reading this comment a lot, thank you! Particularly well argued!
I’d agree with a lot of this, but:
“I think the actual way that we all survive is because the leadership of the US and China feels in their gut that building poorly-understood alien superintelligences is likely to kill both them and their children. Then they’ll make the message clear: “Here’s a long list of things you shouldn’t do unless you want to wind up on the wrong end of a joint military strike. And if you discover a new way to build an alien superintelligence that we didn’t put on this list? Yeah, that’s forbidden, too.”
As much as I understand the underlying concern, I think this is actually how it fails.
In most conversations I have with activists, I always circle back to this point: policymakers do not have a lot of leeway when Big Tech’s interests are at play. I understand this varies from jurisdiction to jurisdiction. As an example, I know a few people who participated in the working group for the GPAI Code of Practice in Europe. I kept hearing how the successive drafts were constantly watered down following the feedback of the future signatories (the big labs). Personal frustrations of these experts included a feeling of helplessness: they either conceded, or risked delays or the Code not being signed at all. And, mind you, this is a voluntary framework! Can you imagine how it is in the case of absolute, binding regulation?
Yes, getting the support of government leaders is important. But the wording of the ban, what’s written down and signed, is what ends up on the desk of OpenAI’s (and every other lab’s) in-house counsel with instructions to find a way around it. Which is kind of the main point of the post: if AIS orgs don’t prioritise legal strategy when putting such proposals forward, they’ll only make it easier for people like me (working for the other side) to argue their way out of it. I don’t want that to happen.
Thank you for your thoughtful and excellent response!
In most conversations I have with activists, I always circle back to this point: policymakers do not have a lot of leeway when Big Tech’s interests are at play.
If this remains true if and when we approach superintelligence, then there’s an excellent chance we all die.
Yes, getting the support of government leaders is important. But the wording of the ban, what’s written down and signed, is what ends up on the desk of OpenAI’s (and every other lab’s) in-house counsel with instructions to find a way around it.
Again, I fear that this is a future in which we all die. (To be fair, I think there are a lot of futures in which we all die, if we figure out how to build an incomprehensible alien superintelligence.)
Since Yudkowsky and Soares had such fun with parables, let me try one.
FrontierBioCo. You’re a government regulator of the pharma industry, and FrontierBioCo has announced that they’re researching cures for infectious diseases. As part of this, they’re doing gain-of-function research. They’ve announced two targets:
A version of MERS-CoV that increases the existing 30% case fatality rate, while achieving a slightly higher R value than the SARS-CoV-2 Omicron strain in human-to-human transmission, making it one of the most contagious diseases known.
A version of Ebola which can be transmitted via airborne droplets, with extremely high R values.
And when you look closer, you see that they brag about their BSL 4 containment labs, but that their head of biosafety just quit, muttering, “BSL 4 my ass, it isn’t even BSL 2.” And it’s quite clear that FrontierBioCo’s management and legal team have every intention of exploiting regulatory loopholes. And they brag about “moving fast and breaking things”. Including laws and most likely sample vials.
Let’s say you don’t want to die from airborne Ebola or the new MERS-CoV variant codenamed “Armageddon”. What’s your regulatory strategy here?
My regulatory strategy would start by calling in biologists and military advisors. And I’d ask questions like, “What is the minimum level of response that we are certain guarantees containment of FrontierBioCo’s labs?” And when FrontierBioCo’s lawyers later insist that they were technically complying with the law, my response involves disbarment. And trying really hard to find a way to sentence everyone involved to prison. Because I believe that sometimes you need to actually listen to Ellen Ripley if you want to live.
The point of this heavy-handed parable is that as long as the frontier labs are intent on building superintelligence, and as long as we allow them to play cute legal games, then our odds may be very bad indeed.
Which brings me to my next questions...
What does a (mostly) successful technology ban look like? The closest real-world analogy here is nuclear non-proliferation. We haven’t actually stopped the technology from spreading entirely, but we’ve limited the spread to mid-tier nation states, and we’ve successfully avoided anyone using nuclear weapons. There seem to have been a couple of key points:
Mutually Assured Destruction (MAD) actually does seem to work in the medium term, even with regional powers. India and Pakistan haven’t nuked each other yet. So I guess this is a point for the terrifying Cold War planners? (“The whole point of a Doomsday machine is lost if you keep it a secret!”)
Key technology bottlenecks. I’ve worked in national-security-adjacent fields, and one of the scarier claims we heard from the actual national security types was that building basic nuclear weapons mostly isn’t that hard. In particular, many top university engineering departments could apparently do it if they really tried. But there’s one important catch: It’s surprisingly hard for a rogue organization to lay hands on fissile materials without getting noticed by the superpowers.
Note that the fissile material ban is frequently enforced either by crippling economic sanctions or by military strikes on enrichment facilities. This isn’t usually a question settled by who has the most clever lawyers, or by the regulatory fine print.
What do we need to do to prevent someone building an incomprehensible alien superintelligence? Honestly, the scary part is I don’t know. I can see two major routes by which we might get there:
Scaling laws. Perhaps building superintelligence will require scaling up to 10x or 1000x our current training runs. In this case, if you control the data centers, you can prevent the training runs. This is actually closely analogous to controlling nuclear weapons.
Algorithmic improvements. If we’re unlucky, then perhaps superintelligence works more like “reasoning models”. Reasoning models represented a sharp increase in mathematical and coding skills. And it took about 4 months from OpenAI’s announcement of o1-preview until random university researchers could add reasoning support to an existing model for under $5,000. If building a superintelligence is this easy, then we live in what Bostrom called a “vulnerable world”. This is the world where you can hypothetically build fusion bombs using pool cleaning chemicals. Again, this is a possible world in which I suspect we all die.
If we live in world (1), then my ideal advice is to immediately and permanently cap the size of training runs. (This may fail, for the reasons you point out. But we should try.) If we live in world (2), then I don’t have any advice beyond “Hug your kids.”
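To make the world (1) option slightly more concrete: the appeal of a training-run cap is that planned compute is measurable before the fact, unlike “is this system superintelligent?”. Here is a minimal, purely illustrative sketch, using the standard rule of thumb of roughly 6 × parameters × training tokens for dense models; the cap value and function names are placeholders I made up for illustration, not a proposed threshold.

```python
# Toy illustration only: checking a planned training run against a
# hypothetical compute cap. The 6 * params * tokens estimate is a common
# rule of thumb for dense transformers; the cap itself is a made-up number.

TRAINING_FLOP_CAP = 1e26  # placeholder threshold, purely for illustration


def approx_training_flops(n_parameters: float, n_tokens: float) -> float:
    """Rough estimate of total training compute for a dense model."""
    return 6.0 * n_parameters * n_tokens


def exceeds_cap(n_parameters: float, n_tokens: float) -> bool:
    """Would this planned run blow past the hypothetical cap?"""
    return approx_training_flops(n_parameters, n_tokens) > TRAINING_FLOP_CAP


# Example: a 1-trillion-parameter model on 20 trillion tokens is roughly
# 1.2e26 FLOPs, which would trip this placeholder cap.
print(exceeds_cap(1e12, 2e13))  # True
```

The point isn’t the specific numbers; it’s that a compute cap is something an inspector can check before the run happens, which is what makes world (1) regulable at all.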
Again, one of my favorite comments on this post! I also appreciate you organising your thoughts to the best of your ability while acknowledging the unknowns.
“The point of this heavy-handed parable is that as long as the frontier labs are intent on building superintelligence, and as long as we allow them to play cute legal games, then our odds may be very bad indeed.” I obviously agree, painfully so. And I really appreciate the tangible measures you propose (caps on training runs, controls on data centres).
Regarding the “Mutually Assured Destruction” point: about two days ago, Dan Hendrycks released an article called AI Deterrence is our Best Option. Some of the arguments you raised (in connection to your national security experience) reminded me of his Mutual Assured AI Malfunction deterrence framework. I’m curious to know your thoughts!
I am definitely not an international cooperation policy expert[1]. In fact, these discussions and the deep thinking that has gone into them are giving me a renewed respect for international policy experts. @Harold also raised very good points about the different behaviours towards “ambiguity” (or what I perceive as ambiguous) in different legal systems. In his case, speaking from his experience with the Japanese legal system.
In the future, I will focus on leveraging what I know of international private law (contract law in multinational deals) to re-think legal strategy using these insights.
This has been very helpful!
[1]: My legal background is in corporate law, and I have experience working at multinational tech companies and fintech startups (also some human rights law experience), which is why I focus on predictable strategies that tech companies could adopt to “Goodhart” their compliance with AI laws or a prohibition on developing AGI.
Just on the point of MAIM, I would point out that one of the authors of that paper (Alexandr Wang) has seemingly jumped ship from the side of “stop superintelligence being built” [1] to the side of “build superintelligence ASAP”, since he now heads up the somewhat unsubtly named “Meta Superintelligence Labs” as Chief AI officer.
[1]: I mean, as the head of Scale AI (a company that produces AI training data), I’m not sure he was ever on the side of “stop superintelligence from being built”, but he did coauthor the paper apparently.
Also, Dan Hendrycks works at xAI and makes capability benchmarks.
Oh :/. Thank you for bringing this to my attention!
To be clear, I have never been an actual national security person! But once upon a time, my coworkers occasionally needed to speak with non-proliferation people. (It’s a long story.)
Some of the arguments you raised (in connection to your national security experience) reminded me of his Mutual Assured AI Malfunction deterrence framework. I’m curious to know your thoughts!
I don’t know if that specific deterrence regime is workable or not, but it’s a good direction to think about. One difference from nuclear weapons is that if you’re the only country with nukes, then you can sort of “win” a nuclear war (like happened at the end of WW2). But being the only country with an incomprehensible alien superintelligence is more like being the only country with a highly infectious airborne Ebola strain. Actually using it is Unilaterally Assured Destruction, to coin a term.
But let me make one last attempt to turn this intuition into a policy proposal.
Key decision makers need to realize in their gut that “building an incomprehensible alien superintelligence that’s really good at acting” is one of those things like “engineering highly infectious airborne Ebola” or “allowing ISIS to build megaton fusion weapons.” Unfortunately, the frontier labs are very persuasive right now, but the game isn’t over yet.
If the primary threat involves scaling, then we would need to control data centers containing more than a certain number of GPUs. Large numbers of GPUs would need to be treated like large numbers of gas centrifuges, basically. Or maybe specific future types of GPUs will need tighter controls.
If the primary or secondary threat involves improved algorithms, then we may need to treat that like nuclear non-proliferation, too. I know a physics major who once spoke to the aforementioned national security people about nuclear threats, and he asked, “But what about if you did XYZ?” The experts suddenly got quiet, and then they said, “No comment. Also, we would really appreciate it if you never mentioned that again to anybody.” There are things at the edges of the nuclear control regime that aren’t enforceably secret, but that we still don’t want to see posted all over the Internet. Some of the enforcement around this is apparently handled by unofficially explaining to smart people that they should pretty please Shut The Fuck Up. Or maybe certain algorithms need to be classified the way we classify the most important nuclear secrets.
It’s likely that we will need an international deterrence regime. There are people with deep expertise in this.
One key point is that nobody here actually knows how to build a superintelligence. What we do have is a lot of people (including world class experts like Geoffrey Hinton) who strongly suspect that we’re close to figuring it out. And we have multiple frontier lab leaders who have stated that they plan to build “superintelligence” this decade. And we have a bunch of people who have noticed that (1) we don’t remotely understand how even current LLMs work, and (2) the AI labs’ plans for controlling a superintelligence are firmly in “underpants gnome” territory. And also even their optimistic employees regularly admit they’re playing Russian roulette with the human race. You don’t need deep knowledge of machine learning to suspect that this is the setup for a bad Hollywood movie about hubris.
But key details of how to build superintelligence are still unknown, happily. So it’s hard to make specific policy prescriptions. Szilard and Einstein could correctly see that fission was dangerous, but they couldn’t propose detailed rules for controlling centrifuges.
I do suspect that corporate legal compliance expertise will play a key role here! And thank you for working on it. We’ll need your expertise. But legal compliance can’t be the only tool in our toolkit. If we tried to enforce nuclear nonproliferation the way we try to enforce the GDPR, we’d probably already be dead. You will need buy-in and back-up of the sort that upholds nuclear non-proliferation. And that’s going to require a major attitude change.
(But I will also hit up my lawyer friends and see if they have more concrete advice.)