“Shut It Down” is simpler than “Controlled Takeoff”

Two somewhat different plans for buying time and improving AI outcomes are “Global Shutdown” and “Global Controlled Takeoff.”
(Some other plans some people believe in include “ad hoc semi-controlled, semi-slowed takeoff,” “race, then burn the lead on either superalignment or scary demos,” and “decentralized differential defensive tech world.” I mostly don’t expect those to work, but am mostly not talking about them in this post.)
“Global Shutdown” and “Global Controlled Takeoff” both include an early step of “consolidate all GPUs and similar chips into locations that can be easily monitored.”
The Shut Down plan then says things like “you cannot do any frontier development with the consolidated GPUs” (maybe you can use the GPUs to run existing models that seem pretty safe, depending on implementation details). Also, maybe, any research into new algorithms needs to be approved by an international org, and frontier algorithm development is illegal. (This is maybe hard to enforce, but it might dramatically reduce the amount of R&D that goes into it, since you can’t be a billion-dollar company straightforwardly pouring tons of resources into it without going to jail.)
Controlled Takeoff instead says (as I currently understand its advocates) something like “Frontier research continues, slowly and carefully, leveraging frontier controlled AI to do a ton of alignment research.”
I’m generally pro “Shut It Down”, but I also think Global Controlled Takeoff is much better than the status quo (both because it seems better in isolation, and because achieving it makes Shut Down easier), and I see some of the appeal depending on your exact beliefs.
But, some notes on strategy here.
“What’s more impossible?”
A lot of AI safety arguments boil down to “what seems least impossible?”. Is it more impossible to get a Global Shutdown, or to solve safe superintelligence with anything remotely like our current understanding or the understanding we’re likely to get over the next 5-10 years?
I’ve heard a number of people say flatly “you’re not going to get a global shut down”, with a tone of finality that sounds like they think this is basically impossible.
I’m not entirely sure I’ve correctly tracked which people are saying which things, and whether I’m accidentally conflating statements from different people. But I think I’ve heard at least some people say “you’re not getting a shut down” with that tone, who nonetheless advocate for controlled takeoff.
I certainly agree getting a global shutdown is very hard. But it’s not obvious to me that getting a global controlled takeoff is much easier.
Several gears I want to make sure people are tracking:
Gear 1: “Consolidate and monitor the GPUs” is a huge political lift, regardless.
By the time you’ve gotten various world powers and corporations to do this extremely major, expensive action, I think something has significantly changed about the political landscape. I don’t see how you’d get it without world leaders taking AI more “fundamentally seriously”, in a way that would make other expensive plans a lot more tractable.
Gear 2: “You need to compare the tractability of Global Shut Down vs Global Controlled Takeoff That Actually Works, as opposed to Something That Looks Close To But Not Actually A Controlled Takeoff.”
Along with Gear 3: “Shut it down” is much simpler than “Controlled Takeoff.”
A Global Controlled Takeoff That Works has a lot of moving parts.
You need the international agreement to be capable of making any kind of sensible distinctions between safe and unsafe training runs, or even “marginally safer” vs “marginally less safe” training runs.
You need the international agreement to not turn into molochian regulatory-captured horror that perversely reverses the intent of the agreement and creates a class of bureaucrats who don’t know anything about AI and use the agreement to dole out favors.
These problems still exist in some versions of Shut It Down too, to be clear (if you’re trying to also ban algorithmic research – a lot of versions of that seem like they leave room to argue about whether agent foundations or interpretability count). But, they at least get coupled with “no large training runs, period.”
I think “guys, everyone just stop” is a way easier Schelling point to coordinate around than “everyone, we’re going to slow down and try to figure out alignment as best we can using current techniques.”
So, I am not currently convinced that Global Controlled Takeoff That Actually Works is any more politically tractable than Global Shut Down.
(Caveat: Insofar as your plan is “well, we will totally get a molochian moral-maze horror, but it’ll generally move slower and that buys time,” eh, okay, seems reasonable. But at least be clear with yourself about what you’re aiming for.)
Gear 4: Removing pressure to accelerate is valuable for the epistemics of the people doing the AI-assisted alignment (if you’re trying that).
One reason I think the Anthropic plan is actively bad, instead of “at-least-okay-ish,” is that (given how actively they seem to oppose any kind of serious regulation that would slow them down) they seem intent on remaining in a world where, while they are supposedly working on aligning the next generation of AI, they face constant economic pressure to ship the next thing soon.
I believe, maybe, you can leverage AI to help you align AI.
I am pretty confident that at least some of the tools you need to navigate aligning unbounded superintelligence (or confidently avoiding creating unbounded superintelligence) involve “precise conceptual reasoning” of a kind Anthropic et al seem actively allergic to. (See also behaviorism vs cognitivism. Anthropic culture seems to actively pride itself on empirics, and to be actively suspicious of attempts to reason ahead without empirics.)
I’m not confident that you need that much precise conceptual reasoning / reasoning ahead. (MIRI has an inside view that says this is… not impossibly hard, but hard in a fairly deep way that nobody is showing respect for. I don’t have a clear inside view about “how hard is it,” but I do have an inside view that it’s harder than Anthropic’s revealed actions suggest they think it is.)
I think thinking this through, and figuring out whether you need conceptual tools that you aren’t currently good at in order to succeed, is very hard, and people are extremely biased about it.

I think the difficulty is exacerbated further if your competitor is shipping the next generation of product, and you know in your heart that you’re reaching ASL danger levels that should at least give you some pause, but the evidence isn’t clear, and it would be extremely convenient for you and your org if your current level of control/alignment were sufficient to run the next training run.
So a lot of what I care most about with Shutdown/Controlled-Takeoff is making it no longer true that there is an economic incentive to rush ahead. (I think either a Shutdown-y or a Controlled-Takeoff-y regime can potentially work for this, if there’s actually a trusted third party who makes the calls about whether the next training run is allowed, and who has the guns and compute.)
Gear 5: Political tractability will change as demos get scarier.
I’m not super thrilled with the “race to the edge, then burn the lead on scary demos” plan (specifically the “racing” part). But, I do think we will get much scarier demos as we approach AGI.
Politicians maybe don’t understand abstract arguments (although I think responses to If Anyone Builds It suggest they at least sometimes do). But I think there are various flavors of Sufficiently Scary Demos that will make the threat much more salient without needing to route through abstract arguments.
I think one of the most important things to be preparing for is leveraging Sufficiently Scary Demos when they arrive. I think this includes beginning to argue seriously now for global-treaty-shaped things, and having them on hand, so people go “oh, okay, I guess we do need that thing Those Guys Were Talking About after all” instead of just being bewildered.
Gears rather than Bottom Lines
I won’t claim that any of the above should be decisive in anyone’s decisionmaking. I’m still processing the update that I can’t really simulate the entire political disagreement yet, and I’m not sure what other gears I’m missing from other people’s perspectives.
But, these are all individual gears that seem pretty important to me, which I think should be part of other people’s overall strategizing.
I have a similar point comparing the feasibility of “Global Shut Down” vs “Decentralized Differentially Defensive Tech world that Actually Works”, but, that’s a fairly complex and different argument.
(To be clear, the bias I described under Gear 4 also applies in the MIRI-esque direction. But MIRI is not the one rushing ahead inventing AGI.)