What an actually pessimistic containment strategy looks like

Israel as a nation state has an ongoing national security issue involving Iran.

For the last twenty years or so, Iran has been covertly developing nuclear weapons. Iran is a country with a very low opinion of Israel and is generally diplomatically opposed to its existence. Their supreme leader has a habit of saying things like “Israel is a cancerous tumor of a state” that should be “removed from the region”. Because of these and other reasons, Israel has assessed, however accurately, that if Iran successfully develops nuclear weapons, it stands a not-insignificant chance of using them against Israel.

Israel’s response to this problem has been multi-pronged. Making defense systems that could potentially defeat Iranian nuclear weapons is an important component of their strategy. The country has developed a sophisticated array of missile interception systems like the Iron Dome. Some people even suggest that these systems would be effective against much of the incoming rain of hellfire from an Iranian nuclear state.

But Israel’s current evaluation of the “nuclear defense problem” is pretty pessimistic, and defense isn’t all it has done. Given the size of Israel as a landmass, it would be safe to say that defense is probably not the most important component of Israel’s strategy. It has also tried to delay, or pressure Iran into delaying, its nuclear efforts through other means. For example, it gets its allies to sanction Iran, sabotages Iranian facilities, and tries to convince Iranian nuclear researchers to defect.

In my model, an argument like “well, what’s the point of all this effort, Iran is going to develop nuclear weapons eventually anyways” would not be very satisfying to Israeli military strategists. Firstly, that the Iranians will “eventually” get nuclear weapons is not guaranteed. Secondly, conditional on them doing it, it’s not guaranteed it’ll happen within the expected lifetime of the people currently living in Israel, which is a personal win for the people in charge.

Thirdly, even if it’s going to happen tomorrow, every day that Iran does not possess nuclear weapons under this paradigm is a gift. Delaying a hypothetical nuclear holocaust means increasing the life expectancy of every living Israeli.

An argument like “well, what if you actually radicalize the Iranians into hardening their stance on developing nuclear weapons through all of this discouragement” might be pragmatic. But disincentivizing, dissuading, and sabotaging people’s progress toward things generally does what it says on the tin, and Iran is already doing nuclear weapons development. Any “intervention” you can come up with towards Iranian nuclear researchers is probably liable to make things better and not worse. Speaking more generally, there is still an instrumental motivation to get Iran to stop their nuclear weapons program, even if a diplomatic strategy would serve their needs better. Israel’s sub-goal of mulliganing their timeline away from a nuclear Iran is probably reasonable.

There are many people on this website that believe the development of AGI, by anyone in the world, would be much worse in expectation than Iran developing nuclear weapons, even from the perspective of a fiercely anti-Iranian nationalist. There are also some people on this website who additionally believe there is little to no hope for existing AI safety efforts to result in success. Since so far it doesn’t seem like there are any good reasons to believe that it’s harder and more genius-intense to develop nuclear weapons than it is to develop AGI, one might naively assume that these people would be open to a strategy like “get existing top AGI researchers to stop”. After all, that method has had some degree of success with regard to nuclear nonproliferation, and every hour that the catastrophic AGI extinction event doesn’t happen is an hour that billions of people get to continue to live. One would think that this opens up the possibility, and even suggests the strategy, of finding a way to reach and convince the people actually doing the burning of the AGI development commons.

So imagine my surprise when I informally learned that this sort of thinking is quasi-taboo. That people who wholesale devote their entire lives to the cause of preventing an AI catastrophe do not spend much of their time developing outreach programs or supporting nonviolent resistance directed toward DeepMind researchers. That essentially, they’d rather, from their perspective, literally lie down and die without having mounted this sort of direct action.

I find this perspective limiting and self-destructive. The broader goal of alignment, the underlying core goal, is to prevent or delay a global AGI holocaust, not to come up with a complete mathematical model of agents. Neglecting strategies that affect AGI timelines is limiting yourself to the minigame. The researchers at DeepMind ought to be dissuaded or discouraged from continuing to kill everybody, in addition to and in conjunction with efforts to align AI. And the more pessimistic you are about aligning AI, the more opposed you should be to AGI development, and the more you should be spending your time figuring out ways to slow it down.

It seems weird and a little bit of a Chesterton’s fence to me that I’m the first person I know of to broach the subject on LessWrong with a post. I think an important reason is that people consider these sorts of strategies infeasible or too risky, a view I strongly disagree with. To guard against this, I would now like to give an example of such an intervention that I did myself. This way I can provide a specific scenario for people in the comments section to critique instead of whatever strawman people might associate with “direct action”.

EleutherAI is a nonprofit AI capabilities research collective. Their main goal up until now has been to release large language models like the kind that OpenAI has but keeps proprietary. As a side project they occasionally publish capability research on these large language models. They are essentially a “more open” OpenAI, and while they’re smaller and less capable I think most people here would agree that their strategy and behavior before 2022, as opposed to stated goals, were probably more damaging than even OpenAI from an AI alignment perspective.

Interestingly, most of the people involved in this project were not unaware of the concerns surrounding AGI research; in fact they agreed with them! When I entered their Discord, I found it counterintuitive that a large portion of their conversations seemed dedicated to rationalist memes, given the modus operandi of the organization. They had simply learned not to think of themselves as doing bad things, for reasons many reading probably understand.

Some people here are nodding their heads grimly; I had not yet discovered this harrowing fact about a lot of ML researchers who are told about the alignment problem. So one day I went into the #ai-alignment (!) channel inside the Discord server where their members coordinate and said something like:

lc: I don’t think anybody here actually believes AGI is going to end the world. I find it weird that you guys seem to be fully on the LessWrong/rationalist “AGI bad” train and yet you cofounded an AI capabilities collective. Doesn’t that seem really bad? Aren’t you guys speeding up the death of everybody on the planet?

They gave me a standard post they use as a response. I told them I’d already read the post and that it didn’t make any sense. I explained the whole game surrounding timelines and keeping the universe alive a little bit longer than it otherwise would be. I then had a very polite argument with Leo Gao and a couple of other people from the team for an hour or so. By the end some members of the team had made some pretty sincere-seeming admissions that the Rotary Embeddings blog post I linked earlier was bad, and some team members personally admitted to having a maybe-unhealthy interest in publishing cool stuff, no matter how dangerous.

I have no idea if the conversation actually helped long term, but my sense is that it did. Shortly thereafter they took a bunch of actions they alluded to in the blog post, like attempting to use these large language models for actual alignment research instead of just saying that what they were doing was OK because somebody else might after they open sourced them. I also sometimes worry whether or not the research they were doing ever resulted in faster development of AGI in the first place, but an institution could have people to assess things like that. An institution could do A/B testing on interventions like these. It can talk to people more than once. With enough resources it can even help people (who may legitimately not know what else they can work on) find alternative career paths.

With these kinds of efforts, you aren’t telling people who might be working in some benign branch of ML that there’s this huge problem with AGI, some of whom could then defect and go into that branch because it sounds cool. You’re talking to people who, from your perspective, are already doing the worst thing in the world. There’s no failure mode where some psychopaths get intrigued by the “power” of turning the world into paperclips. They’re already working at DeepMind or OpenAI. Personally, I think that failure mode is overblown, but this is one way you get around it.

I don’t have the gumption to create an institution like this from scratch. But if any potential alignment researchers or people-who-would-want-to-be-alignment-researchers-but-aren’t-smart-enough are reading this, I’m begging you to please create one so I can give my marginal time to that. Using your talents to try to develop more math sounds to a lot of people like it might be a waste of effort. I know I’m asking a lot of you, but as far as I can tell, figuring out how to do this well seems like the best thing you can do.

Not all political activism has to be waving flags around and chanting chants. Sometimes activists actually have goals and then accomplish something. I think we should try to learn from those people, as low as your opinion might be of them, if we don’t seem to have many other options.