What would we do if alignment were futile?

This piece, which predates ChatGPT, is no longer endorsed by its author.

Eliezer’s recent discussion on AGI alignment is not optimistic.

I consider the present gameboard to look incredibly grim… We can hope there’s a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle

For this post, instead of debating Eliezer’s model, I want to pretend it’s true. Let’s imagine we’ve all seen satisfactory evidence for the following:

  1. AGI is likely to be developed soon*

  2. Alignment is a Hard Problem. Current research is nowhere close to solving it, and this is unlikely to change by the time AGI is developed

  3. Therefore, when AGI is first developed, it will only be possible to build misaligned AGI. We are heading for catastrophe

How we might respond

I don’t think this is an unsolvable problem. In this scenario, there are two ways to avoid catastrophe: massively increase the pace of alignment research, and delay the deployment of AGI.

Massively increase the pace of alignment research via 20x more money

I wouldn’t rely solely on this option. Lots of brilliant and well-funded people are already trying really hard! But I bet we can make up some time here. Let me pull some numbers out of my arse:

  • $100M is spent per year on alignment research worldwide (this is a guess; I don’t know the actual number)

  • Our rate of research progress is proportional to the square root of our spending. That is, to double progress, you need to spend 4x as much**

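Suppose we spent $2B a year. That’s 20x the current budget, so under the square-root assumption progress would go about 4.5x faster (√20 ≈ 4.47). This would let us accomplish in 5 years what would otherwise have taken about 22 years.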
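For anyone who wants to play with these made-up numbers, here is a minimal sketch of the arithmetic in Python. The function name `progress_multiplier` and the variable names are mine, and the square-root rule is the same wild guess as above, not an established result.

```python
import math

def progress_multiplier(spending_multiple: float) -> float:
    """Assumed rule from the text: progress scales with the square root of spending."""
    return math.sqrt(spending_multiple)

# Both figures are guesses taken from the text, not measured data.
baseline_spend = 100e6   # $100M per year today
proposed_spend = 2e9     # $2B per year in this scenario

spending_multiple = proposed_spend / baseline_spend   # 20x more money
speedup = progress_multiplier(spending_multiple)      # about 4.47x faster

years_at_new_rate = 5
equivalent_baseline_years = years_at_new_rate * speedup  # about 22.4 years

print(f"{spending_multiple:.0f}x spending -> {speedup:.2f}x progress")
print(f"{years_at_new_rate} years at the new rate is worth about "
      f"{equivalent_baseline_years:.1f} years at the old one")
```

Swapping in a different exponent (say, progress proportional to spending to the power 0.3) is a one-line change, which makes it easy to see how sensitive the 22-year figure is to that guess.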

$2B a year isn’t realistic today, but it’s realistic in this scenario, where we’ve seen persuasive evidence that Eliezer’s model is true. If AI safety is the critical path for humanity’s survival, I bet a skilled fundraiser can make it happen.

Of course, skillfully administering the funds is its own issue...

Slow down AGI development

The problem, as I understand it:

  • Lots of groups, like DeepMind, OpenAI, Huawei, and the People’s Liberation Army, are trying to build powerful AI systems

  • No one is very far ahead. For a number of reasons, it’s likely to stay that way

    • We all have access to roughly the same computing power, within an OOM

    • We’re all seeing the same events unfold in the real world, leading us to similar insights

    • Knowledge tends to proliferate among researchers. This is in part a natural tendency of academic work, and in part a deliberate effort by OpenAI

  • When one group achieves the capability to deploy AGI, the others will not be far behind

  • When one group achieves the capability to deploy AGI, they will have powerful incentives to deploy it. AGI is really cool, will make a lot of money, and the first to deploy it successfully might be able to impose their values on the entire world

  • Even if they don’t deploy it, the next group still might. If even one chooses to deploy, a permanent catastrophe strikes

What can we do about this?

1. Persuade OpenAI

First, let’s try the low-hanging fruit. OpenAI seems to be full of smart people who want to do the right thing. If Eliezer’s position is true, then I bet some high-status rationalist-adjacent figures could be persuaded. In turn, I bet these folks could get a fair listen from Sam Altman/Elon Musk/Ilya Sutskever.

Maybe they’ll change their minds. Or maybe Eliezer will change his own mind.

2. Persuade US Government to impose stronger Export Controls

Second, US export controls can buy time by slowing down the whole field. They’d also make it harder to share your research, so the leading team accumulates a bigger lead. They’re easy to impose: it’s a regulatory move, so an act of Congress isn’t required. There are already export controls on narrow areas of AI, like automated imagery analysis. We could impose export controls on areas likely to contribute to AGI and encourage other countries to follow suit.

3. Persuade leading researchers not to deploy misaligned AI

Third, if the groups deploying AGI genuinely believed it would destroy the world, they wouldn’t deploy it. I bet a lot of them are persuadable in the next 2 to 50 years.

4. Use public opinion to slow down AGI research

Fourth, public opinion is a dangerous instrument. Giving AGI the same political prominence (and epistemic habits) as climate change research would make a lot of folks miserable. But I bet it could delay AGI by quite a lot.

5. US commits to using the full range of diplomatic, economic, and military action against those who violate AGI research norms

Fifth, the US has a massive array of policy options for nuclear nonproliferation. These range from sanctions (like the ones crippling Iran’s economy) to war. Right now, these aren’t an option for AGI, because the foreign policy community doesn’t understand the threat of misaligned AGI. If we communicate clearly and in their language, we could help them understand.

What now?

I don’t know whether the grim model in Eliezer’s interview is true or not. I think it’s really important to find out.

If it’s false (alignment efforts are likely to work), then we need to know that. Crying wolf does a lot of harm, and most of the interventions I can think of are costly and/or destructive.

But if it’s true (current alignment efforts are doomed), we need to know that in a legible way. That is, it needs to be as easy as possible for smart people outside the community to verify the reasoning.

*Eliezer says his timeline is “short,” but I can’t find specific figures. Nate Soares puts a very substantial chance on AGI arriving within 2 to 20 years and is 85% confident we’ll see AGI by 2070.

**Wild guess, loosely based on Price’s Law. I think this works as long as we’re nowhere close to exhausting the pool of smart/motivated/creative people who can contribute.