OpenAI, DeepMind, Anthropic, etc. should shut down.

(I expect that the point of this post is already obvious to many of the people reading it. Nevertheless, I believe that it is good to mention important things even if they seem obvious.)

OpenAI, DeepMind, Anthropic, and other AI organizations focused on capabilities should shut down. This is what would maximize the utility of pretty much everyone, including the people working inside of those organizations.

Let’s call Powerful AI (“PAI”) an AI system capable of either:

  • Steering the world towards what it wants hard enough that it can’t be stopped.

  • Killing everyone “un-agentically”, eg by being plugged into a protein printer and generating a supervirus.

and by “aligned” (or “alignment”) I mean the property of a system that, when it has the ability to {steer the world towards what it wants hard enough that it can’t be stopped}, what it wants is nice things and not goals that entail killing literally everyone (which is the default).

We do not know how to make a PAI which does not kill literally everyone. OpenAI, DeepMind, Anthropic, and others are building towards PAI. Therefore, they should shut down, or at least shut down all of their capabilities progress and focus entirely on alignment.

“But China!” does not matter. We do not know how to build PAI that does not kill literally everyone. Neither does China. If China tries to build AI that kills literally everyone, it does not help if we decide to kill literally everyone first.

“But maybe the alignment plan of OpenAI/whatever will work out!” is wrong. It won’t. It might work if they were careful enough and had enough time, but they’re going too fast and they’ll simply cause literally everyone to be killed by PAI before they would get to the point where they can solve alignment. Their strategy does not look like that of an organization trying to solve alignment. It’s not just that they’re progressing on capabilities too fast compared to alignment; it’s that they’re pursuing the kind of strategy which fundamentally gets to the point where PAI kills everyone before it gets to saving the world.

Yudkowsky’s Six Dimensions of Operational Adequacy in AGI Projects describes an AGI project with adequate alignment mindset as one where:

The project has realized that building an AGI is mostly about aligning it. Someone with full security mindset and deep understanding of AGI cognition as cognition has proven themselves able to originate new deep alignment measures, and is acting as technical lead with effectively unlimited political capital within the organization to make sure the job actually gets done. Everyone expects alignment to be terrifically hard and terribly dangerous and full of invisible bullets whose shadow you have to see before the bullet comes close enough to hit you. They understand that alignment severely constrains architecture and that capability often trades off against transparency. The organization is targeting the minimal AGI doing the least dangerous cognitive work that is required to prevent the next AGI project from destroying the world. The alignment assumptions have been reduced into non-goal-valent statements, have been clearly written down, and are being monitored for their actual truth.

(emphasis mine)

Needless to say, this is not remotely what any of the major AI capabilities organizations look like.

At least Anthropic didn’t particularly try to be a big commercial company making the public excited about AI. Making the AI race a big public thing was a huge mistake on OpenAI’s part, and is evidence that they don’t really have any idea what they’re doing.

It does not matter that those organizations have “AI safety” teams, if their AI safety teams do not have the power to take the one action that has been the obviously correct one this whole time: shut down progress on capabilities. If their safety teams have not done this so far, when it is the one thing that needs to be done, there is no reason to think they’ll have the chance to take whatever would be the second-best or third-best actions either.

This isn’t just about the large AI capabilities organizations. I expect that there are plenty of smaller organizations out there headed towards building unaligned PAI. Those should shut down too. If these organizations exist, it must be because the people working there think they have a real chance of making some progress towards more powerful AI. If they are right, then that’s real damage to the probability that anyone at all survives, and they should shut down as well in order to stop doing that damage. It does not matter if you think you have only a small negative impact on the probability that anyone survives at all: the actions that maximize your utility are the ones that decrease the probability that PAI kills literally everyone, even if it’s just by a small amount.

Organizations which do not directly work towards PAI but provide services that are instrumental to it (such as EleutherAI, HuggingFace, etc.) should also shut down. It does not matter if your work only contributes “somewhat” to PAI killing literally everyone. If the net impact of your work is a higher probability that PAI kills literally everyone, you should “halt, melt, and catch fire”.

If you work at any of those organizations, your two best options to maximize your utility are to find some way to make that organization slower at getting to PAI (eg by advocating for more safety checks that slow down progress, and by yourself being totally unproductive at technical work), or to quit. Stop making excuses and start taking the correct actions. We’re all in this together. Being part of the organization that kills everyone will not do much for you — all you get is a bit more wealth-now, which is useless if you’re dead and useless if alignment is solved and we get utopia.

Crossposted to EA Forum (22 points, 22 comments)