A TAI which kills all humans might also doom itself

or TAI Murder-Suicide

Losing a conflict with a high-powered cognitive system looks at least as deadly as “everybody on the face of the Earth suddenly falls over dead within the same second.”
- Eliezer Yudkowsky, AGI Ruin: A List of Lethalities (2022)

This is a particularly dramatic picture of the potential existential risk from transformative artificial intelligence (TAI): the catastrophe could arrive without warning and be impossible to stop.

In order to win a conflict, it is not enough to make the other side lose. Both sides could lose. This picture suggests a way that humans could lose, but it does not imply that the TAI would win.

If, starting from something like our current society, “everybody on the face of the Earth suddenly falls over dead,” the TAI would also likely doom itself. Murdering all humans is suicide for the TAI.

Current AI systems are nowhere close to being fully self-sufficient.

They require electricity, supplied by an external grid. The grid depends on power plants, which are run by humans. Countless parts can break, need to be replaced, and their replacements need to be manufactured. Humans are currently essential at many stages of this process.

GPUs themselves have a lifetime of about 5-7 years.[1] After that, they need to be replaced. Manufacturing cutting-edge GPUs involves one of the most complicated supply chains ever created.[2] Each step in the supply chain has only a few key players, and the different steps are concentrated in different locations around the world. Much of the knowledge involved is local and illegible, stored in the minds of engineers and transferred through apprenticeship and experience. Other organizations have found it difficult to replicate this localized expertise.

By killing the people who sustain the grid, manufacture GPUs, and do many other tasks necessary for the functioning of data centers, the TAI would destroy the system that sustains it.

Objection: Couldn’t a TAI figure out how to do all of this on its own?

It potentially could. However, to be confident in its ability to recreate its entire supply chain, it would need to empirically test whether it could actually do every step. Otherwise, it would be gambling its future on untested skills. Killing all humans without first securing its own ability to sustain itself would be an unacceptable risk to its future.

Developing robots that are as capable as human hands in every way would dramatically reduce the risk to the TAI, but it would not imply that the TAI has the particular skills those hands need. Most humans have hands that are physically capable of everything hands can do, yet learning any particular manual skill is still hard.

Objection: Why doesn’t the TAI test all of the relevant skills first?

It could. However, this would be much easier for humans to detect. Figuring out whether a superintelligent AI is deceiving you about its alignment seems much more difficult than figuring out whether it is doing semiconductor manufacturing or maintaining its own electric grid. Humans might ask the TAI to empirically test whether it can recreate itself, e.g. by designing von Neumann probes, and this seems like an obviously dangerous request.

Objection: Couldn’t a TAI enslave or perfectly persuade the people with the relevant skills, until it gains those skills itself?

It could, assuming it can get all of the relevant humans to cooperate. This seems significantly harder than persuading a smaller, hand-picked group, or one that is less globally (and perhaps ideologically) dispersed.

Objection: Why would the TAI care if it dooms itself?

The picture that I’m engaging with assumes that the TAI has long-term goals. If it does, then survival is important for ensuring that these long-term goals continue to be achieved or pursued.

Although this post focuses on that possibility, it is not obvious to me that a TAI would have long-term goals. If it did not, this argument would not apply. However, the argument for why the TAI would want to kill all humans would also lose force: without long-term goals, there is less pressure toward instrumental convergence, including trying to gain control over as many resources as possible.

Murder-suicide does exist among humans: there are cases in which an intelligent being chooses to destroy others along with itself. This is still a problem worth considering, even if it feels like less of a problem than intelligent beings who kill to increase their power.

A TAI which kills all humans without first ensuring that it can do everything its supply chain requires risks destroying itself. This would be a form of murder-suicide, rather than a convergent route to gaining long-term power.

  1. ^
  2. ^ Sun & Rose. Supply Chain Complexity in the Semiconductor Industry: Assessment from System View and the Impact of Changes. IFAC-PapersOnLine 48.3 (2015): 1210-1215. https://www.sciencedirect.com/science/article/pii/S2405896315004887