AI Hiroshima (Does A Vivid Example Of Destruction Forestall Apocalypse?)

After Eliezer’s post, I got to thinking about how to think about AI destroying the world, and how such a fate might be averted. I claim no special expertise, and an epistemic status of great uncertainty.


Humanity has faced (and so far, survived) one potential apocalypse already: that of nuclear war. I refer specifically to the Cold War, during which the United States and the Soviet Union came very close to launching nuclear weapons against each other several times.

It occurred to me that, by the beginning of the Cold War, the world had a concrete example of the kind of destruction nuclear weapons were capable of: the annihilation of the Japanese cities of Hiroshima and Nagasaki at the end of World War II.

In other words, while there were likely some unknowns, there was no confusion or ambiguity about what nuclear weapons were capable of, or that they represented an incredibly dangerous technology. While the technology could be and was harnessed for a variety of purposes, no one was under any illusions about the potential downsides.

If they needed a reminder, they had only to look at the Human Shadow of Death.


Nuclear war was (and continues to be) a legitimate threat to human civilization.[1] And yet it has not happened, and we appear to be past the period of greatest risk.

So what lessons might we learn from this, and how might they apply to AI?

The question that motivated this post was simple:

Did the destruction of Hiroshima and Nagasaki make a nuclear war between the United States and the Soviet Union more or less likely?

In other words, if World War II had come to an end some other way, and no nuclear bomb had ever been detonated in anger—would the Cold War have been more likely to turn hot?

I believe so, without having researched the topic extensively. In a world with no vivid demonstration of the horror of nuclear weaponry, how could it not have been easier to imagine using such weapons?

How could the examples of Hiroshima and Nagasaki have made anyone more willing to inflict the same devastation upon other cities, and to face that devastation deployed against their own population centers?

I believe, based solely on priors, that the vivid examples of the danger of nuclear warfare helped avert a nuclear apocalypse.


So what about AI?

One of the problems acknowledged by those who seek to align AGI is that the dangers of unaligned intelligences are not salient to politicians, decision-makers, or the populace at large. I see concerns about technological unemployment created by AI, but not about the devastation that unaligned intelligences are capable of.

This leads to a question, motivated in part by Eliezer’s comment about how lethally difficult alignment is:

if you can get a powerful AGI that carries out some pivotal superhuman engineering task, with a less than fifty percent chance of killing more than one billion people, I’ll take it

I’ll explicitly state that I am in no way proposing a strategy or policy we should follow.

But I do want to ask: would an (otherwise survivable) AI disaster increase the probability that we survive the next century and the rise of AGI?

In other words, in possible future worlds where humanity survives the creation of AGI, would those humans be likely to look back and find a salient event in their history (as Hiroshima and Nagasaki were for nuclear war) that created a general understanding of the technology's dangers, and so made an apocalypse less likely?

And if so, what is the least bad form of that disaster? A powerful AI deployed for military purposes?

An AI for options trading that crashes the global financial system?

A strawberry picker that rips people’s noses off?

I don’t have any answers. But thinking about this, I have picked out a single (dim) silver lining:

If the coming decades include a disaster involving AI—especially one that makes the dangers of the technology salient to worldwide decision- and policy-makers—then I will update, however small an amount, in the direction that we are in one of the possible worlds where humanity survives.

  1. ^

    Citation needed.