Pausing AI is Positive Expected Value

The PauseAI (⏸️) movement often gets this pushback:

“You’re not factoring in all the benefits of good AI!”

“Stopping AI progress is also a doom scenario!”

To which I reply: If you agree P(doom) from building superintelligent AI before knowing how to align or control it is 5%+, try doing the basic expected-value calculation; you’ll see why your objection is misguided.

First, we need to estimate a few key probabilities and values. These can vary by many orders of magnitude. I’ll pick values that AI optimists hopefully agree are fair:

Probability that AI goes right if capabilities scale to superintelligence by 2034

This is an immediate “fast takeoff” scenario where state-of-the-art AI remains near-inscrutable, yet within a decade it becomes vastly more intelligent than humans on every dimension. I’d personally give this scenario much less than a 50% chance of going right for humanity, but I’m trying to be generous to AI optimists, so let’s call it 50%.

Probability that AI goes right if we delay superintelligence to 2100

An important premise of PauseAI is that if we can give ourselves a few extra years or decades to thoroughly research the fundamental principles of how to align AI — how to robustly specify preferences, how to capture the delicate structure of human values as self-consistent preferences, etc — then we can significantly increase the probability that superintelligent AI goes well.

If you agree that more time for safety research helps safety catch up to capabilities, you can take whatever probability you gave to superintelligent AI going right in 2034 and add 20 percentage points (or more) to get the probability that it goes right in 2100: here, 50% becomes 70%.

Value of baseline future, where AI never gets beyond human intelligence

Let’s define this as our baseline scenario, because it’s how normies who’ve never even heard of superintelligent AI currently imagine the future. We’ll define the value of other scenarios in relation to the value of this scenario.

If we never let ourselves get superintelligent AI (or it turns out to be too hard to build), there’ll probably still be at least a trillion future human lives worth living.

Value of future where AI goes wrong

If superintelligent AI goes wrong, it could very plausibly wipe out the entire future potential value of Earth-originating life. Compared to the baseline no-ASI scenario, we lose out on at least a trillion future human lives, which I’ll estimate are worth at least $1 million each: a loss of at least $10^18 relative to the baseline.

Value if superintelligent AI by 2034 goes right

I’ve estimated this at roughly $10^26: the combined annual GDP of a trillion current Earths (today’s gross world product is on the order of $100 trillion per year). High enough for you, AI optimists?

This number could plausibly even be MUCH higher, but it doesn’t matter; it won’t change the decision-relevant calculation.

Value if superintelligent AI by 2100 goes right

I subtracted a penalty from the 2034 estimate because in this scenario, the extra 66 years it takes us to reach a “good singularity” could forgo on the order of a trillion trillion dollars ($10^24) of additional value, once we factor in how the delay causes billions of people on the margin to die of cancer and old age, and endure countless other types of preventable suffering.

But $10^24 is a tiny fraction of $10^26, just 1% to be exact. So even after subtracting that 66-year delay penalty from {value if superintelligent AI by 2034 goes right}, we still get a similar total value estimate of about $10^26.

Naturally, when we’re evaluating a decision with the whole future value of the universe at stake, its impact on a particular 66-year time interval barely tilts the scale.


Now we plug the above numbers into the well-known formula for expected value:

Expected Value of Superintelligent AI in 2034

= 50% × $10^26 + 50% × (-$10^18) ≈ $5 × 10^25

Expected Value of Superintelligent AI in 2100

= 70% × ($10^26 - $10^24) + 30% × (-$10^18) ≈ $6.9 × 10^25
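
To make the arithmetic easy to check, here’s a minimal Python sketch of the same calculation, using the illustrative numbers above (the variable names and structure are mine, not notation from this post):

```python
# Minimal sketch of the expected-value comparison, using the illustrative
# numbers from this post. All values are in dollars, measured relative to
# the no-ASI baseline future.

P_GOOD_2034 = 0.50   # chance ASI goes right if built by 2034
P_GOOD_2100 = 0.70   # chance ASI goes right if delayed to 2100
V_GOOD      = 1e26   # value if ASI goes right (~GDP of a trillion Earths)
V_BAD       = -1e18  # value if ASI goes wrong (~a trillion lives at ~$1M each)
DELAY_COST  = 1e24   # value forgone by waiting an extra 66 years

def expected_value(p_good: float, v_good: float, v_bad: float) -> float:
    """Expected value of building ASI, given the chance it goes right."""
    return p_good * v_good + (1 - p_good) * v_bad

ev_2034 = expected_value(P_GOOD_2034, V_GOOD, V_BAD)
ev_2100 = expected_value(P_GOOD_2100, V_GOOD - DELAY_COST, V_BAD)

print(f"EV(ASI in 2034): ${ev_2034:.2e}")            # -> about $5.0e+25
print(f"EV(ASI in 2100): ${ev_2100:.2e}")            # -> about $6.9e+25
print(f"Pausing wins by: ${ev_2100 - ev_2034:.2e}")  # -> about $1.9e+25
```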


In this calculation, the extra probability of a good outcome that we get by taking more time with our ASI efforts — e.g. 70% chance of a good outcome by pausing until 2100, instead of only 50% chance by rushing it in 2034 — flows straight to the final expected value.

That’s because the stakes of prolonging current-level suffering by 66 years are much smaller than the stakes of accidentally throwing the entire future in a dumpster, foreclosing the long-term positive outcome of good AI entirely.

Note: The number I used for a bad AI future (relative to the no-AI baseline future), -$10^18, got drowned out in the calculation by the potential value of a future where AI goes right. If you’re worried about an S-risk scenario (the risk of creating unprecedented astronomical suffering as a result of ASI), then your “value of future where AI goes wrong” becomes far more negative, which tips the scale even more toward pausing or stopping AI development.

The original objections—

“You’re not factoring in all the benefits of good AI!”
“Stopping AI progress is also a doom scenario!”

—don’t map to any choice of numbers you could reasonably put into a basic expected value calculation that would conclude we shouldn’t pause AI capabilities progress right now (or soon).

Feel free to try this calculation with your own numbers instead of mine. The orders of magnitude involved are ridiculously uncertain and wide-ranging. And yet, I don’t think any reasonable choice of numbers will change the conclusion that pausing AI is the right decision.
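
If you want to stress-test that claim, here’s a small self-contained sensitivity sketch of my own (the parameter ranges below are my assumptions, not numbers from this post). It sweeps the inputs across several orders of magnitude and counts how often pausing still comes out ahead:

```python
# Sensitivity check (my own sketch): sweep the key inputs over wide ranges
# and count how often pausing (ASI in 2100) beats rushing (ASI in 2034).
# All values are in dollars, relative to the no-ASI baseline future.
from itertools import product

V_GOOD = 1e26  # value if ASI goes right (the post's estimate)

def expected_value(p_good: float, v_good: float, v_bad: float) -> float:
    return p_good * v_good + (1 - p_good) * v_bad

p_2034_range     = [0.05, 0.25, 0.50, 0.75, 0.95]  # chance rushing goes right
p_gain_range     = [0.05, 0.10, 0.20]               # extra probability bought by pausing
delay_cost_range = [1e22, 1e24, 3e24]               # value forgone by the 66-year wait
v_bad_range      = [-1e18, -1e22, -1e26]            # value if ASI goes wrong (more negative = S-risk)

wins, total = 0, 0
for p_2034, p_gain, delay_cost, v_bad in product(
        p_2034_range, p_gain_range, delay_cost_range, v_bad_range):
    p_2100 = min(p_2034 + p_gain, 1.0)
    rush   = expected_value(p_2034, V_GOOD, v_bad)
    pause  = expected_value(p_2100, V_GOOD - delay_cost, v_bad)
    total += 1
    if pause > rush:
        wins += 1

# With these ranges, pausing wins in every combination.
print(f"Pausing has the higher expected value in {wins} of {total} combinations.")
```

The reason is visible in the algebra: pausing wins whenever the extra probability of a good outcome times the value at stake exceeds the expected delay cost, and with astronomical stakes and a delay penalty of only a few percent, that condition is very hard to break.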