A shift in arguments for AI risk

Link post

The linked post is work done by Tom Adamczewski while at FHI. I think this sort of expository and analytic work is very valuable, so I’m cross-posting it here (with his permission). Below is an extended summary; for the full document, see his linked blog post.

Many people now work on ensuring that advanced AI has beneficial consequences. But members of this community have made several quite different arguments for prioritising AI.

Early arguments, and in particular Superintelligence, identified the “alignment problem” as the key source of AI risk. In addition, the book relies on the assumption that superintelligent AI is likely to emerge through a discontinuous jump in the capabilities of an AI system, rather than through gradual progress. This assumption is crucial to the argument that a single AI system could gain a “decisive strategic advantage”, that the alignment problem cannot be solved through trial and error, and that there is likely to be a “treacherous turn”. Hence, the discontinuity assumption underlies the book’s conclusion that existential catastrophe is a likely outcome.

The argument in Superintelligence combines three features: (i) a focus on the alignment problem, (ii) the discontinuity assumption, and (iii) the resulting conclusion that an existential catastrophe is likely.

Arguments that abandon some of these features have recently become prominent. They also generally tend to have been made in less detail than the early arguments.

One line of argument, promoted by Paul Christiano and Katja Grace, drops the discontinuity assumption, but continues to view the alignment problem as the source of AI risk. Even under more gradual scenarios, they argue that, unless we solve the alignment problem before advanced AIs are widely deployed in the economy, these AIs will cause human values to eventually fade from prominence. They appear to be agonistic about whether these harms would warrant the label “existential risk”.

Moreover, others have proposed AI risks that are unrelated to the alignment problem. I discuss three of these: (i) the risk that AI might be misused, (ii) that it could make war between great powers more likely, and (iii) that it might lead to value erosion from competition. These arguments don’t crucially rely on a discontinuity, and the risks are rarely existential in scale.

It’s not always clear which of the arguments actually motivates members of the beneficial AI community. It would be useful to clarify which of these arguments (or yet other arguments) are crucial for which people. This could help with evaluating the strength of the case for prioritising AI, deciding which strategies to pursue within AI, and avoiding costly misunderstanding with sympathetic outsiders or sceptics.