A shift in arguments for AI risk


The linked post is work done by Tom Sittler while at FHI. I think this sort of expository and analytic work is very valuable, so I’m cross-posting it here (with his permission). Below is an extended summary; for the full document, see his linked blog post.

Many people now work on ensuring that advanced AI has beneficial consequences. But members of this community have made several quite different arguments for prioritising AI.

Early arguments, and in particular Superintelligence, identified the “alignment problem” as the key source of AI risk. In addition, the book relies on the assumption that superintelligent AI is likely to emerge through a discontinuous jump in the capabilities of an AI system, rather than through gradual progress. This assumption is crucial to the argument that a single AI system could gain a “decisive strategic advantage”, that the alignment problem cannot be solved through trial and error, and that there is likely to be a “treacherous turn”. Hence, the discontinuity assumption underlies the book’s conclusion that existential catastrophe is a likely outcome.

The argument in Superintelligence combines three features: (i) a focus on the alignment problem, (ii) the discontinuity assumption, and (iii) the resulting conclusion that an existential catastrophe is likely.

Arguments that abandon some of these features have recently become prominent. These arguments have generally been made in less detail than the early ones.

One line of argument, promoted by Paul Christiano and Katja Grace, drops the discontinuity assumption but continues to view the alignment problem as the source of AI risk. They argue that, even under more gradual scenarios, unless we solve the alignment problem before advanced AIs are widely deployed in the economy, these AIs will cause human values to eventually fade from prominence. They appear to be agnostic about whether these harms would warrant the label “existential risk”.

Moreover, others have proposed AI risks that are unrelated to the alignment problem. I discuss three of these: (i) the risk that AI might be misused, (ii) that it could make war between great powers more likely, and (iii) that it might lead to value erosion from competition. These arguments don’t crucially rely on a discontinuity, and the risks are rarely existential in scale.

It’s not always clear which of the arguments actually motivates members of the beneficial AI community. It would be useful to clarify which of these arguments (or yet other arguments) are crucial for which people. This could help with evaluating the strength of the case for prioritising AI, deciding which strategies to pursue within AI, and avoiding costly misunderstandings with sympathetic outsiders or sceptics.