The Main Sources of AI Risk?

There are so many causes or sources of AI risk that it’s getting hard to keep them all in mind. I propose we keep a list of the main sources (that we know about), such that we can say that if none of these things happen, then we’ve mostly eliminated AI risk (as an existential risk) at least as far as we can determine. Here’s a list that I spent a couple of hours enumerating and writing down. Did I miss anything important?

  1. Insufficient time/resources for AI safety (for example, caused by an intelligence explosion or AI race)

  2. Insufficient global coordination, leading to the above

  3. Misspecified or incorrectly learned goals/values

  4. Inner optimizers

  5. ML differentially accelerating easy-to-measure goals

  6. Paul Christiano’s “influence-seeking behavior” (a combination of 3 and 4 above?)

  7. AI generally accelerating intellectual progress in a wrong direction (e.g., accelerating unsafe/risky technologies more than knowledge/wisdom about how to safely use those technologies)

  8. Metaethical error

  9. Metaphilosophical error

  10. Other kinds of philosophical errors in AI design (e.g., giving AI a wrong prior or decision theory)

  11. Other design/coding errors (e.g., accidentally putting a minus sign in front of a utility function, supposedly corrigible AI not actually being corrigible)

  12. Doing acausal reasoning in a wrong way (e.g., failing to make good acausal trades, being acausally extorted, failing to acausally influence others who can be so influenced)

  13. Human-controlled AIs ending up with wrong values due to insufficient “metaphilosophical paternalism”

  14. Human-controlled AIs causing ethical disasters (e.g., large-scale suffering that can’t be “balanced out” later) prior to reaching moral/philosophical maturity

  15. Intentional corruption of human values

  16. Unintentional corruption of human values

  17. Mind crime (disvalue unintentionally incurred through morally relevant simulations in AIs’ minds)

  18. Premature value lock-in (i.e., freezing one’s current conception of what’s good into a utility function)

  19. Extortion between AIs leading to vast disvalue

  20. Distributional shifts causing apparently safe/aligned AIs to stop being safe/aligned

  21. Value drift and other kinds of error as AIs self-modify, or AIs failing to solve value alignment for more advanced AIs

  22. Treacherous turn / loss of property rights due to insufficient competitiveness of humans & human-aligned AIs

  23. Gradual loss of influence due to insufficient competitiveness of humans & human-aligned AIs

  24. Utility maximizers / goal-directed AIs having an economic and/or military competitive advantage due to the relative ease of cooperation/coordination and of defense against value corruption and other forms of manipulation and attack, leading to one or more of the above

  25. In general, the most competitive type of AI being too hard to align or to safely use

  26. Computational resources being too cheap, leading to one or more of the above

(With this post I mean, among other things, to re-emphasize the disjunctive nature of AI risk. This list isn’t fully disjunctive (i.e., some of the items are subcategories or causes of others); I mostly gave a source of AI risk its own number if it seemed important to make that source more salient. Maybe once we have a list of everything that is important, it would make sense to create a graph out of it.)

Added on 6/13/19:

  27. Failure to learn how to deal with alignment in the many-humans, many-AIs case even if single-human, single-AI alignment is solved (suggested by William Saunders)

  28. Economics of AGI causing concentration of power amongst human overseers

  29. Inability to specify any ‘real-world’ goal for an artificial agent (suggested by Michael Cohen)

  30. AI systems end up controlled by a group of humans representing a small range of human values (i.e., an ideological or religious group that imposes values on everyone else) (suggested by William Saunders)

Added on 2/3/2020:

  31. Failing to solve the commitment races problem, i.e., building AI in such a way that some sort of disastrous outcome occurs due to unwise premature commitments (or unwise hesitation in making commitments!). This overlaps significantly with #27, #19, and #12.

Added on 3/11/2020:

  32. Demons in imperfect search (similar to, but distinct from, inner optimizers). See here for an illustration.

[Edit on 1/28/2020: This list was created by Wei Dai. Daniel Kokotajlo offered to keep it updated and prettify it over time, and so was added as a coauthor.]