What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address.

Recently, at PCSOCMLx, I (co-)hosted a session with the goal of explaining, debating, and discussing what I view as “the case for AI x-risk”. Specifically, my goal was/is to make the case for the “out-of-control AI killing everyone” type of AI x-risk, since many or most ML researchers already accept that there are significant risks from misuse of AI that should be addressed. I’m sharing my outline, since it might be useful to others, and in order to get feedback on it. Please tell me what you think it does right/wrong!

EDIT: I no­ticed I (and oth­ers I’ve spo­ken to about this) haven’t been clear enough about dis­t­in­guish­ing CLAIMS and ARGUMENTS. I’m hop­ing to make this clearer in the fu­ture.

Some back­ground/​context

I es­ti­mate I’ve spent ~100-400 hours dis­cussing AI x-risk with ma­chine learn­ing re­searchers dur­ing the course of my MSc and PhD. My cur­rent im­pres­sion is that re­jec­tion of AI x-risk by ML re­searchers is mostly due to a com­bi­na­tion of:

  • Mi­sun­der­stand­ing of what I view as the key claims (e.g. be­liev­ing “the case for x-risk hinges on short-timelines and/​or fast take-off”).

  • Ig­no­rance of the ba­sis for AI x-risk ar­gu­ments (e.g. no fa­mil­iar­ity with the ar­gu­ment from in­stru­men­tal con­ver­gence).

  • Different philosophical groundings (e.g. not feeling able/compelled to try and reason using probabilities and expected value; not valuing future lives very much; an unexamined apparent belief that current “real problems” should always take precedence over future “hypothetical concerns”, resulting in “whataboutism”).

I suspect that ignorance about the level of support for AI x-risk concerns among other researchers also plays a large role, but it’s less clear… I think people don’t like to be seen to be basing their opinions on other researchers’. Underlying all of this seems to be a mental move of “outright rejection” based on AI x-risk failing many powerful and useful heuristics. AI x-risk is thus commonly viewed as a Pascal’s mugging: “plausible” but not plausible enough to compel any consideration or action. A common attitude is that AI take-over has a “0+epsilon” chance of occurring.

I’m hoping that being clearer and more modest in the claims I/we aim to establish can help move discussions with researchers forward. I’ve recently been leaning heavily on the unpredictability of the future and making ~0 mention of my own estimates about the likelihood of AI x-risk, with good results.

The 3 core claims:

1) The de­vel­op­ment of ad­vanced AI in­creases the risk of hu­man ex­tinc­tion (by a non-triv­ial amount, e.g. 1%), for the fol­low­ing rea­sons:

  • Good­hart’s law

  • In­stru­men­tal goals

  • Safety-perfor­mance trade-offs (e.g. ca­pa­bil­ity con­trol vs. mo­ti­va­tion con­trol)

2) To mitigate this existential risk (x-risk), we need progress in 3 areas:

  • Know­ing how to build safe sys­tems (“con­trol prob­lem”)

  • Know­ing that we know how to build safe sys­tems (“jus­tified con­fi­dence”)

  • Prevent­ing peo­ple from build­ing un­safe sys­tems (“global co­or­di­na­tion”)

3) Miti­gat­ing AI x-risk seems like an eth­i­cal pri­or­ity be­cause it is:

  • high impact

  • neglected

  • challeng­ing but tractable


Unfortunately, only 3 people showed up to our session (despite something like 30 expressing interest), so I didn’t learn too much about the effectiveness of this presentation. My 2 main take-aways are:

  • Some­what un­sur­pris­ingly, claim 1 had the least sup­port. While I find this claim and the sup­port­ing ar­gu­ments quite com­pel­ling and in­tu­itive, there seem to be in­fer­en­tial gaps that I strug­gle to ad­dress quickly/​eas­ily. A key stick­ing point seems to be the lack of a highly plau­si­ble con­crete sce­nario. I think it might also re­quire more dis­cus­sion of epistemics in or­der to move peo­ple from “I un­der­stand the ba­sis for con­cern” to “I be­lieve there is a non-triv­ial chance of an out-of-con­trol AI kil­ling ev­ery­one”.

  • The phrase “ethical priority” raises alarm bells for people, and should be replaced or clarified. Once I clarified that I meant it in the same way as “combating climate change is an ethical priority”, people seemed to accept it.

Some more de­tails on the event:

The title for our session was: The case for AI as an existential risk, and a call for discussion and debate. Our blurb was: A growing number of researchers are concerned about scenarios in which machines, instead of people, control the future. What is the basis for these concerns, and are they well-founded? I believe they are, and we have an obligation as a community to address them. I can lead with a few minutes summarizing the case for that view. We can then discuss nuances, objections, and take-aways.

I also started with some basic background to make sure people understood the topic:

  • X-risk = risk of hu­man ex­tinc­tion

  • The 3 kinds of risk (mi­suse, ac­ci­dent, struc­tural)

  • The specific risk scenario I’m concerned with: out-of-control AI