Resources I send to AI researchers about AI safety

This is my list of resources I send to machine learning (ML) researchers when presenting arguments about AI safety. New resources have been coming out fast, and I've also been user-testing these, so the top part of this post contains my updated (Nov 2022) recommendations. The rest of the post (originally posted June 2022) has been reorganized but mostly left for reference; I make occasional additions to it (last updated May 2023).

Core recommended resources

Core readings for ML researchers[1]

Arguments for risk from advanced AI systems

Orienting

Research directions

Core readings for the public

Core readings for EAs

(Readings that are more philosophical and involve x-risk and discussion of AGI-like systems, so I expect them to be less liked by ML researchers (I have some limited data suggesting this), though they're anecdotally well-liked by EAs)

Getting involved for EAs

If you haven't read Charlie's writeup about research, or Gabe's writeup about engineering, it's worth a look! Richard Ngo's AGI safety career advice is also good. Also, if you're interested in theory, see John Wentworth's writeup about independent research, and Vivek wrote some alignment exercises to try (also see John Wentworth's work in general). With respect to outreach, I'd try to use a more technical pitch than what Vael used; I think Sam Bowman's pitch is pretty great, and Marius also has a nice writeup of his pitch (not specific to NLP).


Full list of recommended resources

These reading choices are drawn from the various other reading lists (also Victoria Krakovna's); this is not original in any way, just something to draw from if you're trying to send someone some of the more accessible resources.

Public-oriented

Central Arguments

Technical Work on AI alignment

How does this lead to x-risk / killing people though?

Forecasting (When might advanced AI be developed?)

Calibration and Forecasting

Common Misconceptions

Counterarguments to AI safety (messy doc):

Collection of public surveys about AI


Miscellaneous older text

Text I'm no longer using but still use for reference sometimes.

If you're interested in getting into this:

Introduction to large-scale risks to humanity, including "existential risks" that could lead to the extinction of humanity

Chapter 3 is on natural risks, including risks of asteroid and comet impacts, supervolcanic eruptions, and stellar explosions. Ord argues that we can appeal to the fact that we have already survived for 2,000 centuries as evidence that the total existential risk posed by these threats from nature is relatively low (less than one in 2,000 per century).

Chapter 4 is on anthropogenic risks, including risks from nuclear war, climate change, and environmental damage. Ord estimates these risks as significantly higher, each posing about a one in 1,000 chance of existential catastrophe within the next 100 years. However, the odds are much higher that climate change will result in non-existential catastrophes, which could in turn make us more vulnerable to other existential risks.

Chapter 5 is on future risks, including engineered pandemics and artificial intelligence. Worryingly, Ord puts the risk of engineered pandemics causing an existential catastrophe within the next 100 years at roughly one in thirty. With any luck the COVID-19 pandemic will serve as a "warning shot," making us better able to deal with future pandemics, whether engineered or not. Ord's discussion of artificial intelligence is more worrying still. The risk here stems from the possibility of developing an AI system that both exceeds every aspect of human intelligence and has goals that do not coincide with our flourishing. Drawing upon views held by many AI researchers, Ord estimates that the existential risk posed by AI over the next 100 years is an alarming one in ten.

Chapter 6 turns to questions of quantifying particular existential risks (some of the probabilities cited above do not appear until this chapter) and of combining these into a single estimate of the total existential risk we face over the next 100 years. Ord's estimate of the latter is one in six.

How AI could be an existential risk

  • AI alignment researchers disagree a weirdly high amount about how AI could constitute an existential risk, so I hardly think the question is settled. Some plausible ones people are considering (copied from the paper):

  • “Superintelligence”

    • A single AI system with goals that are hostile to humanity quickly becomes sufficiently capable for complete world domination, and causes the future to contain very little of what we value, as described in “Superintelligence”. (Note from Vael: Where the AI has an instrumental incentive to destroy humans and uses its planning capabilities to do so, for example via synthetic biology or nanotechnology.)

  • Part 2 of “What failure looks like”

    • This involves multiple AIs accidentally being trained to seek influence, and then failing catastrophically once they are sufficiently capable, causing humans to become extinct or otherwise permanently lose all influence over the future. (Note from Vael: I think we might have to pair this with something like “and in loss of control, the environment then becomes uninhabitable to humans through pollution or consumption of important resources for humans to survive”)

  • Part 1 of “What failure looks like”

    • This involves AIs pursuing easy-to-measure goals, rather than the goals humans actually care about, causing us to permanently lose some influence over the future. (Note from Vael: I think we might have to pair this with something like “and in loss of control, the environment then becomes uninhabitable to humans through pollution or consumption of important resources for humans to survive”)

  • War

    • Some kind of war between humans, exacerbated by developments in AI, causes an existential catastrophe. AI is a significant risk factor in the catastrophe, such that no catastrophe would have occurred without the developments in AI. The proximate cause of the catastrophe is the deliberate actions of humans, such as the use of AI-enabled nuclear or other weapons. See Dafoe (2018) for more detail. (Note from Vael: Though there's a recent argument that it may be unlikely for nuclear weapons to cause an extinction event, and instead it would just be catastrophically bad. One could still do it with synthetic biology though, probably, to get all of the remote people.)

  • Misuse

    • Intentional misuse of AI by one or more actors causes an existential catastrophe (excluding cases where the catastrophe was caused by misuse in a war that would not have occurred without developments in AI). See Karnofsky (2016) for more detail.

  • Other

Governance, aimed at highly capable systems in addition to today's systems

It seemed like a lot of your thoughts about AI risk went through governance, so I wanted to mention what the space looks like (spoiler: it's preparadigmatic) if you haven't seen that yet!

AI Safety in China

AI Safety community building, student-focused (see academic efforts above)

If they're curious about other existential / global catastrophic risks:

Large-scale risks from synthetic biology

Large-scale risks from nuclear

Why I don't think we're on the right timescale to worry most about climate change:


List for “Preventing Human Extinction” class

I've also included a list of resources that I had students read through for the Stanford first-year course “Preventing Human Extinction”.

When might advanced AI be developed?

Why might advanced AI be a risk?

Thinking about making advanced AI go well (technical)

Thinking about making advanced AI go well (governance)

Optional (large-scale risks from AI)

Natural science sources

  1. ^
  2. ^

    I swear I didn't set out to self-promote here; it's just doing weirdly well on user testing for both EAs and ML researchers at the moment (this is partly because it's relatively current; I expect it'll do less well over time)

    Note: I've written a new version of this talk that goes over the AI risk arguments through March 2023, and there's a new website talking about my interview findings (ai-risk-discussions.org).

  3. ^

    Hi X,

    [warm in­tro­duc­tion]

    In the interests of increasing options, I wanted to reach out and say that I'd be particularly happy to help you explore synthetic biology pathways more, if you were so inclined. I think it's pretty plausible we'll get another, worse pandemic in our lifetimes, and it's worth investing a career or part of a career to work on it. Especially since so few people will make that choice, a single person probably matters a lot compared to entering other more popular careers.

    No worries if you're not interested though; this is just one option out of many. I'm emailing you in a batch instead of individually so that hopefully you feel empowered to ignore this email and be done with this class :P. Regardless, thanks for a great quarter and hope you have great summers!

    If you are interested:

Crossposted to EA Forum (42 points, 0 comments)