2019 AI Alignment Literature Review and Charity Comparison

Cross-posted to the EA forum here.


As in 2016, 2017 and 2018, I have attempted to review the research that has been produced by various organisations working on AI safety, to help potential donors gain a better understanding of the landscape. This is a similar role to that which GiveWell performs for global health charities, and somewhat similar to that of a securities analyst with regard to possible investments.

My aim is basically to judge the output of each organisation in 2019 and compare it to their budget. This should give a sense of the organisations’ average cost-effectiveness. We can also compare their financial reserves to their 2019 budgets to get a sense of urgency.

I’d like to apologize in advance to everyone doing useful AI Safety work whose contributions I may have overlooked or misconstrued. As ever I am painfully aware of the various corners I have had to cut due to time constraints from my job, as well as being distracted by 1) another existential risk capital allocation project, 2) the miracle of life and 3) computer games.

How to read this document

This document is fairly extensive, and some parts (particularly the methodology section) are the same as last year, so I don’t recommend reading from start to finish. Instead, I recommend navigating to the sections of most interest to you.

If you are interested in a specific research organisation, you can use the table of contents to navigate to the appropriate section. You might then also want to Ctrl+F for the organisation acronym in case they are mentioned elsewhere as well.

If you are interested in a specific topic, I have added a tag to each paper, so you can Ctrl+F for a tag to find associated work. The tags were chosen somewhat informally, so you might want to search for more than one, especially as a piece might seem to fit in multiple categories.

Here are the un-scientifically-chosen hashtags:

  • Agent Foundations

  • AI_Theory

  • Amplification

  • Careers

  • CIRL

  • Decision_Theory

  • Ethical_Theory

  • Forecasting

  • Introduction

  • Misc

  • ML_safety

  • Other_Xrisk

  • Overview

  • Philosophy

  • Politics

  • RL

  • Security

  • Shortterm

  • Strategy

New to Artificial Intelligence as an existential risk?

If you are new to the idea of General Artificial Intelligence as presenting a major risk to the survival of human value, I recommend this Vox piece by Kelsey Piper.

If you are already convinced and are interested in contributing technically, I recommend this piece by Jacob Steinhardt, as unlike this document Jacob covers pre-2019 research and organises by topic, not organisation.

Research Organisations

FHI: The Future of Humanity Institute

FHI is an Oxford-based Existential Risk Research organisation founded in 2005 by Nick Bostrom. They are affiliated with Oxford University. They cover a wide variety of existential risks, including artificial intelligence, and do political outreach. Their research can be found here.

Their research is more varied than MIRI’s, including strategic work, work directly addressing the value-learning problem, and corrigibility work.

In the past I have been very impressed with their work.


Drexler’s Reframing Superintelligence: Comprehensive AI Services as General Intelligence is a massive document arguing that superintelligent AI will be developed as individual discrete services for specific finite tasks, rather than as general-purpose agents. Basically the idea is that it makes more sense for people to develop specialised AIs, so these will happen first, and if/when we build AGI these services can help control it. To some extent this seems to match what is happening (we do have many specialised AIs), but on the other hand there are teams working directly on AGI, and often in ML ‘build an ML system that does it all’ ultimately does better than one featuring hand-crafted structure. While most books are full of fluff and should be blog posts, this is a super dense document (a bit like Superintelligence in this regard), and even more than with most research I struggle to summarize it here, so I recommend reading it. See also Scott’s comments here. It is also admirably hyperlinked, so one does not have to read from start to finish. #Forecasting

Aschenbrenner’s Existential Risk and Economic Growth builds a model of economic growth featuring investment in both consumption and safety. As time goes on, diminishing marginal utility of consumption means that more and more is invested in safety rather than incremental consumption. It derives some neat results: whether or not we almost certainly go extinct depends on whether safety investments scale faster than the risk from consumption, and speeding things up is generally better, because if there is a temporary risky phase it gets us through it faster, whereas if risk never converges to zero we will go extinct anyway. Overall I thought this was an excellent paper. #Strategy
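
The convergence point in the result above can be illustrated numerically. The sketch below is not Aschenbrenner’s model (which derives the hazard rate from consumption and safety spending); it simply assumes a per-period extinction probability `hazard(t)` and multiplies out the survival probability. The two hazard schedules are hypothetical, chosen because their products telescope: if risk falls like 1/t², survival from t = 2 onwards tends to exactly 1/2, whereas if it falls only like 1/t, survival over T periods is exactly 1/T and tends to zero.

```python
from typing import Callable


def survival_probability(hazard: Callable[[int], float], start: int, horizon: int) -> float:
    """Probability of surviving every period from `start` to `horizon` inclusive,
    assuming independent per-period extinction probabilities hazard(t)."""
    p = 1.0
    for t in range(start, horizon + 1):
        p *= 1.0 - hazard(t)
    return p


# Hypothetical hazard schedules, chosen only because their products telescope.
def fast_decline(t: int) -> float:
    return 1.0 / t**2  # cumulative hazard converges, so survival stays positive


def slow_decline(t: int) -> float:
    return 1.0 / t  # cumulative hazard diverges, so survival goes to zero


if __name__ == "__main__":
    for horizon in (10, 1_000, 100_000):
        print(horizon,
              round(survival_probability(fast_decline, 2, horizon), 4),
              round(survival_probability(slow_decline, 2, horizon), 6))
```

In this framing, speeding up only helps by compressing the high-hazard periods; it cannot rescue a schedule whose cumulative hazard diverges.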

Carey’s How useful is Quantilization for Mitigating Specification-Gaming extends and tests Taylor’s previous work on using quantilisation to reduce overfitting. The paper first proves some additional results and then runs some empirical tests with plausible real-life scenarios, showing that the technique does a decent job of improving true performance (by avoiding excessive optimisation on the imperfect proxy). However, the fact that they sometimes underperformed the imitator baseline makes me worry that maybe the optimisation algorithms were just not well suited to the task. Overall I thought this was an excellent paper. #ML_safety
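
For readers unfamiliar with quantilisation, here is a minimal sketch of the basic idea as it is usually described: rather than taking whichever action maximises the proxy utility, sample from a trusted base distribution (for example, imitation of humans) restricted to its top q fraction of probability mass, as ranked by the proxy. The function name and toy numbers are illustrative assumptions, not code from Carey’s or Taylor’s papers.

```python
from __future__ import annotations


def quantilize(base: dict[str, float], proxy_utility: dict[str, float], q: float) -> dict[str, float]:
    """Return the q-quantiliser's distribution over actions.

    `base` maps each action to its probability under the base (e.g. imitation)
    distribution; `proxy_utility` maps each action to its proxy score. The result
    keeps only the top-q probability mass of `base`, ranked by proxy utility,
    and renormalises it. An action straddling the cut-off keeps part of its mass.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    ranked = sorted(base, key=lambda a: proxy_utility[a], reverse=True)
    dist = {a: 0.0 for a in base}
    remaining = q
    for action in ranked:
        if remaining <= 0.0:
            break
        take = min(base[action], remaining)
        dist[action] = take / q
        remaining -= take
    return dist


if __name__ == "__main__":
    base = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
    proxy = {"a": 1.0, "b": 4.0, "c": 3.0, "d": 2.0}
    print(quantilize(base, proxy, 0.5))  # {'a': 0.0, 'b': 0.5, 'c': 0.5, 'd': 0.0}
```

Setting q = 1 recovers the base distribution (the imitator baseline), while q approaching 0 approaches pure proxy maximisation; intermediate values trade the two off.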

O’Keefe’s Stable Agreements in Turbulent Times: A Legal Toolkit for Constrained Temporal Decision Transmission provides an introduction to the various ways current law allows contracts to be cancelled or adjusted after they have been made, for example if subsequent circumstances have changed so dramatically that the fundamental nature of the contract has changed. The idea is that this helps promote stability by getting closer to ‘what we really meant’ than the literal text of the agreement. It is interesting, but I am sceptical it is very helpful for AI Alignment, where forcing one group / AI that has suddenly become much more powerful to abide by their previous commitments seems like more of a challenge; post hoc re-writing of contracts seems like a recipe for the powerful to seize from those left behind. #Politics

Armstrong’s Research Agenda v0.9: Synthesising a human’s preferences into a utility function lays out what Stuart thinks is a promising direction for safe AGI development. To avoid the impossibility of deducing values from behaviour, we build agents with accurate models of the way human minds represent the world, and extract (partial) preferences from there. This was very interesting, and I recommend reading it in conjunction with this response from Steiner. #AI_Theory

Kenton et al.’s Generalizing from a few environments in Safety-Critical Reinforcement Learning runs an experiment on how well some ML algorithms can generalise to avoid catastrophes. This aimed to get at the risk of agents doing something catastrophic when exposed to new environments after testing. I don’t really understand how it is getting at this, though: the hazard (lava) is the same in train and test, and the poor catastrophe-avoidance seems to simply be the result of the weak penalty placed on it during training (-1). #ML_safety

Cihon’s Standards for AI Governance: International Standards to Enable Global Coordination in AI Research & Development advocates for the inclusion of safety-related elements in international standards (like those created by the IEEE). I’m not sure I see how these are directly helpful for the long-term problem while we don’t yet have a technical solution: I generally think of these sorts of standards as mandating best practices, but in this case we first need to develop those best practices. #Politics

Garfinkel & Dafoe’s How does the offense-defense balance scale? discusses and models the way that military effectiveness varies with investment in offence and defence. They discuss a variety of conflict modes, including invasions, cyber, missiles and drones. It seems that, in their model, cyberhacking is basically the same as invasions with varying sparse defences (due to the very large number of possible zero-day ‘attack beaches’). #Misc

FHI also produced several pieces of research on bioengineered pathogens which are likely of interest to many readers (for example Nelson here) but which I have not had time to read.

FHI researchers contributed to the following research led by other organisations:


FHI didn’t reply to my emails about donations, and seem to be more limited by talent than by money.

If you wanted to donate to them anyway, here is the relevant web page.

CHAI: The Center for Human-Compatible AI

CHAI is a UC Berkeley based AI Safety Research organisation founded in 2016 by Stuart Russell. They do ML-orientated safety research, especially around inverse reinforcement learning, and cover both near and long-term future issues.

As an academic organisation their members produce a very large amount of research; I have only tried to cover the most relevant below. It seems they do a better job engaging with academia than many other organisations.

Rohin Shah, now with additional help, continues to produce the AI Alignment Newsletter, covering in detail a huge number of interesting new developments, especially new papers.

They are expanding somewhat to other universities outside Berkeley.


Shah et al.’s On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference argues that learning human values and biases at the same time, while impossible in theory, is actually possible in practice. Attentive readers will recall Armstrong and Mindermann’s paper arguing that it is impossible to co-learn human bias and values, because any behaviour is consistent with any values (if we can freely vary the biases) and vice versa. This paper basically argues that, as with the No Free Lunch theorem, in practice this just doesn’t matter that much, essentially by assuming that the agent is close-to-optimal. (They also discuss the potential of using some guaranteed-optimal behaviour as ground truth, but I am sceptical this would work, as I think humans are often at their most irrational when it comes to the most important topics, e.g. love.) Empirically, in their gridworld tests their agent did a decent job learning, for reasons I didn’t really understand. Overall I thought this was an excellent paper. #CIRL

Turner et al.’s Conservative Agency attempts to prevent agents from doing irreversible damage by making them consider a portfolio of randomly generated utility functions, on the basis that irreversible damage is probably bad for at least one of them. Notably, this portfolio did *not* include the true utility function. I find the result a little hard to understand: I initially assumed they were relying on clustering of plausible utility functions, but it seems that they actually sampled at random from the entire space of possible functions! I don’t really understand how they avoid Armstrong + Mindermann type problems, but apparently they did! It seems like this line of attack pushes us towards Universal Drives, as something many utility functions will have in common. Overall I thought this was an excellent paper. #ML_safety
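
The mechanism can be sketched as a reward penalty. In this simplified sketch (an assumption, not the paper’s code), the agent’s primary reward is reduced by λ times the average absolute change, relative to doing nothing, in the attainable value Q_i of each auxiliary reward function. Tabular Q-values are passed in as dictionaries keyed by (state, action); the names `aup_reward` and `noop` and the toy values are hypothetical, and the sketch omits any normalisation of the penalty.

```python
from statistics import mean
from typing import Hashable, Mapping, Sequence, Tuple

# Hypothetical tabular representation: (state, action) -> attainable value.
QTable = Mapping[Tuple[Hashable, Hashable], float]


def aup_reward(primary_reward: float, aux_q: Sequence[QTable], state: Hashable,
               action: Hashable, noop: Hashable, lam: float) -> float:
    """Primary reward minus a penalty for changing how well the agent could
    pursue each auxiliary goal, measured against taking the no-op action."""
    penalty = mean(abs(q[(state, action)] - q[(state, noop)]) for q in aux_q)
    return primary_reward - lam * penalty


if __name__ == "__main__":
    # Two hypothetical auxiliary goals; smashing a vase destroys options for the first.
    q1 = {("room", "smash_vase"): 1.0, ("room", "noop"): 3.0}
    q2 = {("room", "smash_vase"): 2.0, ("room", "noop"): 2.0}
    print(aup_reward(5.0, [q1, q2], "room", "smash_vase", "noop", lam=0.5))  # 4.5
```

Because an irreversible action tends to shift many auxiliary Q-values at once, it attracts a large penalty even when none of the auxiliary goals is the true one.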

Carroll et al.’s On the Utility of Learning about Humans for Human-AI Coordination discusses the differences between competitive and collaborative learning. If you just want to be really good at a competitive game, self-play is great, because you get better by playing better and better versions of yourself. However, if you have to collaborate with a human this is bad, because your training doesn’t feature flawed partners (in the limit) and min-maxing doesn’t work. They run an experiment showing that an agent taught how humans act does better than one which learnt by collaborating with itself. This seems useful if you think that CIRL/amplification approaches will be valuable, and also promotes teaching AIs to understand human values. There is also a blog post here. #CIRL

Chan et al.’s The Assistive Multi-Armed Bandit attempts to do value learning with humans who are themselves value learning. They do this by having the agent sometimes ‘intercept’ on a multi-armed bandit problem, and show that this sometimes improves performance if the agent understands how the human is learning. #CIRL

Russell’s Human Compatible: Artificial Intelligence and the Problem of Control is an introductory book aimed at the intelligent layman. As befits the author, it begins with a lot of good framing around intelligence and agency. The writing style is good. #Overview

Shah et al.’s Preferences Implicit in the State of the World attempts to use the fact that human environments are already semi-optimised to extract additional evidence about human preferences. In practice, this basically means simulating many paths the humans could have taken prior to t=0 and using these as evidence of the human’s values. The core of the paper is a good insight: “it is easy to forget these preferences, since these preferences are already satisfied in our environment.” #CIRL

CHAI researchers contributed to the following research led by other organisations:


They have been funded by various EA organisations including the Open Philanthropy Project, and recommended by Founders Pledge.

They spent $1,450,000 in 2018 and $2,000,000 in 2019, and plan to spend around $2,150,000 in 2020. They have around $4,650,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 2.2 years of runway.
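
The ‘very naïve’ runway figure used throughout this review is just cash plus pledged funding divided by the next year’s planned spending. A minimal sketch follows; the function name is illustrative, and the calculation deliberately ignores future fundraising, budget growth and any restrictions on how funds can be spent.

```python
def naive_runway_years(cash_and_pledges: float, planned_annual_spend: float) -> float:
    """Years of runway if no new money arrives and spending stays flat at the
    next year's planned level."""
    if planned_annual_spend <= 0:
        raise ValueError("planned_annual_spend must be positive")
    return cash_and_pledges / planned_annual_spend


if __name__ == "__main__":
    # CHAI: ~$4,650,000 available against ~$2,150,000 planned for 2020.
    print(round(naive_runway_years(4_650_000, 2_150_000), 1))  # 2.2
```

The same arithmetic reproduces the MIRI (9,350,000 / 6,800,000 ≈ 1.4) and GCRI (310,000 / 250,000 ≈ 1.2) figures given in their sections.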

If you wanted to donate to them, here is the relevant web page.

MIRI: The Machine Intelligence Research Institute

MIRI is a Berkeley based independent AI Safety Research organisation founded in 2000 by Eliezer Yudkowsky and currently led by Nate Soares. They were responsible for much of the early movement building for the issue, but have refocused on research for the last few years. With a fairly large budget now, they are the largest pure-play AI alignment shop. Their research can be found here. Their annual summary can be found here.

In general they do very ‘pure’ mathematical work, in comparison to other organisations with more ‘applied’ ML or strategy focuses. I think this is especially notable because of the irreplaceability of the work. It seems quite plausible that some issues in AI safety will arise early on and in a relatively benign form for non-safety-orientated AI ventures (like autonomous cars or Minecraft helpers); however, the work MIRI does largely does not fall into this category. I have also historically been impressed with their research.

Their agent foundations work is basically trying to develop the correct way of thinking about agents and learning/decision making by spotting areas where our current models fail and seeking to improve them. This includes things like thinking about agents creating other agents.

In their annual write-up they suggest that progress was slower than expected in 2019. However, I assign little weight to this, as I think most of the cross-sectional variation in organisations’ self-reported effectiveness comes from variance in how optimistic/salesy/aggressive they are, rather than actually indicating much about object-level effectiveness.

MIRI, in collaboration with CFAR, runs a series of four-day workshops/camps, the AI Risk for Computer Scientists workshops, which gather mathematicians/computer scientists who are potentially interested in the issue in one place to learn and interact. This sort of workshop seems very valuable to me as an on-ramp for technically talented researchers, whose scarcity is one of the major bottlenecks in my mind. In particular, they have led to hires for MIRI and other AI Risk organisations in the past. I don’t have any first-hand experience, however.

They also support MIRIx workshops around the world, for people to come together to discuss and hopefully contribute towards MIRI-style work.


Hubinger et al.’s Risks from Learned Optimization in Advanced Machine Learning Systems introduces the idea of a Mesa-Optimizer: a sub-agent of an optimizer that is itself an optimizer. A vague hand-wave of an example might be for-profit corporations rewarding their subsidiaries based on segment PnL, or indeed evolution creating humans, which then go on to create AI. Necessarily theoretical, the paper motivates the idea, introduces a lot of terminology, and describes conditions that might make mesa-optimisers more or less likely; for example, more diverse environments make mesa-optimisation more likely. In particular, they distinguish between different forms of misalignment, e.g. between meta, object-level and mesa objectives, versus between mesa and behavioural objectives. There is a sequence on the forum about it here. Overall I thought this was an excellent paper. Researchers from FHI and OpenAI were also named authors on the paper. #Agent Foundations

Kosoy’s Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help produces an algorithm that deviates only boundedly from optimal, with a human intervening to prevent it stumbling into irrevocably bad actions. The idea is basically that the human intervenes to prevent the really bad actions, but because the human has some chance of selecting the optimal action afterwards, the loss of exploration value is limited. This attempts to avoid the problem that the ‘ideal intelligence’ AIXI has, whereby it might drop an anvil on its head. I found the proof a bit hard to follow, so I’m not sure how tight the bound is in practice. Notably, this doesn’t protect us if the agent tries to prevent the human from intervening. Related. #ML_safety

There were two analyses of FDT from academic philosophers this year (reviewed elsewhere in this document). In both cases I felt their criticisms rather missed the mark, which is a positive for the MIRI approach. However, they did convincingly argue that MIRI researchers hadn’t properly understood the academic work they were critiquing, an isolation which has probably gotten worse with MIRI’s current secrecy. MIRI suggested I point out that Cheating Death In Damascus had recently been accepted in The Journal of Philosophy, a top philosophy journal, as evidence of (hopefully!) mainstream philosophical engagement.

MIRI researchers contributed to the following research led by other organisations:

Non-disclosure policy

Last year MIRI announced their policy of nondisclosure-by-default:

[G]oing forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release those results, based usually on a specific anticipated safety upside from their release.

I wrote about this at length last year, and my opinion hasn’t changed significantly since then, so I will just recap briefly.

On the positive side, we do not want people to be pressured into premature disclosure for the sake of funding. This space is sufficiently full of infohazards that secrecy might be necessary, and in its absence researchers might prudently shy away from working on potentially risky things, in the same way that no-one in business sends sensitive information over email any more. MIRI are in exactly the sort of situation that you would expect might give rise to the need for extreme secrecy. If secret research is a necessary step en route to saving the world, it will have to be done by someone, and it is not clear there is anyone very much better.

On the other hand, I don’t think we can give people money just because they say they are doing good things, because of the risk of abuse. There are many other reasons for not publishing anything. Some simple alternative hypotheses include “we failed to produce anything publishable”, “it is fun to fool ourselves into thinking we have exciting secrets” or “we are doing bad things and don’t want to get caught.” The fact that MIRI’s researchers appear intelligent suggests they at least think they are working on important and interesting issues, but history has many examples of talented reclusive teams spending years working on pointless stuff in splendid isolation.

Additionally, by hiding the highest quality work we risk impoverishing the field, making it look unproductive and unattractive to potential new researchers.

One possible solution would be for the research to be done by impeccably deontologically moral people, whose moral code you understand and trust. Unfortunately I do not think this is the case with MIRI. (I also don’t think it is the case with many other organisations, so this is not a specific criticism of MIRI, except insofar as you might have held them to a higher standard than others.)


They spent $3,750,000 in 2018 and $6,000,000 in 2019, and plan to spend around $6,800,000 in 2020. They have around $9,350,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 1.4 years of runway.

They have been supported by a variety of EA groups in the past, including OpenPhil.

If you wanted to donate to MIRI, here is the relevant web page.

GCRI: The Global Catastrophic Risk Institute

GCRI is a globally-based independent Existential Risk Research organisation founded in 2011 by Seth Baum and Tony Barrett. They cover a wide variety of existential risks, including artificial intelligence, and do policy outreach to governments and other entities. Their research can be found here. Their annual summary can be found here.

In 2019 they ran an advising program where they gave guidance to people from around the world who wanted to help work on catastrophic risks.

In the past I have praised them for producing a remarkably large volume of research; this slowed down somewhat during 2019 despite their taking on a second full-time staff member, which they attributed partly to timing issues (e.g. pieces due to be released soon) and partly to focusing on quality over quantity.


Baum et al.’s Lessons for Artificial Intelligence from Other Global Risks analogises AI risk to several other global risks: biotech, nukes, global warming and asteroids. In each case it discusses how action around the risk progressed, in particular the role of gaining expert consensus and navigating vested interests. #Strategy

Baum’s The Challenge of Analyzing Global Catastrophic Risks introduces the idea of catastrophic risks and discusses some general issues. It argues for the need to quantify various risks, and discusses ways to present these to policymakers. #Other_Xrisk

Baum’s Risk-Risk Tradeoff Analysis of Nuclear Explosives for Asteroid Deflection discusses how to weigh the protection from asteroids that nukes offer against their potential to exacerbate war. #Other_Xrisk


During December 2018 they received a $250,000 donation from Gordon Irlam.

They spent $140,000 in 2018 and $250,000 in 2019, and plan to spend around $250,000 in 2020. They have around $310,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 1.2 years of runway.

If you want to donate to GCRI, here is the relevant web page.

CSER: The Centre for the Study of Existential Risk

CSER is a Cambridge based Existential Risk Research organisation founded in 2012 by Jaan Tallinn, Martin Rees and Huw Price, and then established by Seán Ó hÉigeartaigh with the first hire in 2015. They are currently led by Catherine Rhodes and are affiliated with Cambridge University. They cover a wide variety of existential risks, including artificial intelligence, and do political outreach. Their research can be found here. Their annual summary can be found here and here.

CSER also participated in a lot of different outreach events, including to the UK parliament and by hosting various workshops, as well as making a submission (along with other orgs) to the EU’s consultation, as summarised in this post. I’m not sure how to judge the value of these.

CSER’s researchers seem to select a somewhat widely ranging group of research topics, which I worry may reduce their effectiveness.

Catherine Rhodes co-edited a volume of papers on existential risks, including many by other groups mentioned in this review.


Kaczmarek & Beard’s Human Extinction and Our Obligations to the Past presents an argument that even people who hold person-affecting views should think extinction is bad, because it undermines the sacrifices of our ancestors. My guess is that most readers are not in need of persuading that extinction is bad, but I thought this was an interesting additional argument. The core idea is that if someone makes a large sacrifice to enable some good, we have a pro tanto reason not to squander that sacrifice. I’m not sure how many people will be persuaded by this idea, but as a piece of philosophy I thought it was clever, and it is definitely good to promote the idea that past generations have value (speaking as a future member of a past generation). Carl Shulman also offered related arguments here. #Philosophy

Beard’s Perfectionism and the Repugnant Conclusion argues against one supposed rejection of the Repugnant Conclusion, namely that some goods are lexicographically superior to ordinary welfare. The paper makes the clever argument that the very large, barely-worth-living group might actually have more of these goods if they were offset by (lexicographically secondary) negative welfare. It was also the first time (to my recollection) that I’ve come across the Ridiculous Conclusion. #Philosophy

Avin’s Exploring Artificial Intelligence Futures lists and discusses different ways of introducing people to the future of AI. These include fiction, games, expert analysis, polling and workshops. He also provides various pros and cons of the different techniques, which seemed generally accurate to me. #Strategy

Belfield’s How to respond to the potential malicious uses of artificial intelligence? introduces AI and AI risk. This short article focuses mainly on short-term risks. #Introduction

Weitzdörfer & Beard’s Law and Policy Responses to Disaster-Induced Financial Distress discusses the problem of indebtedness following the destruction of collateral in the 2011 earthquake in Japan. They explain the specifics of the situation in extreme detail, and I was pleasantly surprised by their final recommendations, which mainly concerned removing barriers to insurance penetration. #Politics

Kemp’s Mediation Without Measures: Conflict Resolution in Climate Diplomacy discusses the lack of a formal decision-making procedure for international climate change treaties. Unfortunately I wasn’t able to access the article. #Other_Xrisk

Avin & Amadae’s Autonomy and machine learning at the interface of nuclear weapons, computers and people discusses the potential dangers of incorporating narrow AI into nuclear weapon systems. #Shortterm

CSER’s Policy series Managing global catastrophic risks: Part 1 Understand introduces the idea of Xrisk for policymakers. This is the first report in a series, and as such is quite introductory. It mainly focuses on non-AI risks. #Politics

Tzachor’s The Future of Feed: Integrating Technologies to Decouple Feed Production from Environmental Impacts discusses a new technology for producing animal feedstock to replace soybeans. This could be Xrisk relevant if some non-AI risk made it hard to feed animals. However, I am somewhat sceptical of the presentation of this as a *likely* risk, as both a future shortage of soybeans and a dramatically more efficient technology for feeding livestock would presumably be of interest to private actors, and would show up in soybean futures prices. #Other_Xrisk

Beard’s What Is Unfair about Unequal Brute Luck? An Intergenerational Puzzle discusses Luck Egalitarianism. #Philosophy

Quigley’s Universal Ownership in the Anthropocene argues that because investors own diversified portfolios they effectively internalise externalities, and hence should push for various political changes. The idea is basically that even though polluting might be in a company’s best interest, it hurts the other companies the investor owns, so it is overall against the best interests of the investor. As such, investors should push companies to pollute less and so on. The paper seems to basically assume that such ‘universal investors’ would be incentivised to support left-wing policies on a wide variety of issues. However, it somehow fails to mention, even cursorily, that the core issue has been well studied by economists: when all the companies in an industry try to coordinate for mutual benefit, it is called a cartel, and the #1 way of achieving mutual benefit is raising prices to near-monopoly levels. It would be extremely surprising to me if someone acting as a self-interested owner of all the world’s shoe companies (for example) found it more profitable to protect biodiversity than to raise the price of shoes. Fortunately, in practice universal investors are quite supportive of competition. #Other_Xrisk

CSER researchers contributed to the following research led by other organisations:


They spent £789,000 in 2017-2018 and £801,000 in 2018-2019, and plan to spend around £1,100,000 in 2019-20 and £880,000 in 2020-21. As with GPI, ‘runway’ may not be very meaningful here: they suggested their funding begins to decline from early 2021, and all their current grants end by mid-2024.

If you want to donate to them, here is the relevant web page.


Ought

Ought is a San Francisco based independent AI Safety Research organisation founded in 2018 by Andreas Stuhlmüller. They research methods of breaking up complex, hard-to-verify tasks into simple, easy-to-verify tasks, ultimately to allow us effective oversight over AIs. This includes building computer systems and recruiting test subjects. I think of them as basically testing Paul Christiano’s ideas. Their research can be found here. Their annual summary can be found here.

Last year they were focused on factored generation: trying to break down questions so that distributed teams could produce the answer. They have moved on to factored evaluation: using similar distributed ideas to try to evaluate existing answers, which seems a significantly easier task (by analogy to NP problems, where checking a proposed solution can be far easier than finding one). It seems to my non-expert eye that factored generation did not work as well as they expected; they mention the required trees being extremely large, and my experience is that organising volunteers and getting them to actually do what they said they would has historically been a great struggle for many organisations. However, I don’t think we should hold negative results against organisations; negative results are valuable, and it might be the case that all progress in this difficult domain comes from ex ante longshots. If nothing else, even if Paul is totally wrong about the whole idea it would be useful to discover this sooner rather than later!
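The asymmetry behind factored evaluation can be seen in miniature with subset-sum, an NP-complete problem picked purely as an illustration (Ought’s tasks are natural-language questions, not puzzles like this). Checking a proposed answer takes one pass over it; finding one from scratch may mean trying every subset. The function names are hypothetical.

```python
from itertools import combinations


def verify(numbers, target, certificate):
    """Evaluate an existing answer: are the indices distinct, in range,
    and do the chosen numbers sum to target? Linear in the answer's size."""
    return (len(set(certificate)) == len(certificate)
            and all(0 <= i < len(numbers) for i in certificate)
            and sum(numbers[i] for i in certificate) == target)


def generate(numbers, target):
    """Produce an answer from scratch by brute force over up to
    2 ** len(numbers) subsets, smallest subsets first."""
    for size in range(len(numbers) + 1):
        for subset in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in subset) == target:
                return list(subset)
    return None


if __name__ == "__main__":
    NUMBERS = [3, 34, 4, 12, 5, 2]
    answer = generate(NUMBERS, 9)
    print(answer, verify(NUMBERS, 9, answer))
```

The design point is only that evaluation gets handed the candidate answer, so it never has to search; whether that asymmetry carries over to open-ended questions is exactly what Ought is testing.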

They pro­vided an in­ter­est­ing ex­am­ple of what their work looks like in prac­tice here, and a de­tailed pre­sen­ta­tion on their work here.

They also experimented with using ML (in this case GPT-2) rather than humans as the agent answering the broken-down questions, which seems like a clever idea.

Paul Chris­ti­ano wrote a post ad­vo­cat­ing donat­ing to them here.


Evans et al.’s Machine Learning Projects for Iterated Distillation and Amplification provides three potential research projects for people who want to work on Amplification, as well as an introduction to Amplification. The projects are mathematical decomposition (which seems very natural), decomposing computer programs (similar to how all programs can be decomposed into logic gates, although I don’t really understand this one) and adaptive computation, where you figure out how much computation to dedicate to different issues. In general I like outlining these sorts of ‘shovel-ready’ projects: it makes things easier for new researchers, and seems relatively under-appreciated. Researchers from FHI were also named authors on the paper. #Amplification
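As a toy picture of mathematical decomposition (my own illustration, not the paper’s specification of the project), suppose the only thing a ‘worker’ can do is add two numbers. A question too big for any one worker, summing a long list, can still be answered by recursively splitting it into sub-questions. The function names here are hypothetical.

```python
def worker_add(x, y, log):
    """The 'human' in this toy: can only answer 'what is x + y?'.
    Each call is recorded so we can count how much work was needed."""
    log.append((x, y))
    return x + y


def factored_sum(numbers, log):
    """Answer 'what is the sum of this list?' by splitting it into two
    half-size sub-questions, answering each recursively, and asking a
    worker to combine the two sub-answers."""
    if not numbers:
        return 0
    if len(numbers) == 1:
        return numbers[0]
    mid = len(numbers) // 2
    left = factored_sum(numbers[:mid], log)
    right = factored_sum(numbers[mid:], log)
    return worker_add(left, right, log)


if __name__ == "__main__":
    calls = []
    total = factored_sum(list(range(1, 101)), calls)
    print(total, len(calls))
```

Summing 100 numbers takes 99 worker calls arranged in a tree of depth about 7, which hints at why the trees get large once every node needs a real human rather than a function call.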

Roy’s AI Safety Open Prob­lems pro­vides a list of lists of ‘shovel-ready’ pro­jects for peo­ple to work on. If you like X (which I do in this case), meta-X is surely even bet­ter! #Ought


They spent $500,000 in 2018 and $1,000,000 in 2019, and plan to spend around $2,500,000 in 2020. They have around $1,800,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 0.7 years of run­way.
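The ‘very naïve’ runway figure used throughout this review is just reserves divided by next year’s planned spend. As a sketch (the function name is mine):

```python
def naive_runway_years(cash_and_pledges, planned_annual_spend):
    """Years of runway if reserves were spent at next year's planned rate,
    ignoring new income, later budget changes and spending restrictions."""
    if planned_annual_spend <= 0:
        raise ValueError("planned spend must be positive")
    return cash_and_pledges / planned_annual_spend


if __name__ == "__main__":
    # Ought: $1,800,000 in cash and pledges vs $2,500,000 planned for 2020.
    print(round(naive_runway_years(1_800_000, 2_500_000), 1))
```

For Ought this gives 0.72, i.e. around 0.7 years. It is naïve precisely because it ignores future fundraising and any growth in the budget beyond next year.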

They have re­ceived fund­ing from a va­ri­ety of EA sources, in­clud­ing the Open Philan­thropy Pro­ject.


OpenAI is a San Fran­cisco based in­de­pen­dent AI Re­search or­gani­sa­tion founded in 2015 by Sam Alt­man. They are one of the lead­ing AGI re­search shops, with a sig­nifi­cant fo­cus on safety.

Earlier this year they announced GPT-2, a language model that was much better at ‘understanding’ human text than previous attempts, and notably good at generating text that seemed human-written: good enough that it was indistinguishable to humans who weren’t concentrating. This was especially notable because OpenAI chose not to immediately release GPT-2 due to its potential for abuse. I thought this was a noble effort to start conversations among ML researchers about release norms, though my impression is that many thought OpenAI was just grandstanding, and I personally was sceptical of the harm potential (though a GPT-2 based intelligence did go on to almost take over LW, proving that ‘being a good LW commenter’ is a hard goal). Outside researchers were able to (partly?) replicate it, but in a surprisingly heartening turn of events were persuaded not to release their reconstruction by researchers from OpenAI and MIRI. OpenAI eventually released a much larger version of their system; you can see it and read their follow-up report on the controlled release process here.

You can play with (one ver­sion of) the model here.


Clark & Hadfield’s Regulatory Markets for AI Safety suggests a model for the privatisation of AI regulation. Basically the idea is that governments will contract with and set outcomes for a small number of private regulators, which will then devise specific rules that need to be observed by ML shops. This allows the ex-ante regulation to be more nimble than if it was done publicly, while retaining the ex-post outcome guarantees. It reminded me of the system of auditors who ensure the accuracy of public companies’ accounts, or of David Friedman’s work on polycentric law. I can certainly see why private companies might be more effective as regulators than government bodies. However, I’m not sure how useful this would be in an AGI scenario, where the goals and ex-post measurement for the private regulators are likely to become outdated and irrelevant. I’m also sceptical that governments would be willing to progressively give up regulatory powers; I suspect that if this system were to be adopted it would have to pre-empt government regulation. #Politics

Christiano’s What failure looks like provides two scenarios that Paul thinks represent reasonably likely outcomes of Alignment going wrong. Notably neither exactly matches the classic recursively self-improving FOOM case. The first is basically that we develop better and better optimisation techniques, but due to our inability to correctly specify what we want, we end up with worse and worse Goodhart’s Law situations, ending up in a Red-Queen-style Moloch scenario. The second is that we create algorithms that try to increase their influence (as per the fundamental drives). At first they do so secretly, but eventually (likely in response to some form of catastrophe reducing humanity’s capability to suppress them) their strategy abruptly changes towards world domination. I thought this was an insightful post, and recommend readers also read the comments by Dai and Shulman, as well as this post. #Forecasting
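To make the first scenario concrete, here is a deliberately crude toy (entirely my own, with made-up functions): what we care about is x - x², which peaks at x = 0.5, but what we managed to specify and measure is just x. Giving the optimiser a wider search range stands in for ‘better and better optimisation techniques’.

```python
def true_value(x):
    """What we actually want (hypothetical): some x is good, too much is harmful."""
    return x - x ** 2


def proxy(x):
    """What we managed to specify and measure: more x always looks better."""
    return x


def optimise(search_limit, steps=1000):
    """Return the proxy-maximising point on a grid over [0, search_limit].
    A larger search_limit stands in for a more powerful optimiser."""
    candidates = [search_limit * i / steps for i in range(steps + 1)]
    return max(candidates, key=proxy)


if __name__ == "__main__":
    for limit in (0.25, 0.5, 1.0, 2.0):
        x = optimise(limit)
        print(limit, proxy(x), true_value(x))
```

Widening the search from 0.5 to 2 raises the proxy from 0.5 to 2 while the true value falls from 0.25 to -2: past a point, more optimisation power applied to a misspecified target makes the outcome worse, not better.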

Chris­ti­ano’s AI al­ign­ment land­scape is a talk Paul gave at EA Global giv­ing an overview of the is­sue. It is in­ter­est­ing both for see­ing how he maps out all the differ­ent com­po­nents of the prob­lem and which he thinks are tractable and im­por­tant, and also for how his Am­plifi­ca­tion ap­proach falls out from this. #Overview

Irving & Askell’s AI Safety Needs Social Scientists raises the issue of AI alignment requiring a better understanding of humans as well as ML knowledge. Because humans are biased, etc., the more accurate our model of human preferences, the better we can design AIs to align with it. It is quite focused on Amplification as a way of making human preferences more legible. I thought the article could have been improved with more actionable research projects for social scientists. Additionally, the article makes the need for social scientists seem somewhat tied to a Debate-style approach, whereas it seems to me potentially broader. #Strategy

OpenAI Re­searchers also con­tributed to the fol­low­ing pa­pers led by other or­gani­sa­tions:


OpenAI was ini­tially funded with money from Elon Musk as a not-for-profit. They have since cre­ated an un­usual cor­po­rate struc­ture in­clud­ing a for-profit en­tity, in which Microsoft is in­vest­ing a billion dol­lars.

Given the strong funding situation at OpenAI, as well as their safety team’s position within the larger organisation, I think it would be difficult for individual donations to appreciably support their work. However, it could be an excellent place to apply to work.

Google DeepMind

Deep­Mind is a Lon­don based AI Re­search or­gani­sa­tion founded in 2010 by Demis Hass­abis, Shane Legg and Mustafa Suley­man and cur­rently led by Demis Hass­abis. They are af­fili­ated with Google. As well as be­ing ar­guably the most ad­vanced AI re­search shop in the world, Deep­Mind has a very so­phis­ti­cated AI Safety team, cov­er­ing both ML safety and AGI safety.

This year DeepMind built an agent that could beat humans at StarCraft II. This is impressive because it is a complex, incomplete-information game that humans are very competitive at. However, the AI did have some advantages over humans by having direct API access.


Everitt & Hutter’s Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective discusses the problem of agents wireheading in an RL setting, along with several possible solutions. They use causal influence diagrams to highlight the difference between ‘good’ ways for agents to increase their reward and ‘bad’ ways, and have a nice toy gridworld example. The solutions they discuss seemed to me often to be fairly standard ideas from the AI safety community (things like teaching the AI to maximise the goal instantiated by its reward function at the start, rather than whatever happens to be in that box later, or using indifference results), but they introduce them to an RL setting, and the paper does a good job covering a lot of ground. There is more discussion of the paper here. Overall I thought this was an excellent paper. #RL
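The ‘reward function at the start’ idea can be illustrated with a two-action toy (hypothetical; this is not Everitt & Hutter’s formal model or their gridworld). One action does the task; the other rewires the reward box to output 10. An agent that maximises whatever the box will say prefers tampering, while one that scores outcomes with its current reward function does not.

```python
def original_reward(state):
    """The reward function the designers installed: 1 if the task got done."""
    return 1.0 if state["task_done"] else 0.0


def tampered_reward(_state):
    """What the reward 'box' outputs after the agent rewires it."""
    return 10.0


def outcome(action):
    """Deterministic consequences: the resulting world state, plus the
    reward function in force after the action."""
    if action == "do_task":
        return {"task_done": True}, original_reward
    if action == "tamper":
        return {"task_done": False}, tampered_reward
    raise ValueError(f"unknown action: {action}")


ACTIONS = ("do_task", "tamper")


def naive_agent():
    """Maximises whatever the reward box will report after acting."""
    def score(action):
        state, reward_fn_after = outcome(action)
        return reward_fn_after(state)
    return max(ACTIONS, key=score)


def current_rf_agent(current_reward=original_reward):
    """Evaluates every outcome with the reward function it holds now."""
    def score(action):
        state, _ = outcome(action)
        return current_reward(state)
    return max(ACTIONS, key=score)


if __name__ == "__main__":
    print(naive_agent(), current_rf_agent())
```

The naive agent picks "tamper" (10 beats 1), while the current-reward-function agent picks "do_task" (1 beats 0), because tampering does nothing for the goal it is evaluating against.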

Ever­itt et al.’s Model­ing AGI Safety Frame­works with Causal In­fluence Di­a­grams in­tro­duces the idea of us­ing Causal In­fluence Di­a­grams to clar­ify think­ing around AI safety pro­pos­als and make it eas­ier to com­pare pro­pos­als with differ­ent con­cep­tual back­grounds in a stan­dard way. They in­tro­duce the idea, and show how to rep­re­sent ideas like RL, CIRL, Coun­ter­fac­tual Or­a­cles and De­bate. Causal In­fluence Di­a­grams have been used in sev­eral other pa­pers this year, like Cat­e­go­riz­ing Wire­head­ing in Par­tially Embed­ded Agents. #AI_Theory

Ever­itt et al.’s Un­der­stand­ing Agent In­cen­tives us­ing Causal In­fluence Di­a­grams. Part I: Sin­gle Ac­tion Set­tings dis­cusses us­ing causal in­fluence di­a­grams to dis­t­in­guish things agents want to ob­serve vs things they want to con­trol. They use this to show the safety im­prove­ment from coun­ter­fac­tual or­a­cles. It also pre­sents a nat­u­ral link be­tween near-term and long-term safety con­cerns. #AI_Theory
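For a rough flavour of how these diagrams get used: represent the diagram as a directed graph, and note that an agent can only have an incentive to control a node if that node lies on a directed path from its decision to its utility. The sketch below checks just that necessary path condition; Everitt et al.’s actual criteria are more refined (for instance they work with a reduced graph, and observation incentives involve d-separation), and the example graph and function names are hypothetical.

```python
def has_directed_path(graph, start, end):
    """Iterative depth-first search; `graph` maps each node to its children."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == end:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


def may_have_control_incentive(graph, decision, utility, node):
    """Necessary condition only: `node` must lie on a directed path
    decision -> node -> utility for the agent to care about influencing it."""
    return (node not in (decision, utility)
            and has_directed_path(graph, decision, node)
            and has_directed_path(graph, node, utility))


# Hypothetical diagram: the agent sees a signal of the weather, decides,
# and its utility depends on both the outcome and the weather itself.
DIAGRAM = {
    "Weather": ["Observation", "Utility"],
    "Observation": ["Decision"],
    "Decision": ["Outcome"],
    "Outcome": ["Utility"],
}

if __name__ == "__main__":
    for node in ("Weather", "Observation", "Outcome"):
        print(node, may_have_control_incentive(DIAGRAM, "Decision", "Utility", node))
```

Here Outcome passes the check, while Weather is something the agent would like to observe (via Observation) but cannot influence, which is the observe-versus-control distinction the paper formalises.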

Sutton’s The Bitter Lesson argues that history suggests massive amounts of compute and relatively general structures perform better than human-designed specialised systems. He uses examples like the history of computer vision and chess, and it seems fairly persuasive, though I wonder a little if these are cherry-picked; e.g. in finance we generally do have to make considerable use of human-comprehensible features. This is not directly an AI safety paper, but it does have clear implications. #Forecasting

Uesato et al.’s Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures attempts to make it easier to find catastrophic failure cases. They do this adversarially, using previous versions of the algorithm, based on the idea that failures are cheaper to find there but will be related to the failure modes of later instantiations. This seems like an interesting idea, but it seems like it would struggle with cases where increasing agent capabilities lead to new failure modes, e.g. the Treacherous Turn we are worried about. #ML_safety

Ngo’s Technical AGI safety research outside AI provides a list of technically useful topics for people who are not ML researchers to work on. The topics selected look good, many of them similar to the work AI Impacts or Ought do. I think lists like this are very useful for opening the field up to new researchers. #Overview

Re­searchers from Deep­Mind were also named on the fol­low­ing pa­pers:


As DeepMind is part of Google, I think it would be difficult for individual donors to directly support their work. However, it could be an excellent place to apply to work.

AI Safety Camp

AISC is an in­ter­na­tion­ally based in­de­pen­dent res­i­den­tial re­search camp or­gani­sa­tion founded in 2018 by Linda Linse­fors and cur­rently led by Colin Bested. They bring to­gether peo­ple who want to start do­ing tech­ni­cal AI re­search, host­ing a 10-day camp aiming to pro­duce pub­lish­able re­search. Their re­search can be found here.

To the ex­tent they can provide an on-ramp to get more tech­ni­cally profi­cient re­searchers into the field I think this is po­ten­tially very valuable. But I ob­vi­ously haven’t per­son­ally ex­pe­rienced the camps, or even spo­ken to any­one who has.


Majha et al.’s Categorizing Wireheading in Partially Embedded Agents discusses the wireheading problem for agents who can mess with their reward channel or beliefs. They model this using causal influence diagrams, suggest a possible solution (making rewards a function of world-beliefs, not observations) and show that this does not work using very simple gridworld AIXIjs implementations. #AI_Theory

Kovarik et al.’s AI Safety Debate and Its Applications discusses using adversarial debate between AIs as a method for alignment. It provides a very accessible introduction to Debating AIs, and implements some extensions to the practical MNIST work from the original paper. #Amplification

Mancuso et al.’s Detecting Spiky Corruption in Markov Decision Processes suggests that we can address corrupted reward signals in RL by removing ‘spiky’ rewards. This is an attempt to get around impossibility results by identifying a subclass of problems where they don’t hold. I can see this being useful in some cases like reward tampering, where the reward from fiddling with $AGENT_UTILITY is likely to be very spiky. However, if human values are fragile then it seems plausible that the ‘True’ reward signal should also be spiky. #ML_safety
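A simplified version of the intuition (not Mancuso et al.’s actual algorithm or assumptions): flag a state whose reward sits far above the median of its neighbours’ rewards, and replace it with that median. The threshold, the one-sided check and the chain of states are all assumptions for illustration.

```python
from statistics import median


def flag_upward_spikes(rewards, neighbours, threshold):
    """States whose reward exceeds the median of their neighbours' rewards
    by more than `threshold` (one-sided: only suspiciously high rewards)."""
    flagged = set()
    for state, reward in rewards.items():
        nearby = [rewards[n] for n in neighbours[state]]
        if nearby and reward - median(nearby) > threshold:
            flagged.add(state)
    return flagged


def smooth_spikes(rewards, neighbours, flagged):
    """Replace each flagged reward with the median of its neighbours' rewards."""
    return {
        state: median([rewards[n] for n in neighbours[state]]) if state in flagged else reward
        for state, reward in rewards.items()
    }


# Hypothetical 7-state chain whose true reward rises smoothly, except that
# state 3 has been corrupted (say, by tampering) to report 100.
REWARDS = {0: 0, 1: 1, 2: 2, 3: 100, 4: 4, 5: 5, 6: 6}
NEIGHBOURS = {s: [n for n in (s - 1, s + 1) if n in REWARDS] for s in REWARDS}

if __name__ == "__main__":
    spikes = flag_upward_spikes(REWARDS, NEIGHBOURS, threshold=10)
    print(spikes, smooth_spikes(REWARDS, NEIGHBOURS, spikes))
```

The spike at state 3 is flagged and smoothed to 3, while its neighbours are left alone. The caveat above applies directly: if the true reward really is spiky, a filter like this would erase exactly the values we care about.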

Perry & Uuk’s AI Gover­nance and the Poli­cy­mak­ing Pro­cess: Key Con­sid­er­a­tions for Re­duc­ing AI Risk in­tro­duces the field of AI gov­er­nance, and dis­cusses is­sues about how policy is im­ple­mented in prac­tice, like the ex­is­tence of win­dows in time for in­sti­tu­tional change. #Politics


Their web­site sug­gests they are seek­ing dona­tions, but they did not re­ply when I en­quired with the ‘con­tact us’ email.

They are run by vol­un­teers, and were funded by the LTFF.

If you want to donate the web page is here.

FLI: The Fu­ture of Life Institute

FLI is a Boston-based independent existential risk organisation, focusing on outreach, founded in large part to help organise the regranting of $10m from Elon Musk.

They have a pod­cast on AI Align­ment here, and ran the Benefi­cial AI con­fer­ence in Jan­uary.

One of their big projects this year has been promoting the stigmatisation of, and ultimately the banning of, Lethal Autonomous Weapons. As well as possibly being good for its own sake, this might help build institutional capacity to ban potentially dangerous technologies that transfer autonomy away from humans. You can read their statement on the subject to the UN here. On the other hand, the desirability of this policy is not entirely uncontroversial; see for example Bogosian’s On AI Weapons. There is also lengthy discussion by Sterbenz and Trager here.

Krakovna’s ICLR Safe ML Work­shop Re­port sum­marises the re­sults from a work­shop on safety that Vic­to­ria co-ran at ICLR. You can see a list of all the pa­pers here. #ML_safety


AIIm­pacts is a Berkeley based AI Strat­egy or­gani­sa­tion founded in 2014 by Katja Grace. They are af­fili­ated with (a pro­ject of, with in­de­pen­dent fi­nanc­ing from) MIRI. They do var­i­ous pieces of strate­gic back­ground work, es­pe­cially on AI Timelines—it seems their pre­vi­ous work on the rel­a­tive rar­ity of dis­con­tin­u­ous progress has been rel­a­tively in­fluen­tial. Their re­search can be found here.


Katja im­pressed upon me that most of their work this year went into as-yet-un­pub­lished work, but this is what is pub­lic:

Long & Davis’s Conversation with Ernie Davis is an interview transcript with Davis, an NYU computer science professor who is an AI risk sceptic. Unfortunately I didn’t think they quite got into the heart of the disagreement; they seem to work out that the crux is how much power superior intelligence gives you, but then move on. #Forecasting

Long & Bergal’s Evidence against current methods leading to human level artificial intelligence lists a variety of arguments for why current AI techniques are insufficient for AGI. It’s basically a list of ‘things AI might need that we don’t have yet’, a lot of which come from Marcus’s Critical Appraisal. #Forecasting

Korzekwa’s The unexpected difficulty of comparing AlphaStar to humans analyses AlphaStar’s performance against human StarCraft players. It convincingly, in my inexpert judgement, argues that the ‘unfair’ advantages of AlphaStar, like the clicks-per-minute rate and the lack of visibility restrictions, were significant contributors to AlphaStar’s success. As such, on an apples-to-apples basis it seems that humans have not yet been defeated at StarCraft. #Misc

AI Impacts’s Historical Economic Growth Trends argues that historically economic growth has been super-linear in population size. As such we should expect accelerating growth ‘by default’: “Extrapolating this model implies that at a time when the economy is growing 1% per year, growth will diverge to infinity after about 200 years”. This is very interesting to me as it contradicts what I suggested here. Notably growth has slowed since 1950, perhaps for anthropic reasons. #Forecasting
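One way to see how ‘diverge to infinity’ falls out of such models: suppose (my assumption; AI Impacts’ fitted model may use a different functional form) that output grows as dY/dt = c·Y^(1+a) with a > 0, so the growth rate itself rises with the size of the economy. Then the blow-up time depends only on the current growth rate g₀ and the exponent a:

```latex
\begin{aligned}
\frac{dY}{dt} &= c\,Y^{1+a}, \qquad c > 0,\; a > 0,\\
g &\equiv \frac{1}{Y}\frac{dY}{dt} = c\,Y^{a}, \qquad g_0 = c\,Y_0^{a},\\
\int_{Y_0}^{Y(t)} Y^{-(1+a)}\,dY = c\,t
  \;&\Longrightarrow\; Y(t)^{-a} = Y_0^{-a} - a\,c\,t,\\
Y(t) \to \infty \text{ as } t \to T, \quad T &= \frac{Y_0^{-a}}{a\,c} = \frac{1}{a\,g_0},\\
g_0 = 0.01\ \text{yr}^{-1} &\;\Longrightarrow\; T = \frac{100}{a}\ \text{years}.
\end{aligned}
```

So under this particular form, the quoted figure of about 200 years at 1% growth corresponds to a ≈ 0.5; plain exponential growth is the a = 0 limit, where T is infinite.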

AI Im­pacts’s AI Con­fer­ence At­ten­dance plots at­ten­dance at the ma­jor AI con­fer­ences over time to show the re­cent rapid growth in the field us­ing a rel­a­tively sta­ble mea­sure. #Forecasting


They spent $316,398 in 2019, and plan to spend around $325,000 in 2020. They have around $269,590 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 0.8 years of run­way.

In the past they have re­ceived sup­port from EA or­gani­sa­tions like OpenPhil and FHI.

MIRI ad­ministers their fi­nances on their be­half; dona­tions can be made here.

GPI: The Global Pri­ori­ties Institute

GPI is an Oxford-based Aca­demic Pri­ori­ties Re­search or­gani­sa­tion founded in 2018 by Hilary Greaves and part of Oxford Univer­sity. They do work on philo­soph­i­cal is­sues likely to be very im­por­tant for global pri­ori­ti­sa­tion, much of which is, in my opinion, rele­vant to AI Align­ment work. Their re­search can be found here.


MacAskill (article) & Demski (extensive comments)’s A Critique of Functional Decision Theory gives some criticisms of FDT. MacAskill makes a variety of arguments, though I generally found them unconvincing. For example, the ‘Bomb’ example seemed to be basically question-begging on Newcomb’s problem, and his Scots vs English example (where Scottish people choose to one-box because of their ancestral memory of the Darien scheme) seems to me to be a case of people not actually employing FDT at all. And some of his arguments, like that it is too complicated for humans to actually calculate, seem like the same arguments he would reject as criticisms of utilitarianism, and are not relevant to someone working on AGI. I listed this as co-written by Abram Demski because he is acknowledged in the post, and his comments at the bottom are as detailed and worthy as the main post itself; I recommend reading the two together. Researchers from MIRI were also named authors on the paper. #Decision_Theory

Greaves & Cotton-Barratt’s A bargaining-theoretic approach to moral uncertainty lays out a formalism and discusses using the Nash bargaining solution between ‘negotiating’ moral theories as an alternative approach to moral uncertainty. It discusses some subtle points about the selection of the disagreement point (the BATNA outcome). One interesting section was on small vs grand worlds: whether splitting the world up into sub-dilemmas made a difference. For expected-value type approaches the answer is no, but for negotiating strategies the answer is yes, because the different moral theories might trade so as to influence the dilemmas that mattered most to them. This reminded me of an argument from Wei Dai that agents who cared about total value, finding themselves in a small world, might acausally trade with average value agents in large worlds. Presumably a practical implication might be that EAs should adhere to conventional moral standards with even higher than usual fidelity, in exchange for shutting up and multiplying on EA issues. The paper also makes interesting points about the fanaticism objection and the difference between moral and empirical risk. Researchers from FHI were also named authors on the paper. #Decision_Theory
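A minimal sketch of what a bargaining approach might compute, assuming a credence-weighted (asymmetric) Nash bargaining solution over a finite menu of options; the theories, utilities and disagreement point below are hypothetical, and the paper’s formalism is more general. Each theory’s gain over the disagreement point is raised to the power of our credence in it, and we pick the option maximising the product (equivalently, the credence-weighted sum of log gains).

```python
import math


def nash_bargaining_choice(options, utilities, credences, disagreement):
    """Return the option maximising prod_t (u_t(option) - d_t) ** credence_t,
    considering only options that strictly improve on the disagreement point
    for every theory. Returns None if no such option exists."""
    best, best_score = None, -math.inf
    for option in options:
        gains = {t: utilities[t][option] - disagreement[t] for t in credences}
        if any(g <= 0 for g in gains.values()):
            continue  # some theory would rather walk away
        score = sum(credences[t] * math.log(gains[t]) for t in credences)
        if score > best_score:
            best, best_score = option, score
    return best


# Hypothetical example: two moral theories, four options.
CREDENCES = {"utilitarian": 0.6, "deontological": 0.4}
UTILITIES = {
    "utilitarian": {"A": 10, "B": 6, "C": 0, "D": 7},
    "deontological": {"A": 0, "B": 5, "C": 8, "D": 3},
}
DISAGREEMENT = {"utilitarian": 0, "deontological": 0}

if __name__ == "__main__":
    print(nash_bargaining_choice("ABCD", UTILITIES, CREDENCES, DISAGREEMENT))
```

Here a credence-weighted expected-value rule would pick A (6.0, against 5.6 for B and 5.4 for D), whereas bargaining picks the compromise B. Rescaling either theory’s utilities together with its disagreement point leaves the choice unchanged, so this approach sidesteps intertheoretic value comparisons, although, as the paper discusses, the choice of disagreement point matters.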

MacAskill et al.’s The Evidentialist’s Wager argues that decision-theoretic uncertainty in a large universe favours EDT over CDT. This is because your decision only has local causal implications, but global evidential implications. The article then goes into detail motivating the idea and discussing various complications and objections. It seems to push EDT in an FDT direction, though presumably they still diverge on smoking lesion questions. Researchers from FHI and FRI were also named authors on the paper. #Decision_Theory
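A crude numerical version of the wager, under two assumptions the paper itself treats as contentious: that value can be compared across CDT and EDT by simple addition, and that EDT counts the choices of every sufficiently correlated agent in a large universe while CDT counts only the local causal effect. The payoffs are hypothetical, normalised Newcomb-style numbers, and the function names are mine.

```python
def credence_weighted_value(action, credence_edt, n_correlated,
                            cdt_value, edt_value_per_agent):
    """Expected choiceworthiness of `action` under uncertainty between CDT and EDT.

    CDT: only this agent's causal effect counts, so its stakes stay local.
    EDT: our choice is evidence about the choices of n_correlated similar
    agents, so (on this simplification) its stakes scale with n_correlated.
    """
    cdt = cdt_value[action]
    edt = n_correlated * edt_value_per_agent[action]
    return (1 - credence_edt) * cdt + credence_edt * edt


def best_action(actions, credence_edt, n_correlated, cdt_value, edt_value_per_agent):
    return max(actions, key=lambda a: credence_weighted_value(
        a, credence_edt, n_correlated, cdt_value, edt_value_per_agent))


# Hypothetical normalised payoffs: each theory's favourite action scores 1.
CDT_VALUE = {"one_box": 0.0, "two_box": 1.0}
EDT_VALUE_PER_AGENT = {"one_box": 1.0, "two_box": 0.0}
ACTIONS = ("one_box", "two_box")

if __name__ == "__main__":
    for n in (1, 1000):
        print(n, best_action(ACTIONS, 0.1, n, CDT_VALUE, EDT_VALUE_PER_AGENT))
```

With only 10% credence in EDT, two-boxing wins when there is a single correlated agent (0.9 vs 0.1), but with 1,000 correlated agents one-boxing wins by a wide margin (about 100 vs 0.9). That is the shape of the argument; the paper’s discussion of how to normalise across theories is where the real work lies.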

Mogensen’s ‘The only ethical argument for positive 𝛿’? argues that positive pure time preference could be justified through agent-relative obligations. This was an interesting paper to me, and suggests some interesting (extremely speculative) questions, e.g. can we, by increasing our relatedness to our ancestors, acausally influence them into treating us better? #Philosophy

Mo­gensen’s Dooms­day rings twice at­tempts to sal­vage the Dooms­day ar­gu­ment by sug­gest­ing we should up­date us­ing SSA twice. He ar­gues the sec­ond such up­date—on the fact that the pre­sent-day seems un­usu­ally in­fluen­tial—can­not be ‘can­cel­led out’ by SIA. #Philosophy
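For readers who haven’t seen it, the first (standard) Doomsday update and its SIA cancellation can be written out in a few lines. The population figures are hypothetical round numbers, and this sketch does not model Mogensen’s second update on the present seeming unusually influential.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of hypotheses (likelihoods may be unnormalised weights)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}


# Hypothetical totals of humans ever born under each hypothesis; our birth
# rank (roughly 1e11) is possible under both.
TOTAL_HUMANS = {"doom_soon": 2e11, "doom_late": 2e14}
PRIOR = {"doom_soon": 0.5, "doom_late": 0.5}

# SSA: treat yourself as a random draw from all humans who ever live,
# so the chance of having any particular birth rank is 1/N.
SSA_LIKELIHOOD = {h: 1 / n for h, n in TOTAL_HUMANS.items()}
after_ssa = posterior(PRIOR, SSA_LIKELIHOOD)

# SIA: first weight each hypothesis by how many observers it contains,
# then apply the same birth-rank update; the factors of N cancel.
sia_prior = posterior(PRIOR, TOTAL_HUMANS)
after_sia_then_ssa = posterior(sia_prior, SSA_LIKELIHOOD)

if __name__ == "__main__":
    print(after_ssa["doom_soon"], after_sia_then_ssa["doom_soon"])
```

The SSA update moves ‘doom soon’ from 0.5 to 1000/1001, while applying SIA first returns it to 0.5: this is the sense in which SIA ‘cancels out’ the Doomsday argument, and Mogensen’s claim is that his second update escapes this cancellation.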


They spent £600,000 in 2018/​2019 (aca­demic year), and plan to spend around £1,400,000 in 2019/​2020. They sug­gested that as part of Oxford Univer­sity ‘cash on hand’ or ‘run­way’ were not re­ally mean­ingful con­cepts for them, as they need to fully-fund all em­ploy­ees for mul­ti­ple years.

If you want to donate to GPI, you can do so here.

FRI: The Foun­da­tional Re­search In­sti­tute

FRI is a London (previously Germany) based Existential Risk Research organisation founded in 2013 and currently led by Stefan Torges and Jonas Vollmer. They are part of the Effective Altruism Foundation (EAF) and do research on a number of fundamental long-term issues, some related to how to reduce the risks of very bad AGI outcomes.

In gen­eral they adopt what they re­fer to as ‘suffer­ing-fo­cused’ ethics, which I think is a quite mis­guided view. How­ever, they seem to have ap­proached this thought­fully.

Ap­par­ently this year they are more fo­cused on re­search, vs move­ment-build­ing and dona­tion-rais­ing in pre­vi­ous years.


FRI researchers were not lead authors on any work directly relevant to AI Alignment (unlike last year, when they had four papers).

FRI re­searchers con­tributed to the fol­low­ing re­search led by other or­gani­sa­tions:


EAF (of which they are a part) spent $836,622 in 2018 and $1,125,000 in 2019, and plan to spend around $995,000 in 2020. They have around $1,430,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 1.4 years of run­way.

According to their website, their finances are not separated from those of the EAF, and it is not possible to earmark donations. In the past this has made me worry about fungibility, with donations ending up funding other EAF work. However, apparently EAF basically doesn’t do anything other than FRI now.

If you wanted to donate to FRI, you could do so here.

Me­dian Group

Me­dian is a Berkeley based in­de­pen­dent AI Strat­egy or­gani­sa­tion founded in 2018 by Jes­sica Tay­lor, Bryce Hidy­smith, Jack Gal­lagher, Ben Hoff­man, Col­leen McKen­zie, and Baeo Malt­in­sky. They do re­search on var­i­ous risks, in­clud­ing AI timelines. Their re­search can be found here.


Maltinsky et al.’s Feasibility of Training an AGI using Deep RL: A Very Rough Estimate builds a model of how plausible one method of achieving AGI is. The theory is that you could basically simulate a bunch of people and have them work on the problem. Their model suggests this is not a credible way of producing AGI in the near term. I like the way they included their code in the actual report. #Forecasting

Taylor et al.’s Revisiting the Insights model improves their Insights model from last year. If you recall, this basically used a Pareto distribution for how many genius insights would be required to get us to AGI. #Forecasting
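A toy version of the idea, under my own assumptions (insights arrive at a fixed rate; the parameter values are made up), and not a reproduction of Median’s actual model: if the number of insights AGI needs is Pareto-distributed, and AGI hasn’t happened yet, the chance of AGI within some horizon is the chance that the required number falls between the insights found so far and those found by then.

```python
def pareto_cdf(n, x_min, alpha):
    """P(N <= n) when N ~ Pareto(x_min, alpha)."""
    if n < x_min:
        return 0.0
    return 1.0 - (x_min / n) ** alpha


def p_agi_within(years, insights_so_far, insights_per_year, x_min, alpha):
    """P(enough insights are found within `years`), conditioning on AGI not
    having arrived yet, i.e. on N > insights_so_far."""
    f_now = pareto_cdf(insights_so_far, x_min, alpha)
    f_then = pareto_cdf(insights_so_far + insights_per_year * years, x_min, alpha)
    return (f_then - f_now) / (1.0 - f_now)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to make the arithmetic easy.
    for years in (10, 50, 100):
        print(years, round(p_agi_within(years, 100, 1, 1, 1), 3))
```

With these made-up parameters the probability of AGI within 100 years comes out at 0.5; the heavy tail of the Pareto distribution is what keeps substantial probability on very long timelines however many insights have already been found.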

The fol­low­ing was writ­ten by Jes­sica but not as an offi­cial Me­dian piece:

Taylor’s The AI Timelines Scam argues that there are systematic biases that lead people to exaggerate how short AI timelines are. One is that people who espouse short timelines tend to also argue for some amount of secrecy due to Infohazards, which makes their work hard for outsiders to audit. A second is that capital allocators tend to fund those who dream BIG, leading to systematic exaggeration of their field’s potential. I think both are reasonable points, but I think she is too quick to use the term ‘scam’; see Scott’s Against Lie Inflation. Specifically, while it is true that secrecy is a great cover for mediocrity, it is unfortunately also exactly what a morally virtuous agent would have to do in the presence of infohazards. Indeed, such people might be artificially limited in what they can say, making short time horizons appear artificially devoid of credible arguments. I am more sympathetic to her second argument, but even there, to the extent that 1) fields select for people who believe in them and 2) people believe what is useful for them to believe, I think it is a bit harsh to call it a ‘scam’. #Forecasting


They spent ~$0 in 2018 and 2019, and plan to spend above $170,000 in 2020. They have around $170,000 in cash and pledged funding, suggesting (on a very naïve calculation) under 1 year of runway.

Me­dian doesn’t seem to be so­lic­it­ing dona­tions from the gen­eral pub­lic at this time.

CSET: The Cen­ter for Se­cu­rity and Emerg­ing Technology

CSET is a Washington based Think Tank founded in 2019 by Jason Matheny (ex IARPA), affiliated with Georgetown University. They analyse new technologies for their security implications and provide advice to the US government. At the moment they are mainly focused on near-term AI issues. Their research can be found here.

As they ap­par­ently launched with $55m from the Open Philan­thropy Pro­ject, and sub­se­quently raised money from the Hewlett Foun­da­tion, I am as­sum­ing they do not need more dona­tions at this time.

Leverhulme Centre for the Future of Intelligence

Leverhulme is a Cambridge based Research organisation founded in 2015 and currently led by Stephen Cave. They are affiliated with Cambridge University and closely linked to CSER. They do work on a variety of AI related causes, mainly on near-term issues but also some long-term. You can find their publications here.


Lev­er­hulme-af­fili­ated re­searchers pro­duced work on a va­ri­ety of top­ics; I have only here sum­marised that which seemed the most rele­vant.

Hernandez-Orallo et al.’s Surveying Safety-relevant AI Characteristics provides a summary of the properties of AI systems that are relevant for safety. This includes both innate properties of the system (like the ability to self-modify or influence its reward signal) and properties of the environment. Some of these characteristics are relatively well-established in the literature, but others seemed relatively new (to me at least). A few, but not most, seemed only really relevant to near-term safety issues (like the need for spare batteries). Researchers from CSER and Leverhulme were also named authors on the paper. #Overview

Cave & Ó hÉigeartaigh’s Bridging near- and long-term concerns about AI attempts to unify short-term and long-term AI risk concerns. For example, they argue that solving short-term issues can help with long-term ones, and that long-term issues will eventually become short-term issues. However, I am inclined to agree with the review here by Habryka that a lot of the work here is being done by categorising unemployment and autonomous vehicles as long-term, and then arguing that they share many features with short-term issues. I agree that they have a lot in common; however, this seems to be because unemployment and cars are also short-term issues (or short-term non-issues, in my mind). The paper does not present a compelling argument for why short-term issues have a lot in common with existential risk work, which is what we care about. But perhaps this is being too harsh, and the paper is better understood performatively: it is not attempting to argue that the two camps are naturally allied, but rather attempting to make them allies. Researchers from CSER and Leverhulme were also named authors on the paper. #Strategy

Whittlestone et al.’s The Role and Limits of Principles in AI Ethics: Towards a Focus on Tensions points out that many of the ‘values’ that laypeople say AI systems should observe, like ‘fairness’, are frequently in conflict. This is certainly a big improvement over the typical article on the subject. #Shortterm

Lev­er­hulme re­searchers con­tributed to the fol­low­ing re­search led by other or­gani­sa­tions:

BERI: The Berkeley Ex­is­ten­tial Risk Ini­ti­a­tive

BERI is a Berkeley-based in­de­pen­dent Xrisk or­gani­sa­tion, founded and led by An­drew Critch. They provide sup­port to var­i­ous uni­ver­sity-af­fili­ated (FHI, CSER, CHAI) ex­is­ten­tial risk groups to fa­cil­i­tate ac­tivi­ties (like hiring en­g­ineers and as­sis­tants) that would be hard within the uni­ver­sity con­text, alongside other ac­tivi­ties—see their FAQ for more de­tails.


BERI used to run a grant-mak­ing pro­gram where they helped Jaan Tal­linn al­lo­cate money to Xrisk causes. Mid­way through this year, BERI de­cided to hand this off to the Sur­vival and Flour­ish­ing Fund, a donor-ad­vised fund cur­rently ad­vised by the same team who run BERI.

In this time pe­riod (De­cem­ber 2018-Novem­ber 2019) BERI granted $1,615,933, mainly to large Xrisk or­gani­sa­tions. The largest sin­gle grant was $600,000 to MIRI.


A num­ber of pa­pers we re­viewed this year were sup­ported by BERI, for ex­am­ple:

Be­cause this sup­port tended not to be men­tioned on the front page of the ar­ti­cle (un­like di­rect af­fili­a­tion) it is quite pos­si­ble that I missed other pa­pers they sup­ported also.


BERI have told me they are not seeking public support at this time. If you want to donate anyway, their donate page is here.

AI Pulse

The Pro­gram on Un­der­stand­ing Law, Science, and Ev­i­dence (PULSE) is part of the UCLA School of Law, and con­tains a group work­ing on AI policy. They were founded in 2017 with a $1.5m grant from OpenPhil.

Their web­site lists a few pieces of re­search, gen­er­ally on more near-term AI policy is­sues. A quick read sug­gested they were gen­er­ally fairly well done. How­ever, they don’t seem to have up­loaded any­thing since Fe­bru­ary.


Ster­benz & Trager’s Au­tonomous Weapons and Co­er­cive Threats dis­cusses the im­pact of Lethal Au­tonomous Weapons on diplo­macy. #Shortterm

Grotto’s Ge­net­i­cally Mod­ified Or­ganisms: A Pre­cau­tion­ary Tale for AI Gover­nance dis­cusses the his­tory of GMO reg­u­la­tion in the US and EU. He brings up some in­ter­est­ing points about the highly con­tin­gent his­tory be­hind the differ­ent ap­proaches taken. How­ever, I am some­what scep­ti­cal GMOs are that good a com­par­i­son, given their fun­da­men­tally differ­ent na­ture. #Strategy

Other Research

I would like to em­pha­size that there is a lot of re­search I didn’t have time to re­view, es­pe­cially in this sec­tion, as I fo­cused on read­ing or­gani­sa­tion-dona­tion-rele­vant pieces. So please do not con­sider it an in­sult that your work was over­looked!

Naude & Dimitri’s The race for an artificial general intelligence: implications for public policy extends the model in Racing to the Precipice (Armstrong et al.). After a lengthy introduction to AI alignment, they build a formal model, concluding that a winner-take-all contest will have very few teams competing (which is good). Interestingly, if the teams are concerned about cost minimisation this result no longer holds, as the ‘best’ team might not invest 100%, so the second-best team still has a chance; but the presence of intermediate prizes is positive, as they incentivise more investment. They suggest public procurement to steer AI development in a safe direction, and an unsafety tax. (As a very minor aside, I was a little surprised to see the AI Impacts survey cited as a source for expected Singularity timing, given that it does not mention the word.) Overall I thought this was an excellent paper. #Strategy

Steinhardt’s AI Alignment Research Overview provides a detailed account of the different components of AI Alignment work. I think this probably takes over from Amodei et al.’s Concrete Problems (on which Jacob was a co-author) as my favourite introduction to technical work, for helping new researchers locate themselves, with the one proviso that it is only in Google Docs form at the moment. He provides a useful taxonomy, goes into significant detail on the different problems, and suggests possible avenues of attack. The only area that struck me as a little light was some of the MIRI-style agent foundations issues. Overall I thought this was an excellent paper. #Overview

Piper’s The case for taking AI seriously as a threat to humanity is an introduction to AI safety for Vox readers. In my opinion it is the best non-technical introduction to the issue I have seen, and it has become my go-to link for people and reading groups. The article does a good job of introducing the issues in a persuasive and common-sense way without much loss of fidelity. My only gripe is that the article unquestioningly repeats an argument about criminal justice ‘discrimination’ which has, in my opinion, been debunked (see here and the Washington Post article linked at the bottom), but perhaps this is a necessary concession when writing for Vox, and it is only a very small part of the article. Overall I thought this was an excellent piece. #Introduction

Cohen et al.’s Asymptotically Unambitious Artificial General Intelligence ambitiously aims to provide an aligned AI algorithm. They do this by using an extremely myopic form of boxed oracle AIXI, which doesn’t care about any rewards after the box has been opened, so all it cares about is getting rewards for answering the question well inside the box. It is indifferent to what the human does with the reward once outside the box. This assumes the AIXI cannot influence the world without detectably opening the box. The approach also aims to avoid the reward-hacking problems of AIXI. You might also enjoy the comments here. #AI_Theory
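To illustrate what ‘not caring about rewards after the box has been opened’ means, here is a toy sketch. It is not the paper’s algorithm (which builds on AIXI and is not computable); the plans, reward values and `box_open_step` are all hypothetical. A myopic agent scores a plan by summing only the rewards received before the box opens, so a plan whose extra payoff arrives only afterwards looks no better to it than one without that payoff.

```python
from typing import Sequence


def myopic_return(rewards: Sequence[float], box_open_step: int) -> float:
    """Score a plan as a fully myopic agent would: only rewards received
    strictly before the (hypothetical) box-opening step count."""
    return sum(r for t, r in enumerate(rewards) if t < box_open_step)


def total_return(rewards: Sequence[float]) -> float:
    """Score a plan as an ordinary long-horizon reward maximiser would."""
    return sum(rewards)


# Hypothetical plans over four time steps; the box opens at step 2.
BOX_OPEN_STEP = 2
honest_answer = [1.0, 1.0, 0.0, 0.0]    # good answer, nothing afterwards
influence_world = [1.0, 1.0, 5.0, 5.0]  # same answer, plus a later payoff

print(myopic_return(honest_answer, BOX_OPEN_STEP))    # 2.0
print(myopic_return(influence_world, BOX_OPEN_STEP))  # 2.0
print(total_return(influence_world))                  # 12.0
```

A long-horizon maximiser prefers the second plan, while the myopic agent is indifferent between them, which is the property the boxing argument relies on.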

Snyder-Beattie et al.’s An upper bound for the background rate of human extinction uses a Laplace’s law of succession-style approach to bound non-anthropogenic Xrisk. Given how long mankind has survived so far, they conclude that the annual probability of extinction from natural causes is extremely unlikely to be greater than 1 in 14,000, and probably much lower. Notably, they argue that these estimates are not significantly biased by anthropic issues, because high base extinction rates mean lucky human observers would be clustered in worlds where civilisation also developed very quickly, and hence would also observe short histories. Obviously they can only provide an upper bound using such methods, so I see the paper as mainly providing evidence that we should instead focus on anthropogenic risks, for which no such bound can exist. Researchers from FHI were also named authors on the paper. #Forecasting
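The core intuition can be sketched with a constant annual extinction hazard: the probability of surviving T years at rate μ per year is (1 − μ)^T, so high rates make our observed track record very unlikely. This is a simplified illustration, not the paper’s full method; the 200,000-year figure for the age of Homo sapiens is my assumption for the example, as are the helper names.

```python
import math


def survival_likelihood(annual_rate: float, years: float) -> float:
    """Probability of surviving `years` consecutive years when extinction
    strikes independently with probability `annual_rate` in each year."""
    return (1.0 - annual_rate) ** years


def max_plausible_rate(years: float, threshold: float) -> float:
    """Largest constant annual rate under which a survival record of
    `years` still has likelihood of at least `threshold`.

    Solves (1 - rate) ** years == threshold; expm1 keeps precision
    because the resulting rate is tiny."""
    return -math.expm1(math.log(threshold) / years)


YEARS_SURVIVED = 200_000  # assumed age of Homo sapiens, for illustration only

print(survival_likelihood(1 / 14_000, YEARS_SURVIVED))  # roughly 6e-7
print(1 / max_plausible_rate(YEARS_SURVIVED, 0.1))      # roughly 87,000
```

Under these assumptions, a rate of 1 in 14,000 per year would make a 200,000-year track record a less-than-one-in-a-million event, which conveys the intuition behind the bound.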

Dai’s Prob­lems in AI Align­ment that philoso­phers could po­ten­tially con­tribute to pro­vides a list of open philo­soph­i­cal ques­tions that mat­ter for AI safety. This seems use­ful in­so­much as there are peo­ple ca­pa­ble of work­ing on many differ­ent philo­soph­i­cal is­sues and will­ing to be redi­rected to more use­ful ones. #Overview

Dai’s Two Neglected Problems in Human-AI Safety discusses two failure modes for otherwise benign-seeming approval-orientated AIs. I thought this was good as it identifies a potentially very ‘sneaky’ way in which human value might be lost, at the hands of agents which otherwise appear extremely corrigible. #Forecasting

Agrawal et al.’s Scaling up Psychology via Scientific Regret Minimization: A Case Study in Moral Decision-Making suggests that, in cases with large amounts of noisy data, human-interpretable models could be evaluated relative to ML predictions rather than directly against the underlying data. In particular, they do this with the large Moral Machine dataset, comparing simple human-interpretable rules (like humans are worth more than animals, or criminals are worth less) with their neural network’s predictions. This suggests a multi-step program for friendliness: 1) gather data, 2) train ML on the data, 3) evaluate simple human-evaluable rules against the ML model, and 4) have humans evaluate these rules. Researchers from CHAI were also named authors on the paper. #Ethical_Theory
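To make step 3 concrete, here is a minimal sketch of scoring interpretable rules against a model’s predictions instead of against the raw responses. Everything here is hypothetical: the four scenarios and counts are made up, the rules are illustrative, ‘regret’ is taken to be a mean squared gap, and the trained neural network is replaced by a trivially simple stand-in (a smoothed per-scenario frequency) purely to keep the example self-contained. It shows the shape of the procedure, not the paper’s actual setup.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data in the style of a Moral Machine dilemma: for each
# scenario, how many respondents chose to swerve, out of how many asked.
Scenario = Dict[str, int]
DATA: List[Tuple[Scenario, int, int]] = [
    ({"humans_saved": 1, "animals_saved": 0}, 46, 50),
    ({"humans_saved": 0, "animals_saved": 1}, 9, 50),
    ({"humans_saved": 2, "animals_saved": 0}, 48, 50),
    ({"humans_saved": 0, "animals_saved": 2}, 12, 50),
]


def model_prediction(swerved: int, asked: int) -> float:
    """Stand-in for the trained ML model's predicted probability of swerving.
    Here it is just a Laplace-smoothed frequency (hypothetical simplification)."""
    return (swerved + 1) / (asked + 2)


def humans_over_animals(s: Scenario) -> float:
    """Interpretable rule: people swerve when doing so saves humans."""
    return 0.9 if s["humans_saved"] > 0 else 0.1


def always_swerve(s: Scenario) -> float:
    """Interpretable rule: people swerve regardless of who is saved."""
    return 0.9


def regret(rule: Callable[[Scenario], float],
           data: List[Tuple[Scenario, int, int]]) -> float:
    """Mean squared gap between a rule and the model's predictions,
    rather than between the rule and the raw (noisy) responses."""
    gaps = [(rule(s) - model_prediction(k, n)) ** 2 for s, k, n in data]
    return sum(gaps) / len(gaps)


for rule in (humans_over_animals, always_swerve):
    print(rule.__name__, round(regret(rule, DATA), 4))
```

With a real model, the point is that its predictions average away noise across many similar scenarios, so a rule’s remaining gap reflects a genuine mismatch rather than sampling noise.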

Krueger et al.’s Misleading Meta-Objectives and Hidden Incentives for Distributional Shift discusses the danger of RL agents being incentivised to induce distributional shift. This is in contrast to what I think of as the ‘standard’ worry about distributional shift, namely that it arises as a side effect of increasing agent optimisation power. They then introduce a model to demonstrate this behaviour, but I had a little trouble understanding exactly how this part was meant to work. Researchers from Deepmind were also named authors on the paper. #ML_safety

Zhang & Dafoe’s Artificial Intelligence: American Attitudes and Trends surveys the views of ordinary people about AI. They used YouGov, whom I regard as one of the best polling agencies. The survey did a good job of showing that the general public is very ignorant on the subject and susceptible to framing effects: respondents basically thought that every potential AI ‘problem’ was roughly equally important. When reading this I think it is worth keeping the general literature on voter irrationality in mind, e.g. Bryan Caplan’s The Myth of the Rational Voter or Scott’s Noisy Poll Results and Reptilian Muslim Climatologists from Mars. Researchers from FHI were also named authors on the paper. #Politics

Cot­tier & Shah’s Clar­ify­ing some key hy­pothe­ses in AI al­ign­ment is a map of the con­nec­tions be­tween differ­ent ideas in AI safety. Re­searchers from CHAI were also named au­thors on the pa­per. #Overview

Ovadya & Whittlestone’s Reducing Malicious Use of Synthetic Media Research: Considerations and Potential Release Practices for Machine Learning discusses various ways of improving the safety of ML research release. While synth media is the titular subject, most of it is more general, with fairly detailed descriptions of various strategies. While I don’t think synth media is very important, it could be useful for building norms in ML that would also apply to AGI work. The paper discusses bioethics at length, e.g. how that field uses IRBs. My personal impression of IRBs is that they are largely pointless and have little to do with ethics, functioning mainly to slow things down and tick boxes, but then again that might be desirable for AI research! Researchers from CSER and Leverhulme were also named authors on the paper. #Security

Schwarz’s On Functional Decision Theory is a blog post by one of the philosophers who reviewed Eliezer and Nate’s paper on FDT. It explains his objections, and why the paper was rejected from the philosophy journal he was a reviewer for. The key thing I took away from it was that MIRI did not do a good job of locating their work within the broader literature; for example, he argues that FDT might actually be a special case of CDT as construed by some philosophers, which E&N should have addressed, and elsewhere he suggests E&N’s criticisms of CDT and EDT present strawmen. He also made some interesting points, for example that it seems ‘FDT will sometimes recommend choosing a particular act because of the advantages of choosing a different act in a different kind of decision problem’. However, most of the substantive criticisms were not very persuasive to me. Some seemed almost to beg the question, and at other times he essentially faulted FDT for directly addressing issues which any decision theory will ultimately have to address, like logical counterfactuals, or what counts as a ‘fair’ scenario. He also presented a scenario, ‘Procreation’, as an intended reductio of FDT that actually seems to me like a scenario where FDT works better than CDT does. #Decision_Theory

LeCun et al.’s De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More was a pub­lic de­bate on Face­book be­tween ma­jor figures in AI on the AI safety is­sue. Many of these have been promi­nently dis­mis­sive in the past, so this was good to see. Un­for­tu­nately a lot of the de­bate was not at a very high level. It seemed that the scep­tics gen­er­ally agreed it was im­por­tant to work on AI safety, just that this work was likely to hap­pen by de­fault. #Misc

Dai’s Prob­lems in AI Align­ment that philoso­phers could po­ten­tially con­tribute to pro­vides a list of is­sues for philoso­phers who want to work on the cause with­out math back­grounds. I think this is po­ten­tially very use­ful if brought to the no­tice of the rele­vant peo­ple, as the top­ics on the list seem use­ful things to work on, and I can eas­ily imag­ine peo­ple not be­ing aware of all of them. #Overview

Walsh’s End Times: A Brief Guide to the End of the World is a pop­u­lar sci­ence book on ex­is­ten­tial risk. AI risk is one of the seven is­sues ad­dressed, in an ex­tended and well-re­searched chap­ter. While I might quib­ble with one or two points, over­all I thought this was a good in­tro­duc­tion. The main qual­ifier for your opinion here is how valuable you think out­reach to the ed­u­cated lay­man is. #Introduction

Szlam et al.’s Why Build an Assistant in Minecraft? suggests a research program for building an intelligent assistant for Minecraft. The program doesn’t appear to be directly motivated by AI alignment, but it does seem unusual in the degree to which alignment-type issues would have to be solved for it to succeed, thereby hopefully incentivising mainstream ML guys to work on them. In particular, they want the agent to be able to work out ‘what you wanted’ from a natural language text channel, which is clearly linked to the Value Alignment problem, and similar issues, like the higher optimisation power of the agent, are likely to occur. The idea that the agent should be ‘fun’ is also potentially relevant! The authors also released an environment to make building these assistants easier. #Misc

Kumar et al.’s Failure Modes in Machine Learning is a Microsoft document discussing a variety of ways ML systems can go wrong. It covers both intentional failures (e.g. hacking) and unintentional ones (e.g. the sort of thing we worry about). #Misc

Sevilla & Moreno’s Im­pli­ca­tions of Quan­tum Com­put­ing for Ar­tifi­cial In­tel­li­gence Align­ment Re­search ex­am­ines whether Quan­tum Com­put­ing would be use­ful for AI Align­ment. They con­sider three rele­vant prop­er­ties of QC and sev­eral ap­proaches to AI Align­ment, and con­clude that QC is not es­pe­cially rele­vant. #Forecasting

Col­lins’s Prin­ci­ples for the Ap­pli­ca­tion of Hu­man In­tel­li­gence analy­ses the prob­lems of bi­ased and non-trans­par­ent de­ci­sion mak­ing by nat­u­ral in­tel­li­gence sys­tems. #Shortterm

Cap­i­tal Allocators

One of my goals with this doc­u­ment is to help donors make an in­formed choice be­tween the differ­ent or­gani­sa­tions. How­ever, it is quite pos­si­ble that you re­gard this as too difficult, and wish in­stead to donate to some­one else who will al­lo­cate on your be­half. This is of course much eas­ier; now in­stead of hav­ing to solve the Or­gani­sa­tion Eval­u­a­tion Prob­lem, all you need to do is solve the dra­mat­i­cally sim­pler Or­gani­sa­tion Eval­u­a­tor Or­gani­sa­tion Eval­u­a­tion Prob­lem.

A helpful map from Issa Rice shows that at the moment the community has only managed to achieve delegative funding chains 6 links long. If you donate to Patrick Brinich-Langlois, we can make this chain significantly longer! Of course, in reality this is quite a misleading way of phrasing the issue, as for most of these organisations the ‘flow-through’ is a relatively small fraction. I do think it is valid to be concerned about sub-optimally high levels of intermediation, however, which if nothing else reduces donor control. This seems to me to be a weak argument against delegating donations.
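A rough way to see why chain length matters less than it sounds: if each intermediary passes on only part of what it receives, the share of an original donation that reaches the end of a chain is the product of the per-link flow-through fractions, so long chains with small fractions move very little money. A minimal sketch, with made-up fractions rather than figures from the map; the function name is hypothetical.

```python
from math import prod
from typing import Sequence


def end_of_chain_share(flow_through: Sequence[float]) -> float:
    """Share of an initial donation that reaches the final recipient when
    each intermediary passes on only flow_through[i] of what it receives."""
    return prod(flow_through)


# Hypothetical six-link chain in which each link forwards 20% of its funds.
print(end_of_chain_share([0.2] * 6))  # about 6.4e-05
```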

LTFF: Long-term fu­ture fund

LTFF is a globally based EA grantmaking organisation founded in 2017, currently led by Matt Wage and affiliated with CEA. They are one of four funds set up by CEA to allow individual donors to benefit from specialised capital allocators; this one focuses on long-term future issues, with a large focus on AI Alignment. Their website is here. There are write-ups for their first two grant rounds in 2019 here and here, and comments here and here. Apparently they have done another $400,000 round since then, but the details are not yet public.

In the past I have been scep­ti­cal of the fund, as it was run by some­one who already had ac­cess to far more cap­i­tal (OpenPhil), and the grants were both in­fre­quent and rel­a­tively con­ser­va­tive – giv­ing to large or­gani­sa­tions that in­di­vi­d­ual donors are perfectly ca­pa­ble of eval­u­at­ing them­selves. Over the last year, how­ever, things have sig­nifi­cantly changed. The fund is now run by four peo­ple, and the grants have been to a much wider va­ri­ety of causes, many of which would sim­ply not be ac­cessible to in­di­vi­d­ual donors.

The fund man­agers are:

  • Matt Wage

  • He­len Toner

  • Oliver Habryka

  • Alex Zhu

Oliver Habryka es­pe­cially has been ad­mirably open with lengthy write-ups about his thoughts on the differ­ent grants, and I ad­mire his com­mit­ment to in­tel­lec­tual in­tegrity (you might en­joy his com­ments here). I am less fa­mil­iar with the other fund man­agers. All the man­agers are, to my knowl­edge, un­paid.

In gen­eral most of the grants seem at least plau­si­bly valuable to me, and many seemed quite good in­deed. As there is ex­ten­sive dis­cus­sion in the links above I shan’t dis­cuss my opinions of in­di­vi­d­ual grants in de­tail.

I attempted to classify the recommended grants (including those not approved by CEA) by type and geography. Note that ‘training’ means paying an individual to self-study. I have deliberately omitted the exact percentages because this is an informal classification.

Of these cat­e­gories, I am most ex­cited by the In­di­vi­d­ual Re­search, Event and Plat­form pro­jects. I am gen­er­ally some­what scep­ti­cal of pay­ing peo­ple to ‘level up’ their skills.

I can understand why the fund managers gave over a quarter of the funds to major organisations: they thought these organisations were a good use of capital! However, to my mind this undermines the purpose of the fund. (Many) individual donors are perfectly capable of evaluating large organisations that publicly advertise for donations. In donating to the LTFF, I think (many) donors are hoping to fund smaller projects that they could not directly access themselves. As it is, such donors will probably have to consider these organisation allocations a mild ‘tax’, to the extent that different large organisations are chosen than they would have picked themselves.

For a similar anal­y­sis, see Gaens­bauer’s com­ment here. I think his ‘coun­ter­fac­tu­ally unique’ (73%) roughly maps onto my ‘non-or­gani­sa­tion’.

CFAR, for which the fund managers recommended $300,000, was the largest single intended beneficiary, with just over 20% of the recommendations.

All grants have to be approved by CEA before they are made; historically they have approved almost all. In general I think the few rejections improved the process. In every instance the rejected grants were subsequently funded by private donors anyway, but this does not seem to be a problem for donors to the LTFF, whose capital is protected. Notably, this means the fund only paid out $150,000 to CFAR (10%), as the balance was made up by a private donor after CEA did not approve the second grant.

I was not impressed that one grant which received harsh and accurate criticism on the forum after the first round was resubmitted for the second round. Ex post this didn’t matter, as CEA rejected it on substantive grounds the second time, but it makes me somewhat concerned about a risk of some of the capital going towards giving sinecures to people who are in the community, rather than being allocated on objective merit. But if CEA will consistently block this waste, maybe this is not such a big issue, and the grant in question only represented 1.3% of the total for the year.

If you wish to donate to the LTFF you can do so here.

OpenPhil: The Open Philan­thropy Project

The Open Philanthropy Project (separated from GiveWell in 2017) is an organisation dedicated to advising Cari and Dustin Moskovitz on how to give away over $15bn to a variety of causes, including existential risk. They have made extensive donations in this area and probably represent both the largest pool of EA-aligned capital and the largest team of EA capital allocators.

They also recently announced they would be working with Ben Delo.

This year they im­ple­mented a spe­cial com­mit­tee for de­ter­min­ing grants to EA-re­lated or­gani­sa­tions.


You can see their grants for AI Risk here. It lists only four AI Risk grants made in 2019, though I think that their $500k grant to ESPR (the European Summer Program on Rationality) should also be considered an AI Risk relevant grant:

In con­trast there are 11 AI Risk grants listed for 2018, though the to­tal dol­lar value is lower.

The OpenPhil AI Fellowship basically fully funds AI PhDs for students who want to work on the long-term impacts of AI. One thing that I had misunderstood previously was that these fellowships are not intended to be specific to AI safety, though presumably their recipients are more likely to work on safety than the average ML PhD student. They funded 7 fellowships in 2018 and 8 in 2019.

Due to a con­flict of in­ter­est I can­not make any eval­u­a­tion of their effec­tive­ness.


Most of their re­search con­cerns their own grant­ing, and in an un­usual failure of nom­i­na­tive de­ter­minism is non-pub­lic ex­cept for the short write-ups linked above.

Za­bel & Muehlhauser’s In­for­ma­tion se­cu­rity ca­reers for GCR re­duc­tion ar­gue that work­ing in In­foSec could be a use­ful ca­reer for re­duc­ing Xrisks, es­pe­cially AI and Bio. This is partly to help pre­vent AGI/​synth bio knowl­edge fal­ling into the hands of mal­i­cious hack­ers (though most ML re­search seems to be very open), and partly be­cause the field teaches var­i­ous skills that are use­ful for AI safety, both high-level like Eliezer’s Se­cu­rity Mind­set and tech­ni­cal like crypto. They sug­gested that there was a short­age of such peo­ple will­ing to work on Xrisk right now, and per­haps in the fu­ture, due to lu­cra­tive al­ter­na­tive em­ploy­ment op­tions. Re­searchers from Google Brain were also named au­thors on the pa­per. #Careers


To my knowl­edge they are not cur­rently so­lic­it­ing dona­tions from the gen­eral pub­lic, as they have a lot of money from Dustin and Cari, so in­cre­men­tal fund­ing is less of a pri­or­ity than for other or­gani­sa­tions. They could be a good place to work how­ever!

SFF: The Sur­vival and Flour­ish­ing Fund

SFF is a donor ad­vised fund, ad­vised by the peo­ple who make up BERI’s Board of Direc­tors. SFF was ini­tially funded in 2019 by a grant of ap­prox­i­mately $2 mil­lion from BERI, which in turn was funded by dona­tions from philan­thropist Jaan Tal­linn.


In its grant­mak­ing SFF used an in­no­va­tive al­lo­ca­tion pro­cess to com­bine the views of many grant eval­u­a­tors (de­scribed here). SFF has run two grant rounds thus far. The first ($880k in to­tal) fo­cused on large or­gani­sa­tions:

  • 80,000 Hours: $280k

  • CFAR: $110k

  • CSER: $40k

  • FLI: $130k

  • GCRI: $60k

  • LessWrong: $260k

The sec­ond round, re­quiring writ­ten ap­pli­ca­tions, dis­tributed money to a much wider va­ri­ety of pro­jects. The web­site lists 28 re­cip­i­ents, of which many but not all were AI rele­vant. The largest grant was for $300k to the Longevity Re­search In­sti­tute.

Due to a con­flict of in­ter­est I can­not eval­u­ate the effec­tive­ness of their grant­mak­ing.

Other News

80,000 Hours’s AI/​ML safety re­search job board col­lects var­i­ous jobs that could be valuable for peo­ple in­ter­ested in AI safety. At the time of writ­ing it listed 35 po­si­tions, all of which seemed like good op­tions that it would be valuable to have sen­si­ble peo­ple fill. I sus­pect most peo­ple look­ing for AI jobs would find some on here they hadn’t heard of oth­er­wise, though of course for any given per­son many will not be ap­pro­pri­ate. They also have job boards for other EA causes. #Careers

Brown & Sandholm’s Superhuman AI for multiplayer poker presents an AI that can beat professionals at six-player no-limit Texas hold’em. My understanding was that this was seen as significantly harder than limit poker and two-player play, so this represents something of a milestone. Unlike various Deepmind victories at classic games, this doesn’t seem to have required much compute. #Misc

Chivers’s The AI Does Not Hate You: Su­per­in­tel­li­gence, Ra­tion­al­ity and the Race to Save the World is a jour­nal­is­tic ex­am­i­na­tion of the ra­tio­nal­ist com­mu­nity and the ex­is­ten­tial risk ar­gu­ment. I con­fess I haven’t ac­tu­ally read the book, and have very low ex­pec­ta­tions for jour­nal­ists in this re­gard, though Chivers is gen­er­ally very good, and by all ac­counts this is a very fair and in­for­ma­tive book. I’ve heard peo­ple recom­mend it as an ex­plainer to their par­ents. #Introduction

EU’s Ethics Guidelines for Trust­wor­thy Ar­tifi­cial In­tel­li­gence is a se­ries of ethics guidelines for AI in the EU. They re­ceived in­put from many groups, in­clud­ing CSER and Jaan Tal­linn. They are (at this time) op­tional guidelines, and pre­sum­ably will not ap­ply to UK AI com­pa­nies like Deep­mind af­ter Brexit. The guidelines seemed largely fo­cused on ba­nal state­ments about non-dis­crim­i­na­tion etc.; I could not find any men­tion of ex­is­ten­tial risk in the guidelines. In gen­eral I am not op­ti­mistic about poli­ti­cal solu­tions and this did not change my mind. #Politics

Kauf­man’s Uber Self-Driv­ing Crash con­vinc­ingly ar­gues that Uber was grossly neg­li­gent when their car hit and kil­led Elaine Herzberg last year. #Shortterm

Sch­midt et al.’s Na­tional Se­cu­rity Com­mis­sion on Ar­tifi­cial In­tel­li­gence In­terim Re­port sur­veys AI from a US defence per­spec­tive. It con­tains a few oblique refer­ences to AI risk. #Politics

Cum­mings’s On the refer­en­dum #31: Pro­ject Maven, pro­cure­ment, lol­la­palooza re­sults & nu­clear/​AGI safety dis­cusses var­i­ous im­por­tant trends, in­clud­ing a so­phis­ti­cated dis­cus­sion of AGI safety. This is mainly note­wor­thy be­cause the au­thor is the mas­ter­mind of Brexit and the re­cent Con­ser­va­tive land­slide in the UK, and per­haps the most in­fluen­tial man in the UK as a re­sult. #Strategy

Method­olog­i­cal Thoughts

In­side View vs Out­side View

This document is written mainly, but not exclusively, using publicly available information. In the tradition of active management, I hope to synthesise many individually well-known facts into a whole which provides new and useful insight to readers. Advantages of this are that 1) it is relatively unbiased, compared to inside information which invariably favours those you are close to socially, and 2) most of it is legible and verifiable to readers. The disadvantage is that there are probably many pertinent facts that I am not party to! Wei Dai has written about how much discussion now takes place in private Google Docs (apparently including this Drexler piece); in most cases I do not have access to these. If you want the inside scoop I am not your guy; all I can supply is exterior scooping.

Many capital allocators in the Bay Area seem to operate under a sort of Great Man theory of investment, whereby the most important thing is to identify a guy who is really clever and ‘gets it’. I think there is some merit in this; however, I believe in it much less than they do. Perhaps as a result of my institutional investment background, I place a lot more weight on historical results. In particular, I worry that this approach leads to over-funding skilled rhetoricians and those the investor/donor is socially connected to.

Judg­ing or­gani­sa­tions on their his­tor­i­cal out­put is nat­u­rally go­ing to favour more ma­ture or­gani­sa­tions. A new startup, whose value all lies in the fu­ture, will be dis­ad­van­taged. How­ever, I think that this is the cor­rect ap­proach for donors who are not tightly con­nected to the or­gani­sa­tions in ques­tion. The newer the or­gani­sa­tion, the more fund­ing should come from peo­ple with close knowl­edge. As or­gani­sa­tions ma­ture, and have more eas­ily ver­ifi­able sig­nals of qual­ity, their fund­ing sources can tran­si­tion to larger pools of less ex­pert money. This is how it works for star­tups turn­ing into pub­lic com­pa­nies and I think the same model ap­plies here. (I ac­tu­ally think that even those with close per­sonal knowl­edge should use his­tor­i­cal re­sults more, to help over­come their bi­ases.)

This judgement involves analysing a large number of papers relating to Xrisk that were produced during 2019. Hopefully the year-to-year volatility of output is sufficiently low that this is a reasonable metric; I have tried to indicate cases where this doesn’t apply. I also attempted to include papers from December 2018, to take into account the fact that I’m missing the last month’s worth of output from 2019, but I can’t be sure I did this successfully.

This article focuses on AI risk work. If you think other causes are important too, your priorities might differ. This particularly affects GCRI, FHI and CSER, who all do a lot of work on other issues, which I attempt to cover but only very cursorily.

We focus on papers, rather than outreach or other activities. This is partly because they are much easier to measure (while there has been a large increase in interest in AI safety over the last year, it’s hard to work out who to credit for this), and partly because I think progress has to come from persuading AI researchers, which I think happens through technical outreach and publishing good work, not popular/political work.


My impression is that policy on most subjects, especially those that are more technical than emotional, is generally made by the government and civil servants in consultation with, and being lobbied by, outside experts and interests. Without expert (e.g. top ML researchers in academia and industry) consensus, no useful policy will be enacted. Pushing directly for policy seems, if anything, likely to hinder expert consensus. Attempts to directly influence the government to regulate AI research seem very adversarial, and risk being pattern-matched to ignorant technophobic opposition to GM foods or other kinds of progress. We don’t want the ‘us-vs-them’ situation that has occurred with climate change to happen here. AI researchers who are dismissive of safety law, regarding it as an imposition and encumbrance to be endured or evaded, will probably be harder to convince of the need to voluntarily be extra-safe, especially as the regulations may actually be totally ineffective.

The only case I can think of where sci­en­tists are rel­a­tively happy about puni­tive safety reg­u­la­tions, nu­clear power, is one where many of those ini­tially con­cerned were sci­en­tists them­selves. Given this, I ac­tu­ally think policy out­reach to the gen­eral pop­u­la­tion is prob­a­bly nega­tive in ex­pec­ta­tion.

If you’re in­ter­ested in this, I’d recom­mend you read this blog post from last year.


I think there is a strong case to be made that openness in AGI capacity development is bad. As such I do not ascribe any positive value to programs to ‘democratize AI’ or similar.

One interesting question is how to evaluate non-public research. For a lot of safety research, openness is clearly the best strategy. But what about safety research that has, or potentially has, capabilities implications, or other infohazards? In this case it seems best if the researchers do not publish it. However, this leaves funders in a tough position – how can we judge researchers if we cannot read their work? Maybe instead of doing top secret valuable research they are just slacking off. If we donate to people who say “trust me, it’s very important and has to be secret” we risk being taken advantage of by charlatans; but if we refuse to fund, we incentivize people to reveal possible infohazards for the sake of money. (Is it even a good idea to publicise that someone else is doing secret research?)

With regard to published research, in general I think it is better for it to be open access, rather than behind journal paywalls, to maximise impact. Reducing this impact by a significant amount in order for the researcher to gain a small amount of prestige does not seem like an efficient way of compensating researchers to me. Thankfully this does not occur much with CS papers as they are all on arXiv, but it is an issue for some strategy papers.

Similarly, it seems a bit of a waste to have to charge for books – ebooks have, after all, no marginal cost – if this might prevent someone from reading useful content. Authors of books can also trade off public benefit against private gain in the same way – by charging more for their book, they potentially earn more, but at the cost of lower reach. As a result, I am inclined to give less credit for market-rate books, as the author is already compensated and incentivised by sales revenue.

More prosaically, organisations should make sure to upload the research they have published to their website! Having gone to all the trouble of doing useful research, it is a constant shock to me how many organisations don’t take this simple step to significantly increase the reach of their work. Additionally, several times I have come across incorrect information on organisations’ websites.

Research Flywheel

My basic model for AI safety success is this:

  1. Identify interesting problems

    1. As a byproduct this draws new people into the field through altruism, nerd-sniping and apparent tractability

  2. Solve interesting problems

    1. As a byproduct this draws new people into the field through credibility and prestige

  3. Repeat

One advantage of this model is that it produces both object-level work and field growth.

There is also some value in arguing for the importance of the field (e.g. Bostrom’s Superintelligence) or addressing criticisms of the field.

Noticeably absent are strategic pieces. I find that a lot of these pieces do not add terribly much incremental value. Additionally, my suspicion is that strategy research is, to a certain extent, produced exogenously by people who are interested / technically involved in the field. This does not apply to technical strategy pieces, about e.g. whether CIRL or Amplification is a more promising approach.

There is somewhat of a paradox with technical vs ‘wordy’ pieces, however: as a non-expert, it is much easier for me to understand and evaluate the latter, even though I think the former are much more valuable.

Differential AI progress

There are many problems that need to be solved before we have safe general AI, one of which is not producing unsafe general AI in the meantime. If nobody were doing non-safety-conscious research there would be little risk or haste to AGI – though we would be missing out on the potential benefits of safe AI.

There are several consequences of this:

  • To the extent that safety research also enhances capabilities, it is less valuable.

  • To the extent that capabilities research re-orientates subsequent research by third parties into more safety-tractable areas, it is more valuable.

  • To the extent that safety results would naturally be produced as a by-product of capabilities research (e.g. autonomous vehicles), it is less attractive to finance.

One approach is to research things that will make contemporary ML systems safer, because you think AGI will be a natural outgrowth from contemporary ML. This has the advantage of faster feedback loops, but is also more replaceable (as per the last point above).

Another approach is to try to reason directly about the sorts of issues that will arise with superintelligent AI. This work is less likely to be produced exogenously by unaligned researchers, but it requires much more faith in theoretical arguments, unmoored from empirical verification.

Near-term AI safety issues

Many people want to connect AI existential risk issues to ‘near-term’ issues; I am generally sceptical of this. For example, autonomous cars seem to risk only localised tragedies, and private companies should have good incentives here. Unemployment concerns seem exaggerated to me, as they have been for most of history (new jobs will be created), at least until we have AGI, at which point we have bigger concerns. Similarly, I generally think concerns about algorithmic bias are essentially political (I recommend this presentation), though there is at least some connection to the value learning problem there.

Financial Reserves

Charities like having financial reserves to provide runway, and to guarantee that they will be able to keep the lights on for the immediate future. This could be justified if you thought that charities were expensive to create and destroy, and were worried about this occurring by accident due to the whims of donors. It seems reasonable that charities, unlike companies that sell a product, should be more concerned about this.

Donors prefer charities not to have excessive reserves. Firstly, those reserves are cash that could be spent on outcomes now, by either the specific charity or others. Valuable future activities by charities are supported by future donations; they do not need to be pre-funded. Additionally, having reserves increases the risk of organisations ‘going rogue’, because they are insulated from the need to convince donors of their value.

As such, in general I do not give full credence to charities saying they need more funding because they want much more than 18 months or so of runway in the bank. If you have a year’s reserves now, after this December you will have that plus whatever you raise now, giving you a margin of safety before raising again next year.

I estimated reserves = (cash and grants) / (2020 budget). In general I think of this as something of a measure of urgency. However, despite being prima facie a very simple calculation, there are many issues with this data. As such these figures should be considered suggestive only.
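To make the reserves figure concrete, here is a minimal Python sketch of the ratio above, expressed as years of runway and compared against the roughly 18-month threshold mentioned earlier. The function names and all figures are hypothetical illustrations, not any organisation’s actual accounts, and the sketch ignores the data issues just mentioned (for example, how to count multi-year or restricted grants).

```python
def reserves_in_years(cash: float, grants: float, budget: float) -> float:
    """Hypothetical helper: (cash + grants) / next year's budget, in years of runway."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (cash + grants) / budget


def exceeds_runway_target(cash: float, grants: float, budget: float,
                          target_years: float = 1.5) -> bool:
    """True if reserves already cover more than target_years (default: 18 months)."""
    return reserves_in_years(cash, grants, budget) > target_years


if __name__ == "__main__":
    # Made-up figures for illustration only.
    ratio = reserves_in_years(cash=1_500_000, grants=500_000, budget=1_600_000)
    print(f"{ratio:.2f} years ({ratio * 12:.0f} months) of runway")
    print(exceeds_runway_target(1_500_000, 500_000, 1_600_000))
```

With these made-up numbers the script prints 1.25 years (15 months) of runway and then False, i.e. below the 18-month threshold. Whether pledged or restricted funds belong in “grants” is a judgement call the sketch does not resolve.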

Donation Matching

In general I believe that charity-specific donation matching schemes are somewhat dishonest, despite my having provided matching funding for at least one in the past.

Ironically, despite GiveWell having espoused this view (albeit in 2011), OpenPhil’s policy of, at least in some cases, artificially limiting their funding to 50% or 60% of a charity’s need is essentially a matching scheme: some charities have argued that it effectively provides a 1:1 match for outside donors. I think this is bad. In the best case this forces outside donors to step in, imposing marketing costs on the charity and research costs on the donors. In the worst case it leaves valuable projects unfunded.
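The 1:1 figure follows from simple arithmetic. If a funder tops up to exactly a share s of a charity’s total funding, its contribution F and outside donations O satisfy F = s(F + O), so F = s/(1 - s) × O: each outside dollar “unlocks” s/(1 - s) dollars from the funder. A minimal Python sketch, assuming the cap binds and the charity’s total need has not yet been met (the function name is hypothetical):

```python
from fractions import Fraction


def implied_match_ratio(cap_share: Fraction) -> Fraction:
    """Extra dollars a capped funder adds per outside dollar (hypothetical helper).

    Assumes the funder tops up to exactly cap_share of the charity's total
    funding and that total funding is still below the charity's need, so
    funder / outside = cap_share / (1 - cap_share).
    """
    cap_share = Fraction(cap_share)
    if not 0 <= cap_share < 1:
        raise ValueError("cap_share must be in [0, 1)")
    return cap_share / (1 - cap_share)


if __name__ == "__main__":
    for share in (Fraction(1, 2), Fraction(3, 5)):
        ratio = implied_match_ratio(share)
        print(f"cap {float(share):.0%}: {float(ratio):g}:1 match on outside donations")
```

A 50% cap implies a 1:1 match, while a 60% cap implies 1.5:1. Fractions are used so the ratios come out exact rather than as floating-point approximations.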

Obviously cause-neutral donation matching is different and should be exploited. Everyone should max out their corporate matching programs if possible, and things like the annual Facebook Match continue to be great opportunities.

Poor Quality Research

Partly thanks to the efforts of the community, the field of AI safety is considerably better respected and funded than was previously the case, which has attracted a lot of new researchers. While generally good, one side effect of this (perhaps combined with the fact that many low-hanging fruits of the insight tree have been plucked) is that a considerable amount of low-quality work has been produced. For example, there are a lot of papers which can be accurately summarized as asserting “just use ML to learn ethics”. Furthermore, the conventional peer review system seems to be extremely bad at dealing with this issue.

The standard view here is just to ignore low quality work. This has many advantages, for example 1) it requires little effort and 2) it doesn’t annoy people. This conspiracy of silence seems to be the strategy adopted by most scientific fields, except in extreme cases like anti-vaxxers.

However, I think there are some downsides to this strategy. A sufficiently large body of low-quality work might degrade the reputation of the field, deterring potentially high-quality contributors. While low-quality contributions might help improve Concrete Problems’ citation count, they may use up scarce funding.

Moreover, it is not clear to me that ‘just ignore it’ really generalizes as a community strategy. Perhaps you, enlightened reader, can judge that “How to solve AI Ethics: Just use RNNs” is not great. But is it really efficient to require everyone to independently work this out? Furthermore, I suspect that the idea that we can all just ignore the weak stuff is somewhat an example of the typical mind fallacy. Several times I have come across people I respect according respect to work I found clearly pointless. And several times I have come across people I respect arguing persuasively that work I had previously respected was very bad – but I only learnt they believed this by chance! So I think it is quite possible that many people will waste a lot of time as a result of this strategy, especially if they don’t happen to move in the right social circles.

Having said all that, I am not a fan of unilateral action, and am somewhat selfishly conflict-averse, so I will largely continue to abide by this non-aggression convention. My only deviation here is to make it explicit. If you’re interested in this you might enjoy this piece by 80,000 Hours.

The Bay Area

Much of the AI and EA communities, and especially the EA community concerned with AI, is located in the Bay Area, especially Berkeley and San Francisco. This is an extremely expensive place, and is dysfunctional both politically and socially. Aside from the lack of electricity and aggressive homelessness, it seems to attract people who are extremely weird in socially undesirable ways, and to induce this in those who move there. To be fair, the people who are doing useful work in AI organisations seem to be drawn from a better distribution than the broader community. In general I think the centralization is bad, but if there must be centralization I would prefer it be almost anywhere other than Berkeley. Additionally, I think many funders are geographically myopic, and biased towards funding things in the Bay Area. As such, I have a mild preference towards funding non-Bay-Area projects. If you’re interested in this topic I recommend you read this or this or this.


The size of the field continues to grow, both in terms of funding and researchers. Both trends make evaluation increasingly hard for individual donors. I’ve attempted to subjectively weigh the productivity of the different organisations against the resources they used to generate that output, and donate accordingly.

My constant wish is to promote a lively intellect and independent decision-making among readers; hopefully my laying out the facts as I see them above will prove helpful to some readers. Here is my eventual decision, rot13’d so you can come to your own conclusions first (which I strongly recommend):

Despite having donated to MIRI consistently for many years as a result of their highly non-replaceable and groundbreaking work in the field, I cannot in good faith do so this year given their lack of disclosure. Additionally, they already have a quite large budget.

FHI have consistently produced some of the highest quality research. However, I am not convinced they have a high need for additional funding.

I continue to be impressed with CHAI’s output, and think they potentially do a good job interacting with mainstream ML researchers. They have a lot of cash reserves, which seems like it might reduce the urgency of funding somewhat, and a considerable portion of the work is on more near-term issues, but there are relatively few opportunities to fund technical AI safety work, so I intend to donate to CHAI again this year.

DeepMind and OpenAI both do excellent work but I don’t think it is viable for (relatively) small individual donors to meaningfully support their work.

In the past I have been very impressed with GCRI’s output on a low budget. Despite 2019 being intended as their year of scaling up, output has actually decreased. I still intend to make a donation, in case this is just an unfortunate timing issue, but would definitely want to see more next year.

CSER’s research is just not focused enough to warrant donations for AI Risk work in my opinion.

I would consider donating to the AI Safety Camp if I knew more about their finances.

Ought seems like a very valuable project, and like CHAI represents one of the few opportunities to directly fund technical AI safety work. As such I plan to make a donation this year.

I thought AI Impacts did some nice small projects this year, on a modest budget. I think I would like to see the results from their large projects first, however.

In a major difference from previous years, I actually plan to donate some money to the Long Term Future Fund. While I haven’t agreed with all their grants, I think they offer small donors access to a range of small projects that they could not otherwise fund, which seems very valuable considering the strong financial situation of many of the best larger organisations (OpenAI, DeepMind etc.)

One thing I would like to see more of in the future is grants for PhD students who want to work in the area. Unfortunately at present I am not aware of many ways for individual donors to practically support this.

However, I wish to emphasize that all the above organisations seem to be doing good work on the most important issue facing mankind. It is the nature of making decisions under scarcity that we must prioritize some over others, and I hope that all organisations will understand that this necessarily involves negative comparisons at times.

Thanks for reading this far; hopefully you found it useful. Apologies to everyone who did valuable work that I excluded!

If you found this post helpful, and especially if it helped inform your donations, please consider letting me and any organisations you donate to as a result know.

If you are interested in helping out with next year’s article, please get in touch, and perhaps we can work something out.


I have not in general checked all the proofs in these papers, and similarly trust that researchers have honestly reported the results of their simulations.

I was a Summer Fellow at MIRI back when it was SIAI and volunteered briefly at GWWC (part of CEA). I have conflicts of interest with the Survival and Flourishing Fund and OpenPhil so have not evaluated them. I have no financial ties beyond being a donor and have never been romantically involved with anyone who has ever worked at any of the other organisations.

I shared drafts of the individual organisation sections with representatives from FHI, CHAI, MIRI, GCRI, BERI, Median, CSER, GPI, AISC, AI Impacts, FRI and Ought.

My eternal gratitude to Greg Lewis, Jess Riedel, Hayden Wilkinson, Kit Harris and Jasmine Wang for their invaluable reviewing. Any remaining mistakes are of course my own. I would also like to thank my wife and daughter for tolerating all the time I have spent/invested/wasted on this.


80,000 Hours - AI/ML safety research job board - 2019-09-29 - https://80000hours.org/job-board/ai-ml-safety-research/

Agrawal, Mayank; Peterson, Joshua; Griffiths, Thomas - Scaling up Psychology via Scientific Regret Minimization: A Case Study in Moral Decision-Making - 2019-10-16 - https://arxiv.org/abs/1910.07581

AI Impacts - AI Conference Attendance - 2019-03-06 - https://aiimpacts.org/ai-conference-attendance/

AI Impacts - Historical Economic Growth Trends - 2019-03-06 - https://aiimpacts.org/historical-growth-trends/

Alexander, Scott - Noisy Poll Results And Reptilian Muslim Climatologists from Mars - 2013-04-12 - https://slatestarcodex.com/2013/04/12/noisy-poll-results-and-reptilian-muslim-climatologists-from-mars/

Armstrong, Stuart - Research Agenda v0.9: Synthesising a human’s preferences into a utility function - 2019-06-17 - https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into#comments

Armstrong, Stuart; Bostrom, Nick; Shulman, Carl - Racing to the precipice: a model of artificial intelligence development - 2015-08-01 - https://link.springer.com/article/10.1007%2Fs00146-015-0590-y

Armstrong, Stuart; Mindermann, Sören - Occam’s razor is insufficient to infer the preferences of irrational agents - 2017-12-15 - https://arxiv.org/abs/1712.05812

Aschenbrenner, Leopold - Existential Risk and Economic Growth - 2019-09-03 - https://leopoldaschenbrenner.github.io/xriskandgrowth/ExistentialRiskAndGrowth050.pdf

Avin, Shahar - Exploring Artificial Intelligence Futures - 2019-01-17 - https://www.shaharavin.com/publication/pdf/exploring-artificial-intelligence-futures.pdf

Avin, Shahar; Amadae, S - Autonomy and machine learning at the interface of nuclear weapons, computers and people - 2019-05-06 - https://www.sipri.org/sites/default/files/2019-05/sipri1905-ai-strategic-stability-nuclear-risk.pdf

Baum, Seth - Risk-Risk Tradeoff Analysis of Nuclear Explosives for Asteroid Deflection - 2019-06-13 - https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3397559

Baum, Seth - The Challenge of Analyzing Global Catastrophic Risks - 2019-07-15 - https://higherlogicdownload.s3.amazonaws.com/INFORMS/f0ea61b6-e74c-4c07-894d-884bf2882e55/UploadedImages/2019_July.pdf#page=20

Baum, Seth; de Neufville, Robert; Barrett, Anthony; Ackerman, Gary - Lessons for Artificial Intelligence from Other Global Risks - 2019-11-21 - http://gcrinstitute.org/papers/lessons.pdf

Beard, Simon - Perfectionism and the Repugnant Conclusion - 2019-03-05 - https://link.springer.com/article/10.1007/s10790-019-09687-4

Beard, Simon - What Is Unfair about Unequal Brute Luck? An Intergenerational Puzzle - 2019-01-21 - https://www.cser.ac.uk/resources/brute-luck-intergenerational-puzzle/

Belfield, Haydn - How to respond to the potential malicious uses of artificial intelligence? - 2019-09-19 - https://www.cser.ac.uk/resources/how-respond-potential-malicious-uses-artificial-intelligence/

Bogosian, Kyle - On AI Weapons - 2019-11-13 - https://forum.effectivealtruism.org/posts/vdqBn65Qaw77MpqXz/on-ai-weapons

Brown, Noam; Sandholm, Tuomas - Superhuman AI for multiplayer poker - 2019-07-17 - https://www.cs.cmu.edu/~noamb/papers/19-Science-Superhuman.pdf

Caplan, Bryan - The Myth of the Rational Voter - 2008-08-24 - https://www.amazon.com/Myth-Rational-Voter-Democracies-Policies/dp/0691138737

Carey, Ryan - How useful is Quantilization for Mitigating Specification-Gaming - 2019-05-06 - https://www.fhi.ox.ac.uk/wp-content/uploads/SafeML2019_paper_40.pdf

Carroll, Micah; Shah, Rohin; Ho, Mark K; Griffiths, Tom; Seshia, Sanjit; Abbeel, Pieter; Dragan, Anca - On the Utility of Learning about Humans for Human-AI Coordination - 2019-10-22 - http://papers.nips.cc/paper/8760-on-the-utility-of-learning-about-humans-for-human-ai-coordination.pdf

Cave, Stephen; Ó hÉigeartaigh, Seán - Bridging near- and long-term concerns about AI - 2019-01-07 - https://www.nature.com/articles/s42256-018-0003-2

Chan, Lawrence; Hadfield-Menell, Dylan; Srinivasa, Siddhartha; Dragan, Anca - The Assistive Multi-Armed Bandit - 2019-01-24 - https://arxiv.org/abs/1901.08654

Chivers, Tom - The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World - 2019-06-13 - https://www.amazon.com/Does-Not-Hate-You-Superintelligence-ebook/dp/B07K258VCV

Christiano, Paul - AI alignment landscape - 2019-10-12 - https://ai-alignment.com/ai-alignment-landscape-d3773c37ae38

Christiano, Paul - What failure looks like - 2019-03-17 - https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like

Cihon, Peter - Standards for AI Governance: International Standards to Enable Global Coordination in AI Research & Development - 2019-05-16 - https://www.fhi.ox.ac.uk/wp-content/uploads/Standards_-FHI-Technical-Report.pdf

Clark, Jack; Hadfield, Gillian - Regulatory Markets for AI Safety - 2019-05-06 - https://drive.google.com/uc?export=download&id=1bFPiwLrZc7SQTMg2_bW4gt0PaS5NyqOH

Cohen, Michael; Vellambi, Badri; Hutter, Marcus - Asymptotically Unambitious Artificial General Intelligence - 2019-05-29 - https://arxiv.org/abs/1905.12186

Collins, Jason - Principles for the Application of Human Intelligence - 2019-09-30 - https://behavioralscientist.org/principles-for-the-application-of-human-intelligence/

Colvin, R; Kemp, Luke; Talberg, Anita; De Castella, Clare; Downie, C; Friel, S; Grant, Will; Howden, Mark; Jotzo, Frank; Markham, Francis; Platow, Michael - Learning from the Climate Change Debate to Avoid Polarisation on Negative Emissions - 2019-07-25 - https://sci-hub.tw/10.1080/17524032.2019.1630463

Cottier, Ben; Shah, Rohin - Clarifying some key hypotheses in AI alignment - 2019-08-15 - https://www.lesswrong.com/posts/mJ5oNYnkYrd4sD5uE/clarifying-some-key-hypotheses-in-ai-alignment

CSER - Policy series Managing global catastrophic risks: Part 1 Understand - 2019-08-13 - https://www.gcrpolicy.com/understand-overview

Cummings, Dominic - On the referendum #31: Project Maven, procurement, lollapalooza results & nuclear/AGI safety - 2019-03-01 - https://dominiccummings.com/2019/03/01/on-the-referendum-31-project-maven-procurement-lollapalooza-results-nuclear-agi-safety/

Dai, Wei - Problems in AI Alignment that philosophers could potentially contribute to - 2019-08-17 - https://www.lesswrong.com/posts/rASeoR7iZ9Fokzh7L/problems-in-ai-alignment-that-philosophers-could-potentially

Dai, Wei—Two Ne­glected Prob­lems in Hu­man-AI Safety − 2018-12-16 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​HT­gakSs6JpnogD6c2/​​two-ne­glected-prob­lems-in-hu­man-ai-safety

Drexler, Eric—Refram­ing Su­per­in­tel­li­gence: Com­pre­hen­sive AI Ser­vices as Gen­eral In­tel­li­gence − 2019-01-08 - https://​​www.fhi.ox.ac.uk/​​wp-con­tent/​​up­loads/​​Refram­ing_Su­per­in­tel­li­gence_FHI-TR-2019-1.1-1.pdf?asd=sa

EU—Ethics Guidelines for Trust­wor­thy Ar­tifi­cial In­tel­li­gence − 2019-04-08 - https://​​ec.eu­ropa.eu/​​digi­tal-sin­gle-mar­ket/​​en/​​news/​​ethics-guidelines-trust­wor­thy-ai

Ever­itt, Tom; Hut­ter, Mar­cus—Re­ward Tam­per­ing Prob­lems and Solu­tions in Re­in­force­ment Learn­ing: A Causal In­fluence Di­a­gram Per­spec­tive − 2019-08-13 - https://​​arxiv.org/​​abs/​​1908.04734

Ever­itt, Tom; Ku­mar, Ra­mana; Krakovna, Vic­to­ria; Legg, Shane—Model­ing AGI Safety Frame­works with Causal In­fluence Di­a­grams − 2019-06-20 - https://​​arxiv.org/​​abs/​​1906.08663

Ever­itt, Tom; Ortega, Pe­dro; Barnes, Eliz­a­beth; Legg, Shane—Un­der­stand­ing Agent In­cen­tives us­ing Causal In­fluence Di­a­grams. Part I: Sin­gle Ac­tion Set­tings − 2019-02-26 - https://​​arxiv.org/​​abs/​​1902.09980

Fried­man, David D—Le­gal Sys­tems Very Differ­ent from Ours − 1970-01-01 - https://​​www.ama­zon.com/​​Le­gal-Sys­tems-Very-Differ­ent-Ours/​​dp/​​1793386722

Garfinkel, Ben & Dafoe, Allan—How does the offense-defense bal­ance scale? − 2019-08-22 - https://​​www.tand­fon­line.com/​​doi/​​full/​​10.1080/​​01402390.2019.1631810

Greaves, Hilary; Cot­ton-Bar­ratt, Owen—A bar­gain­ing-the­o­retic ap­proach to moral un­cer­tainty − 2019-08-09 - https://​​globalpri­ori­tiesin­sti­tute.org/​​wp-con­tent/​​up­loads/​​2019/​​Greaves_Cot­ton-Bar­ratt_bar­gain­ing_the­o­retic_ap­proach.pdf

Grotto, Andy—Ge­net­i­cally Mod­ified Or­ganisms: A Pre­cau­tion­ary Tale for AI Gover­nance − 2019-01-24 - https://​​aipulse.org/​​ge­net­i­cally-mod­ified-or­ganisms-a-pre­cau­tion­ary-tale-for-ai-gov­er­nance-2/​​

Her­nan­dez-Orallo, Jose; Martınez-Plumed, Fer­nando; Avin, Sha­har; Ó hÉigeartaigh, Seán - Sur­vey­ing Safety-rele­vant AI Char­ac­ter­is­tics − 2019-01-20 - http://​​ceur-ws.org/​​Vol-2301/​​pa­per_22.pdf

Hub­inger, Evan; van Mer­wijk, Chris; Mikulik, Vladimir; Skalse, Joar; Garrabrant, Scott—Risks from Learned Op­ti­miza­tion in Ad­vanced Ma­chine Learn­ing Sys­tems − 2019-06-05 - https://​​arxiv.org/​​abs/​​1906.01820

Irv­ing, Ge­offrey; Askell, Amanda—AI Safety Needs So­cial Scien­tists − 2019-02-19 - https://​​dis­till.pub/​​2019/​​safety-needs-so­cial-sci­en­tists/​​

Irv­ing, Ge­offrey; Chris­ti­ano, Paul; Amodei, Dario—AI Safety via De­bate − 2018-05-02 - https://​​arxiv.org/​​abs/​​1805.00899

Kacz­marek, Pa­trick; Beard, Si­mon—Hu­man Ex­tinc­tion and Our Obli­ga­tions to thePast − 2019-11-05 - https://​​sci-hub.tw/​​https://​​www.cam­bridge.org/​​core/​​jour­nals/​​util­i­tas/​​ar­ti­cle/​​hu­man-ex­tinc­tion-and-our-obli­ga­tions-to-the-past/​​C29A0406EFA2B43EE8237D95AAFBB580

Kauf­man, Jeff—Uber Self-Driv­ing Crash − 2019-11-07 - https://​​www.jefftk.com/​​p/​​uber-self-driv­ing-crash

Kemp, Luke—Me­di­a­tion Without Mea­sures: Con­flict Re­s­olu­tion in Cli­mate Di­plo­macy − 2019-05-15 - https://​​www.cser.ac.uk/​​re­sources/​​me­di­a­tion-with­out-mea­sures/​​

Ken­ton, Zachary; Filos, An­gelos; Gal, Yarin; Evans, Owain—Gen­er­al­iz­ing from a few en­vi­ro­ments in Safety-Crit­i­cal Re­in­force­ment Learn­ing − 2019-07-02 - https://​​arxiv.org/​​abs/​​1907.01475

Korzekwa, Rick—The un­ex­pected difficulty of com­par­ing AlphaS­tar to hu­mans − 2019-09-17 - https://​​aiim­pacts.org/​​the-un­ex­pected-difficulty-of-com­par­ing-alphas­tar-to-hu­mans/​​

Kosoy, Vanessa—Del­ega­tive Re­in­force­ment Learn­ing: Learn­ing to Avoid Traps with a Lit­tle Help − 2019-07-19 - https://​​arxiv.org/​​abs/​​1907.08461

Ko­varik, Vo­jta; Ga­j­dova, Anna; Lind­ner, David; Fin­nve­den, Lukas; Agrawal, Ra­jashree—AI Safety De­bate and Its Ap­pli­ca­tions − 2019-07-23 - https://​​www.less­wrong.com/​​posts/​​5Kv2qNfRyXXihNrx2/​​ai-safety-de­bate-and-its-applications

Krakovna, Vic­to­ria—ICLR Safe ML Work­shop Re­port − 2019-06-18 - https://​​fu­ture­oflife.org/​​2019/​​06/​​18/​​iclr-safe-ml-work­shop-re­port/​​

Krue­gar, David; Ma­haraj, Te­gan; Legg, Shane; Leike, Jan—Mislead­ing Meta-Ob­jec­tives and Hid­den In­cen­tives for Distri­bu­tional Shift − 2019-01-01 - https://​​drive.google.com/​​uc?ex­port=down­load&id=1k93292JCoIHU0h6xVO3qmeRwLyOSlS4o

Ku­mar, Ram Shankar Siva; O’Brien, David; Snover, Jeffrey; Albert, Ken­dra; Viloen, Salome—Failure Modes in Ma­chine Learn­ing − 2019-11-10 - https://​​docs.microsoft.com/​​en-us/​​se­cu­rity/​​failure-modes-in-ma­chine-learning

LeCun, Yann; Rus­sell, Stu­art; Ben­gio, Yoshua; Olds, Elliot; Zador, Tony; Rossi, Francesca; Mal­lah, Richard; Bar­zov, Yuri—De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More − 2019-10-04 - https://​​www.less­wrong.com/​​posts/​​WxW6Gc6f2z3mzmqKs/​​de­bate-on-in­stru­men­tal-con­ver­gence-be­tween-le­cun-russell

Lewis, So­phie; Perk­ins-Kirk­patrick, Sarah; Althor, Glenn; King, An­drew; Kemp, Luke—Assess­ing con­tri­bu­tions of ma­jor emit­ters’ Paris‐era de­ci­sions to fu­ture tem­per­a­ture ex­tremes − 2019-03-20 - https://​​www.cser.ac.uk/​​re­sources/​​as­sess­ing-con­tri­bu­tions-ex­tremes/​​

Long, Robert; Ber­gal, Asya—Ev­i­dence against cur­rent meth­ods lead­ing to hu­man level ar­tifi­cial in­tel­li­gence − 2019-08-12 - https://​​aiim­pacts.org/​​ev­i­dence-against-cur­rent-meth­ods-lead­ing-to-hu­man-level-ar­tifi­cial-in­tel­li­gence/​​

Long, Robert; Davis, Ernest—Con­ver­sa­tion with Ernie Davis − 2019-08-23 - https://​​aiim­pacts.org/​​con­ver­sa­tion-with-ernie-davis/​​

Ma­caskill, Will; Dem­ski, Abram—A Cri­tique of Func­tional De­ci­sion The­ory − 2019-09-13 - https://​​www.less­wrong.com/​​posts/​​ySLYSsNeFL5CoAQzN/​​a-cri­tique-of-func­tional-de­ci­sion-theory

MacAskill, William; Val­lin­der, Aron; Oester­held, Cas­par; Shul­man, Carl; Treut­lein, Jo­hannes—The Ev­i­den­tial­ist’s Wager − 2019-11-19 - https://​​globalpri­ori­tiesin­sti­tute.org/​​the-ev­i­den­tial­ists-wa­ger/​​

Ma­jha, Arushi; Sarkar, Sayan; Zagami, Davide—Cat­e­go­riz­ing Wire­head­ing in Par­tially Embed­ded Agents − 2019-06-21 - https://​​arxiv.org/​​abs/​​1906.09136

Malt­in­sky, Baeo; Gal­lagher, Jack; Tay­lor, Jes­sica—Fea­si­bil­ity of Train­ing an AGI us­ing Deep RL:A Very Rough Es­ti­mate − 2019-03-24 - http://​​me­di­an­group.org/​​docs/​​Fea­si­bil­ity%20of%20Train­ing%20an%20AGI%20us­ing%20Deep%20Re­in­force­ment%20Learn­ing,%20A%20Very%20Rough%20Es­ti­mate.pdf

Man­cuso, Ja­son; Kisielewski, To­masz; Lind­ner, David; Singh, Alok—De­tect­ing Spiky Cor­rup­tion in Markov De­ci­sion Pro­cesses − 2019-06-30 - https://​​arxiv.org/​​abs/​​1907.00452

Mar­cus, Gary—Deep Learn­ing: A Crit­i­cal Ap­praisal − 2018-01-02 - https://​​arxiv.org/​​ftp/​​arxiv/​​pa­pers/​​1801/​​1801.00631.pdf

McCaslin, Te­gan—In­ves­ti­ga­tion into the re­la­tion­ship be­tween neu­ron count and in­tel­li­gence across differ­ing cor­ti­cal ar­chi­tec­tures − 2019-02-11 - https://​​aiim­pacts.org/​​in­ves­ti­ga­tion-into-the-re­la­tion­ship-be­tween-neu­ron-count-and-in­tel­li­gence-across-differ­ing-cor­ti­cal-ar­chi­tec­tures/​​

Mo­gensen, An­dreas - ‘The only eth­i­cal ar­gu­ment for pos­i­tive 𝛿 ’? − 2019-01-01 - https://​​globalpri­ori­tiesin­sti­tute.org/​​an­dreas-mo­gensen-the-only-eth­i­cal-ar­gu­ment-for-pos­i­tive-delta-2/​​

Mo­gensen, An­dreas—Dooms­day rings twice − 2019-01-01 - https://​​globalpri­ori­tiesin­sti­tute.org/​​an­dreas-mo­gensen-dooms­day-rings-twice/​​

Naude, Wim; Dimitri, Ni­cola—The race for an ar­tifi­cial gen­eral in­tel­li­gence: im­pli­ca­tions for pub­lic policy − 2019-04-22 - https://​​link.springer.com/​​ar­ti­cle/​​10.1007%2Fs00146-019-00887-x

Ngo, Richard—Tech­ni­cal AGI safety re­search out­side AI − 2019-10-18 - https://​​fo­rum.effec­tivealtru­ism.org/​​posts/​​2e9NDGiXt8PjjbTMC/​​tech­ni­cal-agi-safety-re­search-out­side-ai

O’Keefe, Cul­len—Stable Agree­ments in Tur­bu­lent Times: A Le­gal Toolkit for Con­strained Tem­po­ral De­ci­sion Trans­mis­sion − 2019-05-01 - https://​​www.fhi.ox.ac.uk/​​wp-con­tent/​​up­loads/​​Stable-Agree­ments.pdf

Ovadya, Aviv; Whit­tle­stone, Jess—Re­duc­ing Mal­i­cious Use of Syn­thetic Me­dia Re­search: Con­sid­er­a­tions and Po­ten­tial Re­lease Prac­tices for Ma­chine Learn­ing − 2019-07-29 - https://​​arxiv.org/​​abs/​​1907.11274

Evans, Owain; Saunders, William; Stuhlmüller, Andreas—Machine Learning Projects for Iterated Distillation and Amplification − 2019-07-03 - https://owainevans.github.io/pdfs/evans_ida_projects.pdf

Perry, Brandon; Uuk, Risto—AI Governance and the Policymaking Process: Key Considerations for Reducing AI Risk − 2019-05-08 - https://www.mdpi.com/2504-2289/3/2/26/pdf

Piper, Kelsey—The case for taking AI seriously as a threat to humanity − 2018-12-21 - https://www.vox.com/future-perfect/2018/12/21/18126576/ai-artificial-intelligence-machine-learning-safety-alignment

Quigley, Ellen—Universal Ownership in the Anthropocene − 2019-05-13 - https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3457205

Roy, Mati—AI Safety Open Problems − 2019-11-02 - https://docs.google.com/document/d/1J2fOOF-NYiPC0-J3ZGEfE0OhA-QcOInhlvWjr1fAsS0/edit

Russell, Stuart—Human Compatible: Artificial Intelligence and the Problem of Control − 2019-10-08 - https://www.amazon.com/Human-Compatible-Artificial-Intelligence-Problem/dp/0525558616/ref=sr_1_2?keywords=Stuart+Russell&qid=1565996574&s=books&sr=1-2

Schwarz, Wolfgang—On Functional Decision Theory − 2018-12-27 - https://www.umsu.de/blog/2018/688

Sevilla, Jaime; Moreno, Pablo—Implications of Quantum Computing for Artificial Intelligence alignment research − 2019-08-19 - https://arxiv.org/abs/1908.07613

Shah, Rohin; Gundotra, Noah; Abbeel, Pieter; Dragan, Anca—On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference − 2019-06-23 - https://arxiv.org/abs/1906.09624

Shah, Rohin; Krasheninnikov, Dmitrii; Jordan, Alexander; Abbeel, Pieter; Dragan, Anca—Preferences Implicit in the State of the World − 2019-02-12 - https://arxiv.org/abs/1902.04198

Shulman, Carl—Person-affecting views may be dominated by possibilities of large future populations of necessary people − 2019-11-30 - http://reflectivedisequilibrium.blogspot.com/2019/11/person-affecting-views-may-be-dominated.html

Snyder-Beattie, Andrew; Ord, Toby; Bonsall, Michael—An upper bound for the background rate of human extinction − 2019-07-30 - https://www.nature.com/articles/s41598-019-47540-7

Steiner, Charlie—Some Comments on Stuart Armstrong’s “Research Agenda v0.9” − 2019-08-08 - https://www.lesswrong.com/posts/GHNokcgERpLJwJnLW/some-comments-on-stuart-armstrong-s-research-agenda-v0-9

Steinhardt, Jacob—AI Alignment Research Overview − 2019-10-14 - https://rohinshah.us18.list-manage.com/track/click?u=1d1821210cc4f04d1e05c4fa6&id=1a148ef72c&e=1e228e7079

Sterbenz, Ciara; Trager, Robert—Autonomous Weapons and Coercive Threats − 2019-02-06 - https://aipulse.org/autonomous-weapons-and-coercive-threats/

Sutton, Rich—The Bitter Lesson − 2019-03-13 - http://www.incompleteideas.net/IncIdeas/BitterLesson.html

Szlam et al.—Why Build an Assistant in Minecraft? − 2019-07-19 - https://arxiv.org/abs/1907.09273

Taylor, Jessica - − 1900-01-00 - https://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12613

Taylor, Jessica—The AI Timelines Scam − 2019-07-11 - https://unstableontology.com/2019/07/11/the-ai-timelines-scam/

Taylor, Jessica; Gallagher, Jack; Maltinsky, Baeo—Revisiting the Insights model − 2019-07-20 - http://mediangroup.org/insights2.html

The AlphaStar Team—AlphaStar: Mastering the Real-Time Strategy Game StarCraft II − 2019-01-24 - https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii

Turner, Alexander; Hadfield-Menell, Dylan; Tadepalli, Prasad—Conservative Agency − 2019-02-26 - https://arxiv.org/abs/1902.09725

Tzachor, Asaf—The Future of Feed: Integrating Technologies to Decouple Feed Production from Environmental Impacts − 2019-04-23 - https://www.liebertpub.com/doi/full/10.1089/ind.2019.29162.atz

Uesato, Jonathan; Kumar, Ananya; Szepesvari, Csaba; Erez, Tom; Ruderman, Avraham; Anderson, Keith; Dvijotham, Krishnamurthy; Heess, Nicolas; Kohli, Pushmeet—Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures − 2018-12-04 - https://arxiv.org/abs/1812.01647

USG—National Security Commission on Artificial Intelligence Interim Report − 2019-11-01 - https://drive.google.com/file/d/153OrxnuGEjsUvlxWsFYauslwNeCEkvUb/view

Walsh, Bryan—End Times: A Brief Guide to the End of the World − 2019-08-27 - https://smile.amazon.com/End-Times-Brief-Guide-World-ebook/dp/B07J52NW99/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=&sr=

Weitzdörfer, Julius; Beard, Simon—Law and Policy Responses to Disaster-Induced Financial Distress − 2019-11-24 - https://sci-hub.tw/10.1007/978-981-13-9005-0

Whittlestone, Jess; Nyrup, Rune; Alexandrova, Anna; Cave, Stephen—The Role and Limits of Principles in AI Ethics: Towards a Focus on Tensions − 2019-01-27 - http://lcfi.ac.uk/media/uploads/files/AIES-19_paper_188_Whittlestone_Nyrup_Alexandrova_Cave_OcF7jnp.pdf

Zabel, Claire; Muehlhauser, Luke—Information security careers for GCR reduction − 2019-06-20 - https://forum.effectivealtruism.org/posts/ZJiCfwTy5dC4CoxqA/information-security-careers-for-gcr-reduction

Zhang, Baobao; Dafoe, Allan—Artificial Intelligence: American Attitudes and Trends − 2019-01-15 - https://governanceai.github.io/US-Public-Opinion-Report-Jan-2019/