2021 AI Alignment Literature Review and Charity Comparison

Cross-posted to the EA forum here.

Introduction

As in 2016, 2017, 2018, 2019 and 2020 I have attempted to review the research that has been produced by various organisations working on AI safety, to help potential donors gain a better understanding of the landscape. This is a similar role to that which GiveWell performs for global health charities, and somewhat similar to a securities analyst with regard to possible investments.

My aim is basically to judge the output of each organisation in 2021 (technically: 2020-12-01 to 2021-11-30) and compare it to their budget. This should give a sense of the organisations' average cost-effectiveness. We can also compare their financial reserves to their 2021 budgets to get a sense of urgency.

This document aims to be sufficiently broad that someone who has not paid any attention to the space all year could read it (and the linked documents) and be as well-informed when making donation decisions as they could reasonably be without personally interviewing researchers and organisations.

I'd like to apologize in advance to everyone doing useful AI Safety work whose contributions I have overlooked or misconstrued. As ever I am painfully aware of the various corners I have had to cut due to time constraints from my job, as well as being distracted by 1) other projects, 2) the miracle of life and 3) computer games.

This article focuses on AI risk work. If you think other causes are important too, your priorities might differ. This particularly affects GCRI, FHI and CSER, all of which do a lot of work on other issues that I attempt to cover, but only very cursorily.

How to read this document

This document is fairly extensive, and some parts (particularly the methodology section) are largely the same as last year, so I don't recommend reading from start to finish. Instead, I recommend navigating to the sections of most interest to you. You should also read the Conflict of Interest section.

If you are interested in a specific research organisation, you can use the table of contents to navigate to the appropriate section. You might then also want to Ctrl+F for the organisation's acronym in case it is mentioned elsewhere as well. Papers listed as 'X researchers contributed to the following research led by other organisations' are included in the section corresponding to their first author, and you can Ctrl+F to find them.

If you are interested in a specific topic, I have added a tag to each paper, so you can Ctrl+F for a tag to find associated work. The tags were chosen somewhat informally, so you might want to search more than one, especially as a piece might seem to fit in multiple categories.

Here are the unscientifically-chosen hashtags:

  • AgentFoundations

  • Amplification

  • Capabilities

  • Community

  • Ethics

  • Fiction

  • Forecasting

  • GameTheory

  • Interpretability

  • IRL

  • NearTerm

  • Obstruction

  • Other

  • Overview

  • Policy

  • Robustness

  • Scenarios

  • ShortTerm

  • Strategy

  • Translation

  • ValueLearning

You might also be interested in the 'Organisation Second Preference' section, which is new this year.

New to Artificial Intelligence as an existential risk?

If you are new to the idea of General Artificial Intelligence as presenting a major risk to the survival of human value, I recommend this Vox piece by Kelsey Piper, or for a more technical version this by Richard Ngo.

If you are already convinced and are interested in contributing technically, I recommend this piece by Jacob Steinhardt, as unlike this document Jacob covers pre-2019 research and organises by topic, not organisation; or this from Hendrycks et al.; or this from Critch & Krueger; or this from Everitt et al., though it is a few years old now.

Conflict of Interest

In the past I have had very demanding standards around Conflicts of Interest, including being critical of others for their lax treatment of the issue. Historically this was not an issue because I had very few conflicts. However, this year I have accumulated a large number of such conflicts, and worse, conflicts that cannot all be individually publicly disclosed due to another ethical constraint.

As such, the reader should assume I could be conflicted on any and all reviewed organisations.

Research Organisations

FHI: The Future of Humanity Institute

FHI is an Oxford-based Existential Risk Research organisation founded in 2005 by Nick Bostrom. They are affiliated with Oxford University. They cover a wide variety of existential risks, including artificial intelligence, and do political outreach. Their research can be found here.

Their research is more varied than MIRI's, including strategic work, work directly addressing the value-learning problem, and corrigibility work, as well as work on other Xrisks.

They ran a Research Scholars Program, where people could join them to do research at FHI. There is a fairly good review of this here, albeit from before the pandemic/hiring freeze.

The EA Meta Fund supported a special program for providing infrastructure and support to FHI, called the Future of Humanity Foundation. This reminds me somewhat of what BERI does.

In the past I have been very impressed with their research.

They didn't share any information with me about hiring or departures.

Research

Cohen et al.'s Fully General Online Imitation Learning is a detailed paper presenting an online imitation learner that operates with bounds on how harmful it can be despite having no prior training phase. The imitator keeps track of the most plausible models of the teacher, and for each possible action commits only to the minimum probability that any teacher-model ascribes to it. At the beginning, when there are many plausible models, this means a lot of probability mass is unclaimed, and hence it requests feedback a lot (an ersatz training period), but over time it should narrow down on the true model (though perhaps this is very slow?). They prove results suggesting that this keeps the probability of catastrophes low, proportionate to their risk under the true model, assuming the true model was in the 'most plausible' set at the beginning (which is not clear to me). Incidentally, I think this is a good example of the problems of academic research. The alignment forum blog post quite clearly lays out that this is about mesa-optimisers… but the paper (perhaps because of reviewers?) literally does not include the string 'mesa' at all, making it a lot harder to understand the significance! See also the discussion here. Overall I thought this was an excellent paper. Researchers from Deepmind were also named authors on the paper. #IRL
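
The core mechanism can be sketched in a few lines (all names and probabilities below are my own illustrative choices, not from the paper): commit only to the minimum probability any surviving teacher-model assigns to an action, and route the unclaimed mass to querying the teacher.

```python
def imitate_step(models, weights, threshold=1e-3):
    """models: list of dicts mapping action -> probability under that
    teacher-model; weights: posterior plausibility of each model.
    Returns (policy, query_mass)."""
    plausible = [m for m, w in zip(models, weights) if w > threshold]
    actions = set().union(*(m.keys() for m in plausible))
    # Commit only to the minimum probability any plausible model ascribes:
    policy = {a: min(m.get(a, 0.0) for m in plausible) for a in actions}
    query_mass = 1.0 - sum(policy.values())  # leftover mass -> ask the teacher
    return policy, query_mass

# Early on, with several disagreeing models, much of the mass is unclaimed,
# so the imitator queries often (an ersatz training period):
models = [{"left": 0.9, "right": 0.1}, {"left": 0.2, "right": 0.8}]
policy, query_mass = imitate_step(models, [0.5, 0.5])
print(policy["left"], policy["right"], round(query_mass, 3))  # 0.2 0.1 0.7
```

As models are eliminated, the minimum is taken over a smaller set, the committed mass grows, and the query rate falls, which is the paper's route to bounding harm without a training phase.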

Evans et al.'s Truthful AI: Developing and governing AI that does not lie is a detailed and lengthy piece discussing a lot of issues around truthfulness for AI agents. This includes conceptual, practical and governance issues, especially with regard to conversation bots. They argue for truthfulness (or at least, non-negligently-false statements) rather than honesty as a standard, both to avoid intentionality issues and also because, relative to humans, the costs of punishing unintended mistakes are much lower/less unfair. Especially hard topics include the truthfulness of AIs that are more expert than humans (and hence cannot be directly subject to human oversight) and cases where the truth is contested; for example, tech platforms' choice to suppress as misleading various claims about covid which contradicted official advice, even though the official advice was frequently clearly wrong. I'm not convinced that their approach would end up being significantly different from 'dominant ideology censors rivals': even suggesting explicit warnings about controversy has the issue that what is controversial is itself controversial. See also the discussion here and here. Researchers from GPI, OpenAI were also named authors on the paper. #Strategy

Lin et al.'s TruthfulQA: Measuring How Models Mimic Human Falsehoods provides a series of test questions to study how 'honest' various text models are. Of course, these models are trying to copy human responses, not be honest, so because many of the questions allude to common misconceptions, the more advanced models 'lie' more often. Interestingly, they also used GPT-3 to evaluate the truth of these answers. See also the discussion here. Researchers from OpenAI were also named authors on the paper. #Other

Ord et al.'s Future Proof: The Opportunity to Transform the UK's Resilience to Extreme Risks is a flashy policy document recommending steps HMG could take to prepare for future risks. This includes things like having a Chief Risk Officer and avoiding AI control of nuclear weapons, as well as non-AI-relevant but still good recommendations for bio. In general I thought it did a significantly better job, and represented a more realistic theory of change, than many previous policy pieces. Researchers from CSER, Gov.AI were also named authors on the paper. #Policy

Manheim & Sandberg's What is the Upper Limit of Value? argues that the total amount of value/growth humanity can create/experience is finite. This is related to typical 'limits to growth' arguments, except much better and with a much higher ceiling: rather than worrying about peak oil, they discuss the speed of light and the difficulties of extracting infinite value from a single electron. As they note, it may not apply in exotic models of physics, though, and I do not understand why they think that you cannot assign a probability to something so low that nothing could ever convince you it happened. See also the discussion here. #Other

Hammond et al.'s Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice extends previous work on representing games as causal networks rather than payoff tables. This has the advantage of facilitating decomposability, making them much more easily understandable; the hope is that this type of work will help us to understand when agents have bad incentives to e.g. deceive. Researchers from Deepmind were also named authors on the paper. #GameTheory

Finnveden's Extrapolating GPT-N performance examines the performance scaling for GPT on a variety of tasks. He finds generally relatively smooth scaling, and interestingly comes to relatively similar conclusions to Ajeya's work despite the different methodology. #Forecasting
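
To give a flavour of this kind of extrapolation (a toy example with made-up numbers, not Finnveden's actual methodology, which fits richer curves to real benchmark data): fit loss as a power law in parameter count, i.e. a straight line in log-log space, then read off the prediction for a larger model.

```python
import math

params = [1e8, 1e9, 1e10]   # hypothetical model sizes
losses = [4.0, 3.0, 2.25]   # hypothetical benchmark losses

# Least-squares fit of log(loss) = a * log(params) + b
xs = [math.log(p) for p in params]
ys = [math.log(l) for l in losses]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def predict(p):
    """Extrapolated loss for a model with p parameters."""
    return math.exp(a * math.log(p) + b)

# Prediction for a model 10x larger than any observed:
print(round(predict(1e11), 2))  # ≈ 1.69, continuing the trend
```

The substantive questions in the real analysis are whether the trend stays smooth out of sample and how benchmark loss maps onto capabilities; the fit itself is the easy part.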

FHI researchers contributed to the following research led by other organisations:

They also produced a variety of pieces on biorisk and other similar subjects, which I am sure are very good and important but I have not read.

Finances

FHI is apparently currently banned from fundraising by the university, and hence cannot share financial information. I would guess their budget is relatively large. Apparently they have sufficient funds to last until the expected resumption of fundraising in the new year. I do not know the exact reason for this ban; the absence of disclosure makes due diligence of them very difficult.

If you wanted to donate to them anyway, here is the relevant web page.

GovAI: The Center for the Governance of AI

GovAI is an Oxford-based AI Governance Research organisation founded in 2021 by Allan Dafoe; Ben Garfinkel became Acting Director in 2021. They are affiliated with CEA. They were formerly a research center as part of FHI, but spun out this year to allow Allan to take up a position at Deepmind, and for increased operational independence from the university. Their research can be found here. Their declaration of independence can be found here.

I generally regarded the work they did as part of FHI as quite good, and presumably that is likely to continue.

The one possible downside of leaving the university affiliation is the loss of associated prestige.

Research

Zaidi & Dafoe's International Control of Powerful Technology: Lessons from the Baruch Plan for Nuclear Weapons attempts to draw conclusions relevant for AGI control. It's a very detailed account of negotiations, but I'm not sure how much we can learn from it, given that the plan failed, and it seems plausible that neither side was really negotiating in earnest anyway. #Strategy

Fischer et al.'s AI Policy Levers: A Review of the U.S. Government's Tools to Shape AI Research, Development, and Deployment lays out various policy levers the USG can use to control AI. These generally focus on AI race issues, e.g. techniques to undermine Chinese competition, rather than AGI control, which would be an issue even with a unified world government. It focuses on tools based in current law, which I think makes sense, as even in previous crises (e.g. 2008, March 2020) the government response has leant heavily on repurposing existing programs and permissions. #Policy

Dafoe et al.'s Open Problems in Cooperative AI and Cooperative AI: Machines Must Learn to Find Common Ground give an overview of different ways to think about cooperation issues. This is not about principal-agent issues with getting an AI to do what its human wants, but about how to deal with multiple humans/AIs with different goals and knowledge. Much of the piece was effectively about human coordination: while there were some AI-specific ideas, like pre-commitment in lane merging for autonomous cars, ideas like AIs self-modifying to a joint utility function didn't get much discussion. Researchers from FHI, Deepmind were also named authors on the paper. #Strategy

Zhang's Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers asked a bunch of people at NeurIPS etc. what they thought of various ethical issues. Sort of a follow-up to Katja's previous researcher survey and Baobao's survey of the public, many of the results are not surprising, e.g. researchers trust science organisations and distrust the military and China. I was surprised/disappointed to see that experts were less concerned about Value Alignment than the general public (to the extent we can interpret these surveys literally). Researchers from Gov.AI were also named authors on the paper. #Strategy

Ding's China's Growing Influence over the Rules of the Digital Road describes China's approach to influencing technology standards, and suggests some policies the US might adopt. #Policy

Garfinkel's A Tour of Emerging Cryptographic Technologies provides an overview of various cryptography techniques (not primarily currency) and their relevance for issues like coordination problems and surveillance. #Overview

Dafoe et al.'s Reputations for Resolve and Higher-Order Beliefs in Crisis Bargaining conducts a semi-expert survey to try to evaluate foreign policy decision makers' use of reputation for resolve when evaluating geopolitical strategy. Their work suggests that reputation does matter, and the Domino Theory is true: a history of steadfastness causes others to expect you to be resolute in the future, and hence they are more likely to back down. #Strategy

Ding & Dafoe's Engines of Power: Electricity, AI, and General-Purpose Military Transformations argues that for countries to take advantage of AI in warfare they will require a lot of domestic civilian AI capabilities. This is in contrast to people who have argued that rapid AI technology diffusion would upset US dominance. However, I'm not sure the conclusions really pull through for AGI. #Forecasting

Zwetsloot, Remco; Zhang, Baobao; Anderljung, Markus; Horowitz, Michael; Dafoe, Allan - The Immigration Preferences of Top AI Researchers: New Survey Evidence - 2021-10-22 - https://www.governance.ai/research-paper/the-immigration-preferences-of-top-ai-researchers-new-survey-evidence

Researchers from GovAI were also named contributors to the following papers:

Despite the name they have done a fair bit of work on non-AI related governance; these papers are not reviewed here.

Finances

They currently have around $3.5m cash, which they expect constitutes around 2-3 years of runway (less if they grow faster).

They are not currently actively soliciting donations, but if you would like to donate to them, you can do so here.

CHAI: The Center for Human-Compatible AI

CHAI is a UC Berkeley based AI Safety Research organisation founded in 2016 by Stuart Russell. They do ML-oriented safety research, especially around inverse reinforcement learning, and cover both near and long-term future issues.

As an academic organisation their members produce a very large amount of research; I have only tried to cover the most relevant below. It seems they do a better job engaging with academia than many other organisations, especially in terms of interfacing with the cutting edge of non-safety-specific research. The downside of this, from our point of view, is that not all of their research is focused on existential risks.

They have expanded somewhat to other universities outside Berkeley and have people at places like Princeton and Cornell.

Research

CHAI and their associated academics produce a huge quantity of research. Far more than for other organisations, their output is understated by my survey here; if they were a small organisation that only produced one report, there would be 100% coverage, but as it is this is just a sample of those pieces I felt most interested in. On the other hand, academic organisations tend to produce some slightly less relevant work also, and I have focused on what seemed to me to be the top pieces.

Hendrycks et al.'s Unsolved Problems in ML Safety provides an overview of ML safety issues: Robustness, Monitoring, Alignment and 'External Safety'. It's basically an updated version of Concrete Problems, with one of the same authors. I generally think these pieces are quite good for helping provide easy on-ramps (with Google/OpenAI credentials) for mainstream researchers. There is probably not a huge amount of novel content here for readers of this article, though I thought the introductory motivation section was well written. See also the discussion here. Researchers from OpenAI were also named authors on the paper. #Overview

Laidlaw & Russell's Uncertain Decisions Facilitate Better Preference Learning presents an approach for learning utility functions from the behaviour of humans acting under uncertainty. They argue that uncertainty can actually make it easier to infer values, for similar reasons (I think) that utility functions are derived from preferences over bets: by being less extremal there is more prior uncertainty about which actions agents will take, and hence more information in their actions. #IRL

Roman et al.'s Accumulating Risk Capital Through Investing in Cooperation is a game theory paper about promoting cooperation while limiting downside. Basically they prove results about the tradeoff between the two goals, and show that you can get asymptotically good (i.e. maximally cooperative) behaviour. #GameTheory

Hendrycks et al.'s What Would Jiminy Cricket Do? Towards Agents That Behave Morally annotates a series of text-based games with action ethics scores and uses them for policy shaping. They note that many of the games actively rewarded immoral behaviour, and attempted to correct this with pro tanto ethical labelling. This was then used to adjust a model at the last step (not retraining using ethical judgements as rewards). #ValueLearning

Filan et al.'s Clusterability in Neural Networks produces a definition of clustering to identify potentially meaningful subcomponents in neural networks, and shows that actually trained (and hence meaningful in aggregate) nets have these subcomponents more than randomly generated nets. This basically tries to produce subcomponents of neurons that have strong intraconnections and weak interconnections, relative to their size. They also suggest two methods for promoting clustering in a network, for when you want to design interpretability into a system: eigenvector regularisation and initialising the weights with pre-existing clusters. #Interpretability
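
In the same spirit, here is a toy version of the underlying measure (my own simplification with made-up weights, not the paper's exact normalised-cut definition): treat the network's weights as an undirected graph and score a proposed partition by the fraction of edge weight crossing cluster boundaries; lower means "more clusterable".

```python
def cut_fraction(adj, clusters):
    """adj: symmetric dict-of-dicts of edge weights; clusters: list of sets.
    Returns the fraction of total edge weight crossing cluster boundaries."""
    cross = total = 0.0
    for u in adj:
        for v, w in adj[u].items():
            total += w
            if not any(u in c and v in c for c in clusters):
                cross += w
    return cross / total

# Two dense blocks joined by one weak edge cluster cleanly:
adj = {
    "a": {"b": 1.0, "c": 0.1},
    "b": {"a": 1.0},
    "c": {"a": 0.1, "d": 1.0},
    "d": {"c": 1.0},
}
score = cut_fraction(adj, [{"a", "b"}, {"c", "d"}])
print(round(score, 3))  # 0.2/4.2 ≈ 0.048: only the weak a-c edge is cut
```

A randomly weighted graph would typically have no partition scoring this well, which is the gap the paper measures between trained and random networks.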

Zhuang & Hadfield-Menell's Consequences of Misaligned AI provides a simple model of misalignment due to utility function mis-specification. Essentially the true utility function is a linear function of N attributes, but the agent's utility function only covers M<N of them, and the resource constraint is concave, so the robot ends up sacrificing the invisible N-M attributes. They then discuss some solutions, including classic ones like Armstrong's impact minimisation or the human constantly providing new local utility functions, both of which unfortunately require you to know what the other attributes are. #AgentFoundations
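
A minimal numerical version of this effect (my own construction, not the paper's exact model): with a spherical resource constraint, the proxy-optimal allocation puts literally nothing into the attribute the proxy omits, however much the true utility values it.

```python
import math

def best_on_sphere(weights):
    """Maximise sum(w_i * x_i) subject to sum(x_i^2) <= 1.
    The optimum is the weight vector scaled to unit length."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm if norm else 0.0 for w in weights]

# True utility values three attributes; the agent's proxy omits the third:
proxy_opt = best_on_sphere([1.0, 1.0, 0.0])
true_opt = best_on_sphere([1.0, 1.0, 1.0])
print(proxy_opt[2], round(true_opt[2], 3))  # 0.0 vs ~0.577
```

The unmeasured attribute is driven to its minimum because any resources spent on it are "wasted" from the proxy's perspective, which is the paper's core point about incomplete utility specifications.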

Critch's What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) argues for a focus on multipolar AI scenarios, and on the mechanisms by which they interact over the designs of individual AIs. The main part of the post is a series of scenarios, showing bad outcomes from otherwise aligned AIs because competition forces them to sacrifice alignment. Basically AI companies are super successful at making profits, but then eventually they become too powerful and we die. I found the models a bit confusing: they didn't seem to really explain how this competition goes from 'the best thing in the history of the world' to 'extinction'. Standard microeconomics suggests that the perfect competition you need to 'force' all firms to be maximally efficient is in fact great for consumers (humans). The article claims that humans will lose control because the AIs have become extremely powerful and well defended, but I don't see how this makes sense: fortified compounds are an expensive liability that firms in perfect competition cannot afford, and antagonising humanity (which is extremely rich and powerful because of all the stuff the AI firms have made for us) also sounds like a very expensive mistake to make. I think my steelman would be that these firms are exploiting insecure property rights, in which case the solution to AI alignment turns out to be… carbon taxes? #Scenarios

Lindner et al.'s Learning What To Do by Simulating the Past is an extension of Rohin's previous paper. Both papers basically try to learn from the latent information embodied in a world that already significantly reflects human preferences; e.g. if an AI discovers a fence in the woods, it can reasonably infer some human preferred a fence be there. This paper basically aims to move from gridworlds with trivial 'physics' to more realistic settings where you can't precisely compute the histories. Researchers from CHAI were also named authors on the paper. #IRL

Shah et al.'s The MineRL BASALT Competition on Learning from Human Feedback introduces a competition for learning human intent in a noisy environment, for tasks like "building waterfalls in Minecraft". The idea here is that while humans have a strong sense for what is a 'nice looking' waterfall, we don't have a good formal specification of the task, so you are competing to design agents that are best at extracting human intent. By using Minecraft they provide a huge space of possible strategies (vs. e.g. Atari games), and by using human feedback you reduce (eliminate?) the potential for excess hyperparameter tuning. See also the discussion here. Researchers from OpenAI were also named authors on the paper. #ValueLearning

Hod's Detecting Modularity in Deep Neural Networks provides two criteria for whether a neural network subcomponent identified through Filan-style spectral analysis is capturing a 'real', human-intuitive subtask. The idea is that the correlation between the neurons, and the criticality of the subcomponent to overall performance, can be automatically calculated without requiring human input. #Interpretability

Lee et al.'s PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training proposes a model whereby agents initially learn in an unsupervised fashion to better economise on the scarce resource of human oversight later. This is in contrast to some other approaches where the teacher will provide near-constant feedback in the very early stages when uncertainty is high; presumably the cost is that this exploration is not safe. They also change the way the agent updates based on the samples shown to the humans, but I didn't quite follow that bit. #ValueLearning

Gates et al.'s A rational model of people's inferences about others' preferences based on response times suggests using how long people take to respond as a hidden measure of preference intensity. To my knowledge this 'free' bit of information has not been proposed previously. #ValueLearning
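
A toy version of the idea (my own illustrative model, not the paper's): assume response time falls off with the utility gap between options, so an observer can invert an observed time into an estimated gap, extracting information that the choice alone would not carry.

```python
import math

def response_time(gap, base=0.3, scale=2.0):
    """Assumed generative model: closer (harder) choices take longer."""
    return base + scale * math.exp(-abs(gap))

def infer_gap(rt, base=0.3, scale=2.0):
    """Invert the assumed model to recover the utility gap from a time."""
    return -math.log((rt - base) / scale)

# A quick choice implies a big preference gap; a slow one a close call:
print(round(infer_gap(response_time(2.0)), 3))  # recovers 2.0
print(round(infer_gap(response_time(0.2)), 3))  # recovers 0.2
```

The real model is Bayesian over noisy times rather than a deterministic inversion, but the direction of inference is the same: time taken is an extra observable about preference intensity.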

CHAI researchers contributed to the following research led by other organisations:

Finances

They have been funded by various EA organisations including the Open Philanthropy Project.

They spent $1,650,000 in 2020 and $1,250,000 in 2021, and plan to spend around $1,750,000 in 2022. They have around $11,000,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 6.2 years of runway, or close to 5 if they grow. Their 2021 spending was significantly below plan due to the pandemic.

If you wanted to donate to them, here is the relevant web page.

MIRI: The Machine Intelligence Research Institute

MIRI is a Berkeley based independent AI Safety Research organisation founded in 2000 by Eliezer Yudkowsky and currently led by Nate Soares. They were responsible for much of the early movement building for the issue, but have refocused to concentrate on research for the last few years. With a fairly large budget now, they are the largest pure-play AI alignment shop. Their research can be found here.

In general they do very 'pure' mathematical work, in comparison to other organisations with more 'applied' ML or strategy focuses. I think this is especially notable because of the irreplaceability of the work. It seems quite plausible that some issues in AI safety will arise early on and in a relatively benign form for non-safety-orientated AI ventures (like autonomous cars or Minecraft helpers); however, the work MIRI does largely does not fall into this category. I have also historically been impressed with their research and staff.

Their agent foundations work is basically trying to develop the correct way of thinking about agents and learning/decision making, by spotting areas where our current models fail and seeking to improve them. This includes things like thinking about agents creating other agents.

In December 2020 (hence in scope for this year's review) they announced that the new research directions they launched in 2017 had been a disappointment, and they were winding down those programs. As a result most of their engineering staff have left. Given that I did not give them much credit in the past for this secret research program, this does not significantly change my opinion of them.

MIRI, in collaboration with CFAR, ran a series of four-day workshop/camps, the AI Risk for Computer Scientists workshops, which gather mathematicians/computer scientists who are potentially interested in the issue in one place to learn and interact. This sort of workshop seems very valuable to me as an on-ramp for technically talented researchers, which is one of the major bottlenecks in my mind. In particular they have led to hires for MIRI and other AI Risk organisations in the past. However, the website suggests these have been discontinued due to the pandemic.

They also support MIRIx workshops around the world, for people to come together to discuss and hopefully contribute towards MIRI-style work.

MIRI continue their policy of nondisclosure-by-default, something I've discussed in the past, which despite having some strong arguments in favour unfortunately makes it very difficult for me to evaluate them. I've included some particularly interesting blog posts some of their people have written below, but many of their researchers produce little to no public-facing content.

They decided not to leave the Bay Area.

Research

Most of their work is non-public.

Garrabrant's Temporal Inference with Finite Factored Sets introduces a new way of doing temporal/causal inference via combinatorics. To summarise greatly, it introduces the idea of a set factorization, which is sort of dual to a set partition, and uses this to introduce an alternative to Judea Pearl's directed acyclic graph approach to causality. The appeal here, apart from being a neat new definition, is that this could help us infer causality without needing the graph, which does feel a bit like cheating: you basically get temporal direction from factor subset relations. See also the discussion here. Overall I thought this was an excellent paper. #AgentFoundations
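
To make the basic object concrete, here is a small check of the definition as I understand it (the example sets are my own): a factorization of a set S is a collection of partitions such that choosing one part from each partition always pins down exactly one element of S.

```python
from itertools import product

def is_factorization(S, partitions):
    """Each choice of one part per partition must intersect in exactly
    one element of S for the partitions to count as a factorization."""
    for parts in product(*partitions):
        common = set(S)
        for p in parts:
            common &= p
        if len(common) != 1:
            return False
    return True

S = {0, 1, 2, 3}
# Two binary "factors" jointly identify each element of a 4-element set:
good = [[{0, 1}, {2, 3}], [{0, 2}, {1, 3}]]
# The same partition twice fails: picking {0, 1} twice leaves two candidates.
bad = [[{0, 1}, {2, 3}], [{0, 1}, {2, 3}]]
print(is_factorization(S, good), is_factorization(S, bad))  # True False
```

Each partition plays the role of a "variable", and the factorization condition says the variables jointly determine the world state, which is the starting point for the post's combinatorial substitute for a causal graph.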

Yudkowsky's Discussion with Eliezer Yudkowsky on AGI interventions is a transcript of a Chatham House discussion Eliezer hosted on his views about the future. Very interesting in general; a lot of it describes trying to create the situation to be able to exploit a future miracle that might occur. He is generally pessimistic: even if one research org can be persuaded to behave sensibly, the code will just be stolen by rivals. One section I didn't understand was his desire for a secret team of 5 good ML researchers to try out various (probably unsuccessful) ideas; isn't that MIRI? There is also extended discussion on issues like convergence and AI deception, and the comments are also worth reading. #Strategy

Yudkowsky's Yudkowsky and Christiano discuss "Takeoff Speeds" combines an extended reply from Eliezer arguing for a rapid takeoff with some back-and-forth with Paul about various forecasts. Many of the arguments will not be very surprising to those who have read enough Eliezer, but the dialogue is very interesting to read, even if at times they struggled to pin down exactly the source of the disagreement. #Forecasting

Soares's Visible Thoughts Project and Bounty Announcement describes a prize MIRI are putting out for creating training datasets for them. Basically they want text runthroughs of a D&D campaign with every thought the dungeon master might have explicitly written out. Ultimately they want to use these datasets to train models whose step-by-step reasoning is similarly visible. #Interpretability

Finances

They spent $7,500,000 in 2020 and a ‘similar’ amount in 2021, and plan to spend around $6m in 2022. They have around $30,000,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 5.2 years of run­way. This large amount of run­way is due to some big re­cent crypto dona­tions.
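For concreteness, the 'very naïve calculation' used throughout this document is just reserves divided by planned annual spend. With the rounded figures quoted here MIRI comes out at about 5.0 years rather than the quoted 5.2, presumably because the underlying figures are less round:

```python
def naive_runway(reserves: float, planned_spend: float) -> float:
    """Naive runway in years: reserves divided by annual planned spend."""
    return reserves / planned_spend

# MIRI, using the rounded figures above:
print(round(naive_runway(30_000_000, 6_000_000), 1))  # 5.0 (quoted: ~5.2)
# GCRI's figures (later in this document) reproduce exactly:
print(round(naive_runway(600_000, 350_000), 1))       # 1.7
```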

They have been sup­ported by a va­ri­ety of EA groups in the past, in­clud­ing OpenPhil.

They are not run­ning a fundraiser this year due to strong re­serves, but if you wanted to donate to them any­way here is the rele­vant web page.

GCRI: The Global Catas­trophic Risks Institute

GCRI is a globally-based in­de­pen­dent Ex­is­ten­tial Risk Re­search or­gani­sa­tion founded in 2011 by Seth Baum and Tony Bar­rett. They cover a wide va­ri­ety of ex­is­ten­tial risks, in­clud­ing ar­tifi­cial in­tel­li­gence, and do policy out­reach to gov­ern­ments and other en­tities. Their re­search can be found here. Their an­nual sum­mary can be found here.

They run an advising and collaboration program through which they give guidance to people from around the world who want to help work on catastrophic risks, and potentially write papers with them.

In 2021 they hired An­drea Owe as Re­search As­so­ci­ate, and Robert de Neufville left ear­lier this month.

They have an AMA on the EA fo­rum.

Research

de Neufville & Baum's Collective Action on Artificial Intelligence: A Primer and Review describes the ways in which economists break down collective action problems and applies them to AI. These apply to both near-term and AGI issues. #Strategy

Owe & Baum’s The Ethics of Sus­tain­abil­ity for Ar­tifi­cial In­tel­li­gence dis­cusses a va­ri­ety of is­sues with the con­cept of sus­tain­abil­ity, and links them to AI. #Strategy

Owe & Baum's Moral Consideration of Nonhumans in the Ethics of Artificial Intelligence argues for the moral consideration of animals (and nature/artificial agents) in AI ethics. It surveys existing statements of AI principles, few of which explicitly mention animals, and argues against anthropocentrism: at the least, we should give *some* weight to animals, even if less than humans. #Ethics

Fitzger­ald et al.’s 2020 Sur­vey of Ar­tifi­cial Gen­eral In­tel­li­gence Pro­jects for Ethics, Risk, and Policy is ba­si­cally a mas­sive list of cur­rently ex­ist­ing AGI pro­jects. In­ter­est­ingly the pa­per ar­gues that for-profit AGI pro­jects at least claim to have eth­i­cal/​hu­man­i­tar­ian con­cerns sig­nifi­cantly more fre­quently than aca­demic ones. Ob­vi­ously this is in places a fairly sub­jec­tive ex­er­cise but I’m not aware of any­thing else of this na­ture since their ear­lier 2017 work. #Overview

Baum & Owe’s Ar­tifi­cial In­tel­li­gence Needs En­vi­ron­men­tal Ethics dis­cusses en­vi­ron­men­tal per­spec­tives on a num­ber of near-term AI is­sues, in­clud­ing whether ar­tifi­cial life counts for bio­di­ver­sity. #NearTerm

Baum et al.’s GCRI State­ment on the Jan­uary 6 US Capi­tol In­sur­rec­tion con­demns the protest. I thought the link to global geno­cide was a bit of a stretch. #Policy

GCRI re­searchers con­tributed to the fol­low­ing re­search led by other or­gani­sa­tions:

Finances

They spent $300,000 in 2020 and $415,000 in 2021, and plan to spend around $350,000 in 2022. They have around $600,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 1.7 years of run­way.

If you want to donate to GCRI, here is the rele­vant web page.

CSER: The Cen­ter for the Study of Ex­is­ten­tial Risk

CSER is a Cambridge based Existential Risk Research organisation founded in 2012 by Jaan Tallinn, Martin Rees and Huw Price, and then established by Seán Ó hÉigeartaigh with the first hire in 2015. After an intermission they are once again led by Seán and are affiliated with Cambridge University. They cover a wide variety of existential risks, including artificial intelligence, and do political outreach, including to the UK and EU parliaments (e.g. this). Their research can be found here.

Seán re­turns as leader this year, re­plac­ing Cather­ine Rhodes. Jess Whit­tle­stone is leav­ing for CLTR/​Alpen­glow, and they hired Jes­sica Bland (who I heard good things about) and Paul In­gram.

In Nuño Sem­pere’s shal­low re­view he sug­gested that many CSER re­searchers were not re­ally fo­cused on longter­mist work, which is also my im­pres­sion.

They had a sub­mis­sion on the EU AI act here.

Research

Hua & Belfield’s AI & An­titrust: Rec­on­cil­ing Ten­sions Between Com­pe­ti­tion Law and Co­op­er­a­tive AI Devel­op­ment analy­ses var­i­ous AI safety gov­er­nance pro­pos­als for their com­pat­i­bil­ity with EU an­titrust rules. The pa­per fo­cuses on EU com­pe­ti­tion law, be­cause even though the EU has no big AI firms, it is very en­thu­si­as­tic about reg­u­lat­ing US tech firms. Wor­ry­ingly (but, to my mind, cor­rectly) it ar­gues that the OpenAI As­sist clause, where they promise to help rather than com­pete with any other firm who gets close to AGI, could be in vi­o­la­tion of the EU’s Ar­ti­cle 101(1), es­pe­cially if it oc­curs late in the race and be­tween mar­ket lead­ers (both likely) and the unilat­eral na­ture of the pro­posal only par­tially miti­gates it. The con­clu­sion to the pa­per is op­ti­mistic, but my read­ing of the spe­cific ar­gu­ments is quite nega­tive; I think it would be very hard for an AI com­pany to e.g. per­suade a hos­tile reg­u­la­tor to give them credit for the spec­u­la­tive effi­ciency gains of col­lu­sion. Three strate­gies it doesn’t con­sider are 1) avoid the EU (vi­able for OpenAI, not Google), 2) rely on EU en­force­ment be­ing so slow it is sim­ply ir­rele­vant (seems plau­si­ble) and 3) push­ing for re­forms to weaken an­titrust laws. Over­all I thought this was an ex­cel­lent pa­per. #Policy

Whit­tle­stone & Clark’s Why and How Govern­ments Should Mon­i­tor AI Devel­op­ment recom­mends that gov­ern­ments build ca­pa­bil­ities for mon­i­tor­ing the de­vel­op­ment of AI tech­nolo­gies. The ex­am­ples in the pa­per are all neart­erm things, but pre­sum­ably the mo­ti­va­tion is gen­eral readi­ness for AGI. Re­searchers from An­thropic were also named au­thors on the pa­per. #Policy

Maas & Stix's Bridging the gap: the case for an 'Incompletely Theorized Agreement' on AI policy argues, in a similar line to some previous papers, that people concerned with AI Xrisk have common cause on various current policy issues with those concerned with short-term AI. It suggests that public disagreement between near- and long-term people is bad because it reduces both sides' legitimacy, and hence supports the rival 'AI race' competition framing. #Strategy

Maas’s AI, Gover­nance Dis­place­ment, and the (De)Frag­men­ta­tion of In­ter­na­tional Law dis­cusses the po­ten­tial im­pacts of AI tech­nolo­gies on how in­ter­na­tional law is made and en­forced. #NearTerm

Maas’s Align­ing AI Reg­u­la­tion to So­ciotech­ni­cal Change ar­gues we should fo­cus on the im­pacts of tech­nolo­gies, rather than the spe­cific tech­nolo­gies them­selves, when de­cid­ing whether to reg­u­late. #Policy

They also did work on var­i­ous non-AI is­sues, which I have not read, but you can find on their web­site.

CSER re­searchers con­tributed to the fol­low­ing re­search led by other or­gani­sa­tions:

Finances

They spent $854,000 in 2020 and $1,300,000 in 2021, and plan to spend around $1,300,000 in 2022. It seems that, similarly to GPI, 'runway' is maybe not that meaningful a concept for them; they suggested their grants begin to end in early 2022 and all end by mid-2024, the same dates as last year.

If you want to donate to them, here is the rele­vant web page.

OpenAI

OpenAI is a San Fran­cisco based in­de­pen­dent AI Re­search or­gani­sa­tion founded in 2015 by Sam Alt­man. They are one of the lead­ing AGI re­search shops, with a sig­nifi­cant fo­cus on safety. Ini­tially they planned to make all their re­search open, but changed plans and are now sig­nifi­cantly more se­lec­tive about dis­clo­sure—see for ex­am­ple here.

One of their biggest achievements is GPT-3, a massive natural language model that generates highly plausible continuations from prompts and seems to be very versatile. GPT-3 continues to be one of OpenAI's (and in fact anyone's) biggest AI capabilities achievements. In 2021 they released DALL-E, which is similar except that instead of creating text from prompts it creates pictures. They initially pioneered a delayed release program for GPT to allow people to adapt to the perceived risks of this technology (and to normalise doing so for future technologies); GPT-3 is now generally available for anyone to use.

A notable GPT-3 derivative this year is Codex, the model behind GitHub Copilot, which intelligently assists programmers based on open-source code.

They have also done work on iter­a­tively sum­maris­ing books (sum­maris­ing, and then sum­maris­ing the sum­mary, etc.) as a method for scal­ing hu­man over­sight.
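The recursive scheme can be sketched in a few lines; `summarize` below is a hypothetical stand-in for a model call (the truncation body is purely illustrative, not OpenAI's method):

```python
def summarize(text: str) -> str:
    # Stand-in for a language-model call; truncation just illustrates
    # that each pass produces something shorter than its input.
    return text[: max(1, len(text) // 4)]

def chunk(text: str, size: int):
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i : i + size] for i in range(0, len(text), size)]

def recursive_summary(text: str, chunk_size: int = 1000, target: int = 500) -> str:
    """Summarise chunks, concatenate the summaries, and repeat until the
    result is short enough for a human to oversee directly."""
    while len(text) > target:
        text = " ".join(summarize(c) for c in chunk(text, chunk_size))
    return text
```

The safety-relevant point is that a human only ever has to check short summarise-one-chunk steps, rather than read the whole book, so oversight scales with depth rather than length.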

In December 2020 the majority of the OpenAI safety team left, with most of them (Dario Amodei, Chris Olah) going to found Anthropic and Paul Christiano going to found the Alignment Research Center. Ostensibly these were unrelated exits and not because of any problems at OpenAI; however their stated motivations for leaving do not fully make sense to me (why couldn't they do the safety work they want to within OpenAI?) and OpenAI did not seem to have replacements lined up, though they do have Jan Leike now.

He­len Toner, of CSET and similar things, joined their board this year. OpenPhil’s Holden Karnofsky, who pre­vi­ously had joined the board af­ter OpenPhil made a $30m dona­tion which ex­pired this year, has now left the board. Will Hurd, a US poli­ti­cian, also joined the board.

They have a sub­mis­sion on the EU AI act here.

Research

Cammarata et al.'s Curve Circuits is a very cool piece that shows the ability to understand a (moderately) large neural network in the Feynman sense: to recreate it. The authors are able to identify what individual neurons (and families of neurons) in a 50k+ curve-recognition network 'mean'. To test this they describe the network's 'strategy' in English sentences, and then are able to re-implement (more or less) the network based on this description. This is much more explainable than I would have expected a neural net to be! I do wonder if vision problems are unusually tractable here; would it be so easy to visualise what individual neurons mean in a language model? In any case you should read the paper for the psychedelic pictures if nothing else. Overall I thought this was an excellent paper. #Interpretability

Barnes & Chris­ti­ano’s De­bate up­date: Obfus­cated ar­gu­ments prob­lem de­scribes a prob­lem they weren’t able to solve with their tests of AI safety through De­bate. Ba­si­cally there are ar­gu­ments where, even if you know it is wrong, it is very hard to nar­row down ex­actly where the er­ror is. This means that the hon­est de­ba­tor can’t pre­sent the judge with the sort of knock-down ev­i­dence they want. Clearly this was always go­ing to be a the­o­ret­i­cal is­sue; the main up­date here is that these obfus­cated er­ror ar­gu­ments can arise quite com­monly. #Amplification

Chen et al.'s Evaluating Large Language Models Trained on Code introduces and evaluates the programming-language GPT implementation that preceded GitHub Copilot. They basically take a massive pretrained GPT model and point it at GitHub, and then see how good it is at writing Python functions based on natural language descriptions, tested with unit tests. They also attempted to write docstrings from code. It displays some of the same smooth scaling curves we see elsewhere with GPT. Some interesting behaviour emerges, including that it 'knows' how to write both good and bad code, and if your prompt includes bad code (e.g. because you are a bad programmer who really needs a copilot) it will assume you want to carry on writing bad code! That seems potentially like a good safety lesson, but overall this does seem rather like capacity-enhancing research to me. There are also some more 'silly' concerns, like that the AI might prefer some open-source packages over others, which would be unfair on the authors of the disfavoured packages. Researchers from OpenPhil were also named authors on the paper. #Capabilities
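The paper scores models by whether any of k sampled solutions passes the unit tests ('pass@k'); if I recall correctly, its unbiased estimator, given c correct solutions out of n samples, is simple to reproduce:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k: the probability that at least one of
    k samples (drawn without replacement from n generated solutions, of
    which c pass the unit tests) is correct."""
    if n - c < k:
        return 1.0  # too few failures to fill k samples with all-wrong
    return 1.0 - comb(n - c, k) / comb(n, k)

assert pass_at_k(10, 0, 5) == 0.0   # nothing passes, pass@k is zero
assert pass_at_k(10, 10, 1) == 1.0  # everything passes
```

The point of the estimator is that naively sampling k and checking "did any pass?" has high variance; computing the hypergeometric probability from a larger pool of n samples gives the same expectation with much less noise.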

OpenAI Researchers also contributed to the following papers led by other organisations:

  • Un­solved Prob­lems in ML Safety

  • The MineRL BASALT Com­pe­ti­tion on Learn­ing from Hu­man Feedback

  • Truth­ful AI: Devel­op­ing and gov­ern­ing AI that does not lie

Finances

OpenAI was ini­tially funded with money from Elon Musk as a not-for-profit. They have since cre­ated an un­usual cor­po­rate struc­ture in­clud­ing a for-profit en­tity, in which Microsoft is in­vest­ing a billion dol­lars.

Given the strong funding situation at OpenAI, as well as their safety team's position within the larger organisation, I think it would be difficult for individual donations to appreciably support their work. However it could be an excellent place to apply to work.

Google Deepmind

Deepmind is a London based AI Research organisation founded in 2010 by Demis Hassabis, Shane Legg and Mustafa Suleyman, and currently led by Demis Hassabis. They are affiliated with Google. As well as being arguably the most advanced AI research shop in the world, Deepmind has a very sophisticated AI Safety team, covering both ML safety and AGI safety.

We dis­cussed AlphaFold last year, and there was sig­nifi­cant fur­ther progress on pro­tein fold­ing this year with AlphaFold 2. Long-time fol­low­ers of the space will re­call this is a de­vel­op­ment Eliezer high­lighted back in 2008. See also this spec­u­la­tion that Deep­mind might have been try­ing to avoid pub­lish­ing it un­til a com­peti­tor ‘forced’ their hand.

Ro­hin Shah and team con­tinue to pro­duce the AI Align­ment Newslet­ter, cov­er­ing in de­tail a huge num­ber of in­ter­est­ing new de­vel­op­ments, es­pe­cially new pa­pers. I re­ally can­not praise these newslet­ters highly enough.

Research

Stooke et al.'s Open-Ended Learning Leads to Generally Capable Agents is a worryingly-titled paper showing a technique for training agents to deal with a wide variety of environments and objectives. While AlphaZero showed the same algorithm could learn Chess/Shogi/Go etc., learning any one of these games didn't directly help with the others. In this case they produce a 3D environment that can be configured in a wide variety of ways, with the idea that the agents will learn quite general lessons, at least relative to their XLand environment, which seems much more general than the board games. The training process is very involved: as with the other Alpha* systems, there are multiple generations, combined with clever ways of judging how hard a task is (so that agents are presented with hard-but-not-impossible things to learn from), and evaluation based on dominance rather than average scores. See also the discussion here. #Capabilities

Welbl et al.'s Challenges in Detoxifying Language Models tests and discusses various issues with automated 'toxicity' (rudeness/political correctness) filters for language models. Unsurprisingly these filters can generate a lot of false positives, and degrade the quality of the responses on other axes. I think this paper is a good illustration of the problems with 'Ethical Considerations' sections: while they list a number of issues, the fact that their approach by design requires the suppression of entire classes of true and important statements is not mentioned. #NearTerm

Gabriel's Towards a Theory of Justice for Artificial Intelligence argues it is not impossible to apply considerations of justice to AI, and then applies Rawlsianism to the issue. Unfortunately I don't find a literal reading of Rawls very convincing (a highly arbitrary collection of liberties and principles that seems like Rawls was just trying to work backwards from his conclusion, and the implausibly high level of risk aversion required to support maximin). #Ethics

Re­searchers from Deep­mind were also named on the fol­low­ing pa­pers:

Finances

As they are part of Google, I think it would be difficult for individual donors to directly support their work. However it could be an excellent place to apply to work.

Anthropic

Anthropic is a San Francisco based for-profit AI Startup organisation founded in 2021 by Dario Amodei & Daniela Amodei. They are a highly safety-aligned firm founded by people who left the OpenAI safety team in 2020. Their website is here.

Research

Their first pub­li­ca­tion falls out­side the time frame for this doc­u­ment by one day, and hence will go in next year’s re­view.

Finances

As a well-funded for-profit startup I would not ex­pect them to need or want dona­tions, but they could be a good place to work.

ARC: Align­ment Re­search Center

ARC is a Berkeley based in­de­pen­dent AI Safety Re­search or­gani­sa­tion founded in 2021 by Paul Chris­ti­ano. They work on Paul’s agenda of try­ing to de­velop sys­tems for scal­ing hu­man over­sight to al­low for (com­mer­cially com­pet­i­tive) well con­trol­led sys­tems. Their re­search can be found here.

Research

You can read about their work on al­ign­ing hu­man and AI on­tolo­gies here, and dis­cus­sion here.

Christiano's Teaching ML to answer questions honestly instead of predicting human answers presents a possible approach to the problem described in A naive alignment strategy and optimism about generalization. Essentially he is attempting to bias the training algorithm away from the 'copy human explanation' approach and towards the 'give true explanation' approach in a variety of ways, including sequential training and producing a mini training set of extra ground truth. I must admit I don't exactly understand Step 2. See also the discussion here. #Interpretability

Chris­ti­ano’s Another (outer) al­ign­ment failure story de­scribes a pos­si­ble fu­ture mis­al­ign­ment sce­nario, where AIs be­come more and more in­fluen­tial, but we un­der­stand them less and less. We defer more and more of our de­ci­sion-mak­ing to them, and things gen­er­ally get bet­ter, though peo­ple worry about the loss of con­trol. Even­tu­ally we see a treach­er­ous turn and the AIs sud­denly turn off all the cam­eras on us, though I don’t ex­actly un­der­stand how this step fits with the rest of the story. #Scenarios

Chris­ti­ano’s A naive al­ign­ment strat­egy and op­ti­mism about gen­er­al­iza­tion is a sim­ple post de­scribing a prob­lem with a ‘naïve’ strat­egy of mak­ing AIs ‘ex­plain’ what they are do­ing. The con­cern is that rather than learn­ing to give the true ex­pla­na­tion for their ac­tions, they will in­stead learn how to give per­sua­sive ac­counts. #Interpretability

ARC Researchers also contributed to the following papers led by other organisations:

  • Yudkowsky's Yudkowsky and Christiano discuss "Takeoff Speeds"

Finances

They are not look­ing for dona­tions at this time; how­ever they are hiring.

Red­wood Research

Red­wood is a Berkeley based in­de­pen­dent AI Safety Re­search or­gani­sa­tion that started do­ing pub­lic AI al­ign­ment re­search in 2021, founded by Nate Thomas, Bill Zito, and Buck Sh­legeris. They aim to do highly prac­ti­cal safety work—tak­ing the­o­ret­i­cal safety in­sights from their own work and from other or­gani­sa­tions (e.g. ARC) and prov­ing it out in prac­ti­cal ML sys­tems to ease adop­tion by non-al­ign­ment-fo­cused AI teams.

The team mem­bers I know are pretty tal­ented.

They have an ex­tended and very in­for­ma­tive AMA here.

Research

Sh­legeris’s Red­wood Re­search’s cur­rent pro­ject pro­vides an overview of Red­wood’s first re­search pro­ject (in progress). They are try­ing to ‘hand­i­cap’ GPT-3 to only pro­duce non-vi­o­lent com­ple­tions; the idea is that there are many rea­sons we might ul­ti­mately want to ap­ply some over­sight func­tion to an AI model, like “don’t be de­ceit­ful”, and if we want to get AI teams to ap­ply this we need to be able to in­cor­po­rate these over­sight pred­i­cates into the origi­nal model in an effi­cient man­ner. #Obstruction
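As a crude sketch of what 'applying an oversight function' could look like: rejection sampling is the naive baseline, whereas Redwood's actual aim is to incorporate the predicate into the model efficiently. `complete` and `is_violent` here are hypothetical stand-ins for a generator and a trained classifier.

```python
def safe_completion(prompt, complete, is_violent, n=16):
    """Return the first of n sampled completions that the oversight
    predicate accepts, or None rather than emit an unsafe completion."""
    for candidate in complete(prompt, n):
        if not is_violent(candidate):
            return candidate
    return None
```

In practice sampling until the filter passes is slow and easy to game, which is why the project frames the problem as building the predicate into the original model rather than bolting it on afterwards.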

Sh­legeris’s The al­ign­ment prob­lem in differ­ent ca­pa­bil­ity regimes pro­vides a dis­am­bigua­tion be­tween a cou­ple of differ­ent AI sce­nar­ios and the types of al­ign­ment prob­lems and solu­tions that would be rele­vant. #Overview

Finances

Redwood apparently has ample funding at the present time (they recently suggested that they didn't expect to be able to produce a lot more output with more funding) and hence is not currently looking for donations from the general EA public.

Ought

Ought is a San Fran­cisco based in­de­pen­dent AI Safety Re­search or­gani­sa­tion founded in 2018 by An­dreas Stuh­lmüller and run by An­dreas and Jung­won Byun. They re­search meth­ods of break­ing up com­plex, hard-to-ver­ify tasks into sim­ple, easy-to-ver­ify tasks—to ul­ti­mately al­low us effec­tive over­sight over AIs. This in­cludes build­ing com­puter sys­tems and re­cruit­ing test sub­jects. Ap­par­ently one of the best places to find their re­search is the mailing list here.

In the past they worked on fac­tored gen­er­a­tion – try­ing to break down ques­tions into con­text-free chunks so that dis­tributed teams could pro­duce the an­swer (Chris­ti­ano style) – and then fac­tored eval­u­a­tion – us­ing similar dis­tributed ideas to try to eval­u­ate ex­ist­ing an­swers, which seemed a sig­nifi­cantly eas­ier task (by anal­ogy to P<=NP).

They are now work­ing on a sys­tem called Elicit, an au­to­mated re­search as­sis­tant, which uses lan­guage mod­el­ling to do things like try to pro­pose new re­search di­rec­tions and liter­a­ture re­view.

James Brady will start as Head of Eng­ineer­ing in Jan­uary 2022.

Research

Alex et al.'s RAFT: A Real-World Few-Shot Text Classification Benchmark provides a benchmark of real-world tasks from the Elicit community, like classifying NeurIPS ethics statements, for few-shot learning, and tests various models on them. The benchmark aims to measure how far current models are from automating economically valuable work. Researchers from Gov.AI were also named authors on the paper. #Capabilities

Finances

They spent $1,200,000 in 2020 and $1,400,000 in 2021, and plan to spend around $2,000,000 in 2022. They have around $3,800,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 1.9 years of run­way.

If you want to donate you can do so here.

AI Impacts

AI Im­pacts is a Berkeley based AI Strat­egy or­gani­sa­tion founded in 2014 by Katja Grace and Paul Chris­ti­ano. They are af­fili­ated with (a pro­ject of, with in­de­pen­dent fi­nanc­ing from) MIRI. They do var­i­ous pieces of strate­gic back­ground work, es­pe­cially on AI Timelines—it seems their pre­vi­ous work on the rel­a­tive rar­ity of dis­con­tin­u­ous progress has been rel­a­tively in­fluen­tial. A lot of their work is in the form of a pri­vate wiki col­lect­ing po­ten­tially use­ful back­ground in­for­ma­tion. Their re­search can be found here. You can see a de­scrip­tion of the ques­tions they work on here.

For most of this year they have been down to 1-2 peo­ple, but are plan­ning on hiring back up in 2022.

Research

They have pro­duced a se­ries of pieces on how long it has his­tor­i­cally taken for AIs to cover the hu­man range (from be­gin­ner to ex­pert to su­per­hu­man) for differ­ent tasks. This seems rele­vant be­cause peo­ple only seem to re­ally pay at­ten­tion to AI progress in a field when it starts beat­ing hu­mans. Th­ese pieces in­clude Star­craft, ImageNet, Go, Chess and Draughts.

Grace’s Beyond fire alarms: free­ing the group­struck is a de­tailed re­sponse to Eliezer’s clas­sic post. She ar­gues, con­tra Eliezer, that the main pur­pose of fire alarms is not to cre­ate com­mon knowl­edge and over­come awk­ward­ness; they also do nor­mal things like provide ev­i­dence about the ex­is­tence of fires, and make stay­ing in­side un­pleas­ant. I thought this was per­sua­sive, but also that Eliezer’s main con­clu­sion still held: even if he didn’t un­der­stand fire alarms (in­deed, prior to read­ing this post I didn’t re­al­ise that I didn’t un­der­stand fire alarms) it is still true and bad that there is no fire alarm, and it is worth mak­ing peo­ple aware of this. See also the dis­cus­sion here. #Forecasting

The AI Vignettes Pro­ject was a se­ries of ex­er­cises where peo­ple wrote short ‘sto­ries’ for how AI de­vel­op­ment might un­fold. Others then cri­tiqued them to try to im­prove their plau­si­bil­ity. See also here. #Forecasting

Fer­nan­dez’s How en­ergy effi­cient are hu­man-en­g­ineered flight de­signs rel­a­tive to nat­u­ral ones? finds that an­i­mal flight is sig­nifi­cantly more en­ergy-effi­cient than hu­man flight. #Forecasting

Grace’s Ar­gu­ment for AI x-risk from large im­pacts lays out this ar­gu­ment for the im­por­tance of AGI safety and some re­sponses. #Forecasting

Grace’s Co­her­ence ar­gu­ments im­ply a force for goal-di­rected be­hav­ior ar­gues that co­her­ence ar­gu­ments do in­deed show that agents which start out be­ing weakly goal seek­ing will end up be­ing strongly goal seek­ing. See also the dis­cus­sion here. #AgentFoundations

Finances

They spent $280,000 in 2020 and $240,000 in 2021, and plan to spend around $650,000 in 2022 (roughly twice their 2019 peak of $316,000). They have around $340,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 0.5 years of runway. In the past they have received support from EA organisations like OpenPhil and FHI.

MIRI ad­ministers their fi­nances on their be­half; dona­tions can be made here.

GPI: The Global Pri­ori­ties Institute

GPI is an Oxford-based Aca­demic Pri­ori­ties Re­search or­gani­sa­tion founded in 2018 by Hilary Greaves and part of Oxford Univer­sity. They do work on is­sues in philos­o­phy and eco­nomics likely to be very im­por­tant for global pri­ori­ti­sa­tion, much of which is, in my opinion, rele­vant to AI Align­ment work. Their re­search can be found here.

They re­cently took on two new philos­o­phy post­docs (Hay­den Wilk­in­son and Adam Bales) and will be joined by Ti­mothy William soon; they didn’t dis­close any de­par­tures.

Research

I cover only the more AI Xrisk relevant papers; notably I do not include the temporal discounting papers, even though they are relevant.

Mogensen's Do not go gentle: why the Asymmetry does not support anti-natalism argues that even if you have the view that it is bad to create unhappy people but not good to create happy ones, it still doesn't follow that it would be good for humanity to go extinct. This is because in order to avoid the well-known transitivity problem with the Asymmetry, you should adopt an incommensurability principle, which in turn means that combining neutral and bad things can make them neutral overall. This is pretty counterintuitive, but I think this is basically just because the Asymmetry is counterintuitive to start with. #Ethics

Greaves & MacAskill's The case for strong longtermism argues that, for our most important decisions, an option cannot be best without being best for the long-term future. This is actually somewhat weaker than previous discussions, because the scope is only the most important decisions (career and donation), and hence does not cover much 'everyday' behaviour. It is persuasive and diligent; I'd expect most readers here to already agree with the conclusions. #Ethics

Thorstad's The scope of longtermism discusses how many types of decisions Strong Longtermism is true for. He is pretty sceptical: due to knowledge problems plus 'washing out', he argues that while Strong Longtermism applies for a small number of Xrisk related questions, for virtually any other question (e.g. malaria funding) it fails to hold. #Ethics

Thomas's Simulation Expectation presents a refinement of Bostrom's simulation argument, moving from 'there are a lot of people in sims' to 'there are a lot of people like me in sims'. #Forecasting

Re­searchers from GPI were also named on the fol­low­ing pa­pers:

Finances

They spent £850,000 in 2019/2020 (academic year) and £1,000,000 in 2020/2021, below their plan of £1,400,000, and intend to spend around £1,800,000 in 2021/2022. They suggested that as part of Oxford University 'cash on hand' or 'runway' were not really meaningful concepts for them, as they need to fully fund all employees for multiple years.

If you want to donate to GPI, you can do so here.

CLR: The Cen­ter on Long Term Risk

CLR is a London (previously Germany) based Existential Risk Research organisation founded in 2013 and led by Stefan Torges and Jesse Clifton. Until last year they were known as FRI (Foundational Research Institute) and were part of the Effective Altruism Foundation (EAF). They do research on a number of fundamental long-term issues, with AI as one of their top focus areas.

In gen­eral they adopt what they re­fer to as ‘suffer­ing-fo­cused’ ethics, which I think is a quite mis­guided view, albeit one they seem to ap­proach thought­fully. A lot of their work is about avoid­ing con­flict be­tween differ­ent agents.

Research

Oesterheld & Conitzer's Safe Pareto Improvements for Delegated Game Playing presents an approach to delegated game playing where each principal 're-shapes' the incentives for their delegates. Basically each principal can exclude options and give their agent a different utility function in an attempt to push them towards not-less-globally-optimal play. It seems to assume a high degree of competence on behalf of the principals though; at that point, do they really need agents? #GameTheory

Stastny et al.’s Multi-agent learn­ing in mixed-mo­tive co­or­di­na­tion prob­lems dis­cusses games where agents have differ­ent prefer­ences and there is no sin­gle co­op­er­a­tive equil­ibrium. Pre­sum­ably this is a situ­a­tion that seems likely to oc­cur if we have mul­ti­ple AGIs. They show that var­i­ous ex­ist­ing tech­niques strug­gle in this set­ting. #GameTheory

Clif­ton’s Col­lab­o­ra­tive game speci­fi­ca­tion: ar­riv­ing at com­mon mod­els in bar­gain­ing works on the is­sue of agents ar­riv­ing at con­flict ‘un­nec­es­sar­ily’ due to differ­ing world mod­els. The op­tion it sug­gests is for the agents to try to share their mod­els ahead of time, even though they don’t trust each other, and use this re­sult­ing com­mon model to op­ti­mise from. #GameTheory

Clifton's Weak identifiability and its consequences in strategic settings discusses the potential for unintended conflict in ultimatum games (and the like) due to uncertainty about the other agent's strategy. In the same way that a value-learning agent can struggle to disambiguate the preferences and beliefs of the humans it is studying, it can similarly be hard for agents to identify how resolved an opponent is. #GameTheory

Koko­ta­jlo’s Birds, Brains, Planes, and AI: Against Ap­peals to the Com­plex­ity/​Mys­te­ri­ous­ness/​Effi­ciency of the Brain (part of a se­quence) ar­gues that, be­cause evolu­tion tends to pro­duce com­pli­cated and messy de­signs, the fact that we are a long way from brain-level perfor­mance in some as­pects doesn’t nec­es­sar­ily mean we’re a long way away from TAI. #Forecasting

Re­searchers from CLR were also named on the fol­low­ing pa­pers:

Finances

They have a col­lab­o­ra­tion with the Swiss-based Cen­ter for Emerg­ing Risk Re­search, who fund part of their costs.

If you wanted to donate to CLR, you could do so here.

CSET: The Cen­ter for Se­cu­rity and Emerg­ing Technology

CSET is a Washington-based think tank founded in 2019 by Jason Matheny (ex-IARPA), affiliated with Georgetown University. They analyse new technologies for their security implications and provide advice to the US government. At the moment they are mainly focused on near-term AI issues. Their research can be found here.

They seem to have good con­nec­tions to the US gov­ern­ment, es­pe­cially the Demo­cratic Party, who are cur­rently in power; their cofounder Ja­son Ma­theny left to take up mul­ti­ple se­nior roles in the Ad­minis­tra­tion.

Most of the peo­ple they hire seem to be poli­tics peo­ple, not EA peo­ple.

Nuño Sem­pere’s eval­u­a­tion of their work is available here.

Research

Arnold & Toner’s AI Ac­ci­dents: An Emerg­ing Threat in­tro­duces some of the ideas around AI safety for poli­cy­mak­ers. This in­cludes ro­bust­ness, speci­fi­ca­tion prob­lems and over­sight. It ba­si­cally tries to link them to near-term threats. #Policy

Buchanan et al.’s How Lan­guage Models Could Change Dis­in­for­ma­tion in­ves­ti­gates the po­ten­tial for us­ing GPT-3 for dis­in­for­ma­tion/​pro­pa­ganda cam­paigns. They run a se­ries of ex­per­i­ments to gen­er­ate plau­si­ble tweets to push nar­ra­tives, add par­ti­san slants to ar­ti­cles, and so on. The pa­per is ob­vi­ously writ­ten with a left-wing au­di­ence in mind, which makes sense given they are at­tempt­ing to in­fluence the cur­rent US ad­minis­tra­tion. #Policy

Feda­siuk et al.’s Har­nessed Light­ning dis­cusses the ways in which the Chi­nese PLA is us­ing AI. #Forecasting

Murphy’s Translation: Ethical Norms for New Generation Artificial Intelligence Released is a translation CSET did of a Chinese policy document on ethics in AI. Given the importance of China, and how few people speak Chinese, I think this is a pretty useful general activity, but it’s hard to understand the significance of the document by itself; most of it is concerned with fairly high-level ethical goals. #Translation

Murphy’s Translation: White Paper on Trustworthy Artificial Intelligence is a translation CSET did of a Chinese policy document on trustworthiness in AI. Unfortunately it doesn’t seem to have a lot of discussion of Xrisk. #Translation

Baker’s Ethics and Ar­tifi­cial In­tel­li­gence: A Poli­cy­maker’s In­tro­duc­tion pro­vides an overview of the ways gov­ern­ments could en­courage the use of eth­i­cal stan­dards, IRBs etc. for AI. It’s mainly fo­cused on near-term AI is­sues. #Policy

Mittelsteadt’s Mechanisms to Ensure AI Arms Control Compliance describes various technical methods governments could use to ensure compliance with regulations on the use of AI. One of the suggestions is Van Eck phreaking, which remains very cool, but doesn’t seem very practical. #Policy

Rudner & Toner’s Key Concepts in AI Safety: An Overview is a very basic introduction to some of the issues in AI safety for policymakers; it is not xrisk-focused, but has read-through to it. #Overview

Zwetsloot et al.’s The Immigration Preferences of Top AI Researchers: New Survey Evidence asked researchers who had published at NeurIPS etc. about their migration plans. As a first approximation, people want to live in the US (or UK) but face legal (visa) obstacles; other countries like China (and France) only really attract their own people back. Researchers from Gov.AI also appeared as authors on the paper. #Policy

Im­brie et al.‘s Eval­u­at­ing Rhetor­i­cal Dy­nam­ics in AI analy­ses the fre­quency of four differ­ent fram­ings for AI fu­tures in me­dia ar­ti­cles. They find that the ‘Killer Robots’ fram­ing peaked in 2015 (Musk/​OpenAI?) and has de­creased sig­nifi­cantly since then. #Strategy

Aiken’s Clas­sify­ing AI Sys­tems sug­gests some sim­plified clas­sifi­ca­tion schemas to make it eas­ier for lay­men to clas­sify AI sys­tems based on e.g. their au­ton­omy and in­puts. #Overview

Crawford & Wulkan’s Federal Prize Competitions discusses using prizes to incentivise AI development. #Policy

Rudner & Toner’s Key Concepts in AI Safety: Robustness and Adversarial Examples is a very basic introduction to (non-AGI) adversarial examples. #Overview

Rudner & Toner’s Key Concepts in AI Safety: Interpretability in Machine Learning is a very basic introduction to (non-AGI) model explainability. #Overview

Finances

As they raised $50m from OpenPhil (a,b,c) this year, and have had similar suc­cesses in the past, I am as­sum­ing they do not need more dona­tions at this time.

AI Safety Camp

AISC is a globally based residential research camp organisation founded in 2018 by Linda Linsefors and currently led by Remmelt Ellen. They are affiliated with AI Safety Support. They bring together people who want to start doing technical AI research, hosting a 10-day camp aiming to produce publishable research. Their research can be found here. Their annual summary can be found here.

To the ex­tent they can provide an on-ramp to get more tech­ni­cally profi­cient re­searchers into the field I think this is po­ten­tially very valuable. But I haven’t per­son­ally ex­pe­rienced the camps, and though I spoke to two peo­ple who found them valuable and seem good, these peo­ple were not ran­domly se­lected.

In the past each camp was run by differ­ent vol­un­teers; they are in the pro­cess of tran­si­tion­ing to more con­sis­tent (and hence ex­pe­rienced) lead­ers.

Research

Koch et al.’s Ob­jec­tive Ro­bust­ness in Deep Re­in­force­ment Learn­ing pro­vides a se­ries of toy ex­am­ples demon­strat­ing ob­jec­tive ro­bust­ness failure. In each case the agent’s ca­pa­bil­ities are ro­bust, so it can still nav­i­gate the en­vi­ron­ment, but it has failed to learn the ob­jec­tive prop­erly. See also the dis­cus­sion here. Re­searchers from CLR were also named au­thors on the pa­per. #Robustness

Finances

They spent $11,162 in 2020 and $29,665 in 2021, and plan to spend around $153,400 in 2022. They have around $236,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 1.5 years of run­way.
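For concreteness, the ‘very naïve calculation’ used for runway throughout this post is just reserves divided by next year’s planned spend:

```python
# Naive runway estimate: (cash + pledged funding) / planned annual spend.
cash_and_pledged = 236_000    # AISC's reported reserves
planned_spend_2022 = 153_400  # AISC's planned 2022 budget
runway_years = cash_and_pledged / planned_spend_2022
print(round(runway_years, 1))  # → 1.5
```

This of course ignores growth in spending, restricted funds, and incoming donations, which is why it is naïve.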

If you want to donate, the web page is here.

FLI: The Fu­ture of Life Institute

FLI is a Bos­ton-based in­de­pen­dent ex­is­ten­tial risk or­ga­ni­za­tion, fo­cus­ing on out­reach, founded in large part to help or­ganise the re­grant­ing of $10m from Elon Musk. They cover nu­clear, biolog­i­cal and AI risks; one of their ma­jor pro­jects is try­ing to ban Lethal Au­tonomous Weapons.

They had a sub­mis­sion on the EU AI act here.

Readers might be interested in their podcasts, like here, here and here.

FLI re­ceived a large grant - $25m at time of dona­tion, but crypto has ral­lied since then so prob­a­bly more – from Vi­talik Bu­terin, which they are us­ing to fund a new grant pro­gram (similar to the pre­vi­ous ones funded by Elon Musk). Th­ese grants will fund both pro­jects (the Shiba Inu Grants) and tal­ent de­vel­op­ment like school pro­grams or post­docs (the Vi­talik Bu­terin Fel­low­ships).

Nuño Sem­pere’s eval­u­a­tion of FLI is available here.

Light­cone Infrastructure

Light­cone In­fras­truc­ture is a Berkeley based in­de­pen­dent Meta Longter­mist or­gani­sa­tion founded in 2021 by Oliver Habryka. They provide a range of in­fras­truc­ture and sup­port to the Longter­mist move­ment, most promi­nently the LessWrong web­site, but also the Light­cone office, work­shops and re­treats etc. Their slightly-out-of-date in­tro can be found here.

In gen­eral I have been pretty im­pressed with the team’s se­ri­ous­ness and strate­gic sense. Nuño Sem­pere’s eval­u­a­tion of LessWrong is available here. Zvi’s views on Light­cone here.

Finances

They spent $500,000 in 2020 and $1,300,000 in 2021, and plan to spend around $2,000,000 in 2022. They have around $1,900,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 1 year of runway.

The increase in budget is partly driven by their decision to pay close to market salaries (roughly 30% below market) for talent; to my knowledge they are basically the first EA org to do so. Obviously this increases their cost base a lot, but I tentatively support it – paying a lot less is basically like mandatory donations to your employer, which seems inefficient.

CLTR: Centre for Long-Term Resilience (formerly Alpenglow)

The CLTR is a London-based independent policy think tank founded by Angus Mercer & Sophie Dannreuther. They work to connect top Xrisk researchers and ideas to the UK government. My impression is they are unusually skilful at this. Their website is here.

Research

CLTR re­searchers con­tributed to the fol­low­ing re­search led by other or­gani­sa­tions:

Finances

If you wanted to donate you could do so here.

Re­think Priorities

Re­think Pri­ori­ties is an in­ter­na­tion­ally based in­de­pen­dent EA re­search con­sul­tancy or­gani­sa­tion founded in 2018 by Peter Wilde­ford & Mar­cus A Davis. They provide re­search on im­por­tant EA is­sues for other EA or­gani­sa­tions, and the broader move­ment. Their an­nual sum­mary can be found here. You can read their re­search here.

Research

A lot of their work is con­tract work, where they are hired by an­other EA or­gani­sa­tion to re­search spe­cific top­ics, as de­scribed here. While his­tor­i­cally their work has fo­cused on other is­sues, they are cur­rently ramp­ing up their Longter­mism work, which in­cludes a sig­nifi­cant AI gov­er­nance com­po­nent. Given the con­tract na­ture of much of their work, they will have a fair bit of non-pub­lic out­put, which ob­vi­ously makes ex­ter­nal eval­u­a­tion a bit harder, though pre­sum­ably this work is sup­ported by or­gani­sa­tions pay­ing for that spe­cific work any­way.

They haven’t pub­lished a lot on AI yet, but I have of­ten been im­pressed with their work on other sub­jects in the past, and their de­scrip­tion of planned pro­jects (pri­vately shared) seems sen­si­ble.

Finances

They spent $883,000 in 2020 and $2,100,000 in 2021, of which around $329,000 was for Longter­mism, and ten­ta­tively plan to spend $1.5 - $4m in 2022 on Longter­mism. (Note that these figures are higher than what they in­di­cated in the bud­get sec­tion of their strat­egy post pub­lished in Novem­ber 2021.) They have around $5,480,000 in cash and pledged fund­ing, of which around $400,000 is ear­marked for Longter­mism. They sug­gested they had around 16 months of run­way (be­cause re­stricted funds can­not be used to run op­er­a­tions).

If you wanted to donate you could do so here.

Convergence

Con­ver­gence is a globally based in­de­pen­dent Ex­is­ten­tial Risk Re­search or­gani­sa­tion founded (in­cor­po­rated and first grant) in 2018 by Justin Shov­e­lain and David Kristoffers­son. They do strate­gic re­search on x-risk re­duc­tion de­ci­sion mak­ing. Their re­search can be found here.

They plan to hire sev­eral more peo­ple in 2022.

In 2021 they ad­vised Lion­heart Ven­tures on in­vest­ing in AGI-re­lated com­pa­nies eth­i­cally, in­clud­ing eval­u­at­ing 4 such firms.

Research

No rele­vant pub­lic re­search for 2021.

Finances

They spent $14,000 in 2020 and $10,000 in 2021, and plan to spend around $100,000-300,000 in 2022.

They re­cently re­ceived ‘sub­stan­tial’ fund­ing, and hence are not ac­tively seek­ing dona­tions at the mo­ment, though if you wanted to donate any­way you could donate here.

SERI: The Stan­ford Ex­is­ten­tial Risk Initiative

SERI is a Stan­ford based stu­dent-fac­ulty col­lab­o­ra­tion work­ing on ex­is­ten­tial risk is­sues, founded in 2020; their web­site is here.

Research

GAA’s Nuclear Espionage and AI Governance provides an overview of the impact of communist spies on the Manhattan project, and some potential lessons for AI safety. It suggests that spying is more important if the scaling hypothesis is false and if AI projects are nationalised (as then nationalism could be a motivator, and groups might need to steal hardware if they can’t buy it). He concludes that spying is generally bad, but does note that secrecy tends to beget secrecy, and could be hard to combine with interpretability, which might be important for alignment. See also the discussion here. #Strategy

Other Research

I would like to em­pha­size that there is a lot of re­search I didn’t have time to re­view, es­pe­cially in this sec­tion, as I fo­cused on read­ing or­gani­sa­tion-dona­tion-rele­vant pieces. In par­tic­u­lar there is a lot of good work on the Align­ment Fo­rum. So please do not con­sider it an in­sult that your work was over­looked!

Filan’s AXRP—the AI X-risk Re­search Pod­cast is a new pod­cast ded­i­cated to dis­cussing AI safety work. #Overview

lifelon­glearner and Hase’s Opinions on In­ter­pretable Ma­chine Learn­ing and 70 Sum­maries of Re­cent Papers is a ridicu­lously com­pre­hen­sive overview of the work that has been done on mak­ing ML sys­tems hu­man-com­pre­hen­si­ble over the last few years. I am go­ing to have to ad­mit I didn’t read it all. #Interpretability

Turner’s Satis­ficers Tend To Seek Power: In­stru­men­tal Con­ver­gence Via Re­tar­getabil­ity ar­gues that a wide range of poli­cies, not just op­ti­misers, are mo­ti­vated to seek to con­trol their en­vi­ron­ment, Omo­hun­dro-style. This is bad news in­so­much as it pre­sents a prob­lem with var­i­ous at­tempts to make AI ‘un­am­bi­tious’ and hence safe. #AgentFoundations

Wentworth’s Utility Maximization = Description Length Minimization shows that utility maximizers can be modelled as attempting to make the world simpler, according to a model of the world which assigns probability in accordance with utility. The maths is not complicated, and once I read it the idea was obvious. Unfortunately it is now impossible for me to tell if it was obvious prior to reading—probably not! #AgentFoundations
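The correspondence is short enough to state in full; this is my paraphrase of the post’s construction, with utilities measured in bits:

```latex
% Model the world with a distribution that weights outcomes by utility:
P(x \mid M) \;=\; \frac{2^{u(x)}}{\sum_{x'} 2^{u(x')}}.
% Under the optimal code for M, the description length of outcome x is
L(x) \;=\; -\log_2 P(x \mid M) \;=\; -u(x) \;+\; \underbrace{\log_2 \textstyle\sum_{x'} 2^{u(x')}}_{\text{constant}},
% so, for any policy \pi over outcomes,
\arg\max_\pi \mathbb{E}_\pi[u(X)] \;=\; \arg\min_\pi \mathbb{E}_\pi[L(X)],
% i.e. maximising expected utility is exactly minimising expected
% description length under the utility-derived model M.
```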

Jiang et al.’s Delphi: Towards Machine Ethics and Norms is a transformer model trained on various ethical judgement datasets, with a fun web frontend. They use five datasets, including Hendrycks et al.’s Aligning AI with Shared Human Values, which we discussed last year. There’s been a lot of criticism of the model for producing absurd results (e.g. here), but it gave good responses to all but one of the ~30 prompts I gave it, including some attempted tricky ones; the only exceptions were, I suspect, side effects of their ‘fix’ for its prior political incorrectness. #ValueLearning

Wentworth’s How To Get Into Independent Research On Alignment/Agency describes in a quite practical way John’s experiences and advice for doing useful AI work outside of a research org. Given the recent dramatic increase in the viability of this as a career (largely due to the LTFF), I thought this was a good post, for making more people aware of this possibility if nothing else. #Overview

Cihon et al.’s Corporate Governance of Artificial Intelligence in the Public Interest expands on Belfield’s work last year to show a very wide variety of ways in which AI corporations can be influenced, collaboratively and adversarially, to change their behaviour. My main concern is that it is not clear how to make sure these structures actually do useful work, as opposed to filling up with grifters and ideologues; they list some past ‘success stories’, but it is not clear to me that many of these instances of influencing corporate behaviour actually had a *positive* influence. (Also, more mundanely, they misunderstood why shareholders have more influence than bondholders: it is because shareholders are the residual claimants on cashflows.) Researchers from GCRI and the Legal Priorities Project were also named authors on the paper. #Policy

Cihon et al.’s AI Certification: Advancing Ethical Practice by Reducing Information Asymmetries surveys the existing landscape for AI ethical certifications (basically all near-term) and discusses the potential for Xrisk-relevant certification. Researchers from GCRI and the Legal Priorities Project were also named authors on the paper. #Policy

Prunkl et al.’s Institutionalizing ethics in AI through broader impact requirements compares the NeurIPS ethics statements to similar things in other fields and considers their impact. They also include a number of suggestions for ameliorating their weaknesses. Researchers from Gov.AI were also named authors on the paper. #Policy

Ashurst et al.’s AI Ethics State­ments: Anal­y­sis and les­sons learnt from NeurIPS Broader Im­pact State­ments pro­vides de­scrip­tive statis­tics around the state­ments, and dis­cusses why they were dis­con­tinued. It seems that in gen­eral they were not very suc­cess­ful at caus­ing re­searchers to pay at­ten­tion to im­por­tant things. Re­searchers from Gov.AI were also named au­thors on the pa­per. #Policy

Davis’s Fea­ture Selec­tion is a very well writ­ten short story about what it feels like on the in­side to be a ML al­gorithm. I don’t want to spoil it, but it does a good job illus­trat­ing var­i­ous re­lated points around e.g. Ro­bust­ness. #Fiction

Niconiconi’s Whole Brain Emulation: No Progress on C. elegans After 10 Years gives an update on progress (or lack thereof) in whole brain emulation for C. elegans. It seems that basically no-one was motivated to fund it, so little progress has been made in the last 10 years. #Forecasting

Guter­res’s Our Com­mon Agenda is a re­port by the UN sec­re­tary gen­eral. It is largely a list of pieties, but men­tions ex­is­ten­tial risks and AI (but not di­rectly AI Xrisk). #Policy

Brown et al.’s Value Align­ment Ver­ifi­ca­tion pro­poses a num­ber of tests to de­ter­mine if an agent is al­igned with a hu­man. Some­what im­plau­si­bly to me they sug­gest this works even in cases where both hu­man and AI are black boxes, so long as they share an on­tol­ogy. Re­searchers from CHAI were also named au­thors on the pa­per. #ValueLearning

Liu & Maas’s ‘Solving for X?’ Towards a problem-finding framework to ground long-term governance strategies for artificial intelligence argues we should spend less time trying to solve AI governance problems and more time looking for new problems. It mentions supply chains as being highly vulnerable to pandemics, but they actually held up pretty well? Heavy on buzzwords. Researchers from CSER were also named authors on the paper. #Strategy

An­drus et al.’s AI Devel­op­ment for the Public In­ter­est: From Ab­strac­tion Traps to So­ciotech­ni­cal Risks makes some com­ments about the re­la­tion­ship be­tween tech­ni­cal and so­cial prob­lems and grad school. Re­searchers from CHAI were also named au­thors on the pa­per. #Strategy

Chatila et al.‘s Trust­wor­thy AI pro­vides some high-level dis­cus­sion of is­sues like in­ter­pretabil­ity and hu­man rights. I was sur­prised by the con­fi­dent as­ser­tion that be­cause ma­chines ‘can only de­cide and act within a bounded set of pos­si­bil­ities’ that they can­not make eth­i­cal de­ci­sions. Re­searchers from CHAI were also named au­thors on the pa­per. #Strategy

Ma­clure & Rus­sell’s AI for Hu­man­ity: The Global Challenges de­scribes some pos­si­ble ap­pli­ca­tions of AI to the Sus­tain­able Devel­op­ment Goals. Re­searchers from CHAI were also named au­thors on the pa­per. #ShortTerm

Cave et al.’s Using AI ethically to tackle covid-19 raises various standard NearTerm objections against using AI to combat covid. I found this pretty unconvincing; none of the ‘harms’ it raises seem material compared to the pandemic. If an ML system for diagnosing covid based on the sound of coughs can save people’s lives, the fact that not everyone has a smartphone doesn’t seem like a good reason to ban it. Researchers from CSER were also named authors on the paper. #NearTerm

Klinova & Korinek’s AI and Shared Pros­per­ity recom­mends AI de­vel­op­ers analyse the labour mar­ket im­pacts of their work. I think ask­ing ML en­g­ineers—not even economists! - to at­tempt to micro-man­age the econ­omy in this way is a mis­take, for rea­sons de­scribed by Law­son here. Re­searchers from Gov.AI were also named au­thors on the pa­per. #NearTerm

Cap­i­tal Allo­ca­tors & Other Organisations

One of my goals with this doc­u­ment is to help donors make an in­formed choice be­tween the differ­ent or­gani­sa­tions. How­ever, it is quite pos­si­ble that you re­gard this as too difficult, and wish in­stead to donate to some­one else who will al­lo­cate on your be­half. This is of course much eas­ier; now in­stead of hav­ing to solve the Or­gani­sa­tion Eval­u­a­tion Prob­lem, all you need to do is solve the dra­mat­i­cally sim­pler Or­gani­sa­tion Eval­u­a­tor Or­gani­sa­tion Eval­u­a­tion Prob­lem.

It’s worth not­ing that many of the orgs in this cat­e­gory, be­ing sup­ported by large en­dow­ments, do not re­ally take out­side money.

LTFF: Long-term fu­ture fund

LTFF is a globally based EA grantmaking organisation founded in 2017, currently led by Asya Bergal and part of EA Funds. They are one of four funds set up by CEA (but now operationally independent, though they still report to the CEA board) to allow individual donors to benefit from specialised capital allocators; this one focuses on long-term future issues, including a large focus on AI Alignment. Their website is here. In 2021 they did a May grant round (writeup, discussion). At time of writing there are no public write-ups for the grants from the rest of the year.

Grant ap­pli­ca­tions are now ac­cepted on a rol­ling ba­sis: you can ap­ply at any time of year.

The fund is now run by four peo­ple (plus ad­vi­sors), and the grants have gone to a wide va­ri­ety of causes, many of which would sim­ply not be ac­cessible to in­di­vi­d­ual donors.

The fund man­agers are cur­rently:

● Asya Bergal

● Adam Gleave

● Oliver Habryka

● Evan Hubinger

Evan is new, re­plac­ing He­len Toner, who left, and Matt Wage, who be­came an ad­vi­sor. I know Asya and Habryka rea­son­ably well and think they will make gen­er­ally good grants; Adam and Evan I know less well but seem also good. There has been a fair bit of man­ager turnover, and this will prob­a­bly con­tinue.

Not mentioned on the website, there were also several part-time managers in 2021, a practice which seems likely to continue with different people:

  • Daniel Eth

  • Ozzie Gooen

  • Luisa Rodriguez

  • [one other non-vot­ing ‘ad­vi­sor’ who re­quested anonymity]

  • (sort of) Jonas Vollmer

The man­agers now have the op­tion to be paid for their work by CEA (on an OpenPhil grant).

In to­tal for 2021, based on my calcu­la­tions, they granted around $4.96m. In gen­eral most of the grants seem at least plau­si­bly valuable to me, and many seemed quite good in­deed. There weren’t any in 2021 that seemed sig­nifi­cantly nega­tive. I es­ti­mate that 66% of the dol­lars went to AI-rele­vant ac­tivi­ties (in­clud­ing par­tial credit for some things), and 85% were to grants I would have made (again in­clud­ing par­tial credit).

I attempted to classify the recommended grants by type. Note that ‘training’ means paying an individual to self-study. One type of funding I’m not really sure how to classify is income support, where a researcher already has a stipend, but the LTFF thinks they could be more effective if they didn’t have to worry so much about (in the grand scheme of things) relatively small amounts of money.

I have de­liber­ately omit­ted the ex­act per­centages be­cause this is an in­for­mal clas­sifi­ca­tion.

Of these categories, I am most excited by the Individual Research, Event and Platform projects. I am generally somewhat sceptical of paying people to ‘level up’ their skills. (Many) individual donors are perfectly capable of evaluating large organisations that publicly advertise for donations. In donating to the LTFF, I think (many) donors are hoping to fund smaller projects that they could not directly access themselves. As it is, such donors will probably have to consider the allocations to large organisations a mild ‘tax’ – to the extent that different large organisations are chosen than the ones they would have picked themselves.

Tetlock et al.’s forecasting work (PhD), for which the fund managers recommended $572,000, was the largest single grant (around 12% of the 2021 total), followed by EA Geneva on $310,000, and Amon Elders (PhD), BERI and Kristaps Zilgalvis (PhD), all on $250,000 each.

I was able to view what the LTFF re­garded as its ‘marginal but re­jected’ ap­pli­ca­tions from the May round; in gen­eral there were some de­cent pro­jects there I’d be happy to fund.

In the past all grants had to be approved by CEA before they were made; my understanding is this requirement is more pro forma now due to the increased independence. I only know of one grant previously vetoed, and this was widely agreed to have been a bad grant, so losing this aspect of quality control seems like a shame to me.

The EA Funds have pre­vi­ously dis­cussed try­ing to adopt an ac­tive grant­mak­ing ap­proach, where in­stead of just re­ac­tively eval­u­at­ing pro­pos­als they re­ceive they will ac­tively search for good op­por­tu­ni­ties. How­ever this does not seem to have hap­pened to a sig­nifi­cant de­gree yet.

Nuño Sem­pere did an ex­cel­lent ret­ro­spec­tive on out­comes from the LTFF’s 2018-2019 grant rounds here. My in­ter­pre­ta­tion of his eval­u­a­tion was gen­er­ally quite pos­i­tive for the LTFF:

Went­worth’s How To Get Into In­de­pen­dent Re­search On Align­ment/​Agency sug­gests that the LTFF has been cru­cial to en­abling the emer­gence of in­de­pen­dent safety re­searcher as a vi­able oc­cu­pa­tion; this seems like a very ma­jor pos­i­tive for the LTFF.

I am quite con­cerned about the lack of trans­parency the LTFF pro­vides donors. In the past there have been a num­ber of is­sues around this (e.g. in­cor­rect num­bers on the web­site, re­leas­ing or not re­leas­ing in­for­ma­tion and then claiming oth­er­wise), but as a small vol­un­teer-run or­gani­sa­tion I figured these were to be ex­pected. With the shift to­wards pro­fes­sional (paid) man­age­ment, and a stated in­ten­tion to provide bet­ter dis­clo­sure, I ex­pected things to be­come sig­nifi­cantly bet­ter.

How­ever, this has not been the case.

Partly this is the re­sult of de­liber­ate policy change. In 2020 they made an anony­mous grant (roughly 3% of the to­tal), and they have now adopted a policy of al­low­ing peo­ple to ap­ply anony­mously. I un­der­stand why this could be ap­peal­ing for ap­pli­cants, and why the LTFF would want to not ex­clude po­ten­tially good but pub­lic­ity-shy ap­pli­cants. How­ever by do­ing so they un­der­mine the abil­ity of the donor com­mu­nity to provide over­sight, which is definitely a bit con­cern­ing to me.

More concerning to me, however, is what appears to be a lack of disclosure due to simple oversight. At time of writing (2021-12-20) the latest grant round mentioned on the LTFF website is April, despite there apparently having been two since then – something donors would have no way of knowing without personally reaching out. The ‘Fund Payouts’ number, despite claiming to be ‘to date’, is around 8 months, 55 grants and ~$3.5m out of date. And despite the fund having had many temporary Fund Managers this year, as far as I can see these are mentioned nowhere on the website.

As a re­sult over­all my im­pres­sion is that donors have much less ac­cu­rate in­for­ma­tion available to them to eval­u­ate the LTFF this year than they did in prior years. While I’m grate­ful to them for per­son­ally shar­ing drafts about their re­cent ac­tivi­ties with me, ideally this would be shared di­rectly with all donors.

If you wish to donate to the LTFF you can do so here.

OpenPhil: The Open Philan­thropy Project

The Open Philan­thropy Pro­ject (sep­a­rated from Givewell in 2017) is an or­gani­sa­tion ded­i­cated to ad­vis­ing Cari and Dustin Moskovitz on how to give away over $15bn to a va­ri­ety of causes, in­clud­ing ex­is­ten­tial risk. They have made ex­ten­sive dona­tions in this area and prob­a­bly rep­re­sent both the largest pool of EA-al­igned cap­i­tal (at least pre-FTX) and the largest team of EA cap­i­tal al­lo­ca­tors.

They de­scribed their strat­egy for AI gov­er­nance, at a very high level, here.

Grants

They have a large and extremely capable grant evaluation team, though arguably small on an evaluator/dollar basis. In general I think they do a very good job of thinking strategically, analysing individual grants, and giving feedback to fundees for improvement.

You can see their grants for AI Risk here. It lists 26 AI Risk grants in the last 12 months, plus 2 other highly rele­vant ‘other’ grants. In to­tal I es­ti­mate they spent about $68.5m on AI (giv­ing par­tial credit for re­lated grants).

This was dom­i­nated by two large grants:

  • CSET: $47m

  • CHAI: $12m

This compares to $324m for 135 grants in total over the period, so AI-related work was around 21%.

They put out an RFP for var­i­ous Longter­mist out­reach pro­grams here.

The OpenPhil AI Fel­low­ship ba­si­cally fully funds AI PhDs for stu­dents who want to work on the long term im­pacts of AI. Look­ing back at the 2018 and 2019 classes (who pre­sum­ably will have had enough time to do sig­nifi­cant work since re­ceiv­ing the grants), scan­ning the ab­stracts of their pub­li­ca­tions on their web­sites sug­gests that over half have no AI safety rele­vant pub­li­ca­tions in 2019, 2020 and 2021, and only two are [co]au­thors on what I would con­sider a highly rele­vant pa­per. Ap­par­ently it is some­what in­ten­tional that these fel­low­ships are not in­tended to be spe­cific to AI safety, though I do not re­ally un­der­stand what they are in­tended for. OpenPhil sug­gested that part of the pur­pose was to build a com­mu­nity, which I don’t re­ally un­der­stand, be­cause there is limited ev­i­dence that the win­ners work to­gether, at least for the first two co­horts.

They also launched a schol­ar­ship pro­gram last year which seems more tai­lored to peo­ple fo­cused on the long-term fu­ture, though it is not AI spe­cific, and they recom­mend AI longter­mists to ap­ply to the AI one first. There is also a sep­a­rate schol­ar­ship pro­gram for tech­nol­ogy policy peo­ple as well.

Their fund­ing is suffi­ciently dom­i­nant in the EA move­ment that, as Linch pointed out, it can make in­de­pen­dent eval­u­a­tion difficult. Vir­tu­ally ev­ery­one ca­pa­ble of do­ing so ei­ther has re­ceived OpenPhil money in the past or might want to do so in the fu­ture.

Research

Most of their re­search con­cerns their own grant­ing, and is of­ten non-pub­lic.

Cotra’s The case for aligning narrowly superhuman models suggests we should work on making large, ‘general’ AI models better able to achieve specific human goals. The one-sentence description makes this sound pretty bad, but it’s actually an interesting idea. You take systems like GPT-3, which seem like they have the ‘power’ to solve many interesting problems, but aren’t ‘motivated’ to do so, and try to give them that motivation, *without* making them more powerful (e.g. by scaling up). This could, perhaps, function as something of a dry-run for the big alignment task. One particular example she refers to as ‘sandwiching’: aligning an AI to help a group of laymen reach expert-level performance on a task, because the experts provide you with a ground truth for performance evaluation. See also the discussion here. Overall I thought this was an excellent paper. #Strategy

Davidson's Could Advanced AI Drive Explosive Economic Growth? discusses some simple economic growth models and what they suggest for future growth. The core insight—that human-level AI could undo the demographic transition and unlock super-exponential growth again—should, I think, not be very surprising. #Forecasting

Karnofsky's All Possible Views About Humanity's Future Are Wild, and the related sequence of posts, argues that all plausible inside views suggest the future is going to be weird—e.g. massive growth, extinction, value lock-in, etc. Many of these ideas are not new but they are well presented. #Forecasting

Beck­stead & Thomas’s A para­dox for tiny pos­si­bil­ities and enor­mous val­ues dis­cusses Pas­calian ar­gu­ments, and the prob­lems that arise if you try to re­ject small-prob­a­bil­ity-mas­sive-pay­off cases. In par­tic­u­lar, they gen­er­al­ise be­yond the ex­pected util­ity frame­work. Re­searchers from GPI were also named au­thors on the pa­per. #Ethics

Finances

To my knowledge they are not currently soliciting donations from the general public, as they have a lot of money from Dustin and Cari, so incremental funding is less of a priority than for other organisations. They could, however, be a good place to work.

SFF: The Sur­vival and Flour­ish­ing Fund

SFF (website) is a donor-advised fund, taking over activities previously run by BERI, but now with a separate team. SFF was initially funded in 2019 by a grant of approximately $2 million from BERI, which in turn was funded by donations from philanthropist Jaan Tallinn; Jaan remains the largest funder.

You can read Zvi’s ex­pe­rience of be­ing an eval­u­a­tor for the fund here.

Grants

In its grantmaking SFF uses an innovative allocation process to combine the views of many grant evaluators (described here). SFF has published the results of two grantmaking rounds this year (described here and here), in which they donated around $19.4m, of which I estimate around $13.8m (roughly 71%) was AI-related, and around 75% went to things I would have funded (giving partial credit in both rounds).

The largest dona­tions in the year were to:

  • LTFF: $2.1m

  • Alpenglow/CLTR: $1.9m

  • LessWrong/Lightcone: $1.9m

  • CLR: $1.2m

  • CFAR: $1.2m

  • ALLFED: $1.2m

  • David Krueger's group at Cambridge: $1m

FTX Foundation

The FTX Foundation is in the process of being launched to distribute some of the profits from FTX/Alameda. It has hired Nick Beckstead (formerly a program officer at OpenPhil making grants in this area) as CEO, so I expect them to make large and thoughtful grants to highly relevant organisations.

BERI: The Berkeley Ex­is­ten­tial Risk Initiative

BERI is a (formerly Berkeley-based) in­de­pen­dent Xrisk or­gani­sa­tion, founded by An­drew Critch but now led by Sawyer Ber­nath. They provide sup­port to var­i­ous uni­ver­sity-af­fili­ated ex­is­ten­tial risk groups to fa­cil­i­tate ac­tivi­ties (like hiring en­g­ineers and as­sis­tants) that would be hard within the uni­ver­sity con­text, alongside other ac­tivi­ties—see their FAQ for more de­tails.

In 2019 they pivoted, dropping various non-core activities (e.g. grantmaking); they are now essentially entirely focused on providing support to researchers engaged in longtermist (mainly x-risk) work at universities and other institutions. They have five main collaborations:

  • FHI: The Fu­ture of Hu­man­ity Institute

  • CSER: The Centre for the Study of Existential Risk

  • CHAI: The Center for Human-Compatible AI

  • SERI: The Stan­ford Ex­is­ten­tial Risk Ini­ti­a­tive (pre­vi­ously a trial col­lab­o­ra­tion)

  • ALL: The Au­tonomous Learn­ing Lab­o­ra­tory at UMass Amherst (pre­vi­ously a trial col­lab­o­ra­tion)

In ad­di­tion they have a large num­ber of trial col­lab­o­ra­tions:

  • CLTC: The Cen­ter for Long-Term Cybersecurity

  • CTPL: The Tech Policy Lab at Cornell

  • David Krueger’s un­named lab at Cambridge

  • Dy­lan Had­field-Menell’s lab at MIT

  • In­terAct – the In­ter­ac­tive Au­ton­omy and Col­lab­o­ra­tive Tech­nolo­gies Lab (Anca Dra­gan)

  • Meir Frei­den­berg and Joe Halpern at Cornell

  • The Anh Han group at Teesside

  • The Safe Robotics Lab­o­ra­tory at Princeton

  • The Sculpt­ing Evolu­tion Group at the MIT Me­dia Lab

  • Yale Effec­tive Altruism

I think this is po­ten­tially a pretty at­trac­tive pro­ject. Univer­sity af­fili­ated or­gani­sa­tions provide the con­nec­tion to main­stream academia that we need, but run the risk of in­effi­ciency both due to their lack of in­de­pen­dence from the cen­tral uni­ver­sity and also the rel­a­tive in­de­pen­dence of their aca­demics. BERI po­ten­tially offers a way for donors to sup­port the uni­ver­sity af­fili­ated ecosys­tem in a tar­geted fash­ion.

In general they operate on a pull model, providing resources to help their groups achieve their goals, and seem quite unlikely to say 'no' unless a request were literally illegal or similar. BERI effectively exercises discernment at the level of which organisations they collaborate with, not at the individual project level. So if you were not a fan of the groups they collaborate with, supporting BERI would probably not be the right choice for you.

They are ap­par­ently quite re­laxed about get­ting credit for work, so not all the stuff they sup­port will list them in the ac­knowl­edg­ments.

Finances

They spent $2,800,000 in 2020 and $2,300,000 in 2021, and plan to spend around $2,000,000 in 2022. They have around $2,400,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 1.2 years of run­way.

BERI is now seeking support from the general public. If you want to donate you can do so here. Note that you can restrict your funding to specific collaborations if you want, though my guess is funging might be ~100% for small donors.

Non­lin­ear Fund

Nonlinear is an internationally based independent meta AI safety organisation founded in 2021 by Kat Woods and Emerson Spartz. They aim to provide services similar to those Kat provided at Charity Entrepreneurship: helping launch new projects that provide value to the AI safety community. You can read about them here, or on their website here.

One of the big pro­jects they plan to work on is helping EAs hire per­sonal as­sis­tants, which seems like a po­ten­tially pretty effec­tive way of un­lock­ing peo­ple’s time, as well as sev­eral other pro­jects which all seemed like broadly good ideas.

Research

Woods’s The Non­lin­ear Library pro­vides au­to­mat­i­cally gen­er­ated voice ver­sions of top EA con­tent. Given that a lot of peo­ple like listen­ing to pod­casts, this seems like po­ten­tially a huge ac­cess­abil­ity im­prove­ment, which I could imag­ine more con­ser­va­tive or­gani­sa­tions like CEA be­ing con­cerned about offer­ing for le­gal rea­sons. #Community

Finances

They are not actively soliciting donations, but if you want to donate anyway you can do so by reaching out to Kat.

80,000 Hours

80,000 Hours is a London-based EA movement-building organisation founded in 2011 by Will MacAskill & Ben Todd and currently led by Ben Todd. They are affiliated with CEA, and provide career research, coaching and headhunting for the world's most important careers, of which AI safety is a significant focus. Their research can be found here. Their website is here.

Dur­ing the year, Peters Hartree and McIn­tyre left, and María Gu­tiér­rez Ro­jas might leave next year. They hired Bella For­ristal, Ben­jamin H, Matt Rear­don and Alex Lawsen.

Their abil­ity to make con­nec­tions for peo­ple seems ex­tremely valuable.

80,000 Hours’s AI/​ML safety re­search job board col­lects var­i­ous jobs that could be valuable for peo­ple in­ter­ested in AI safety. At the time of writ­ing it listed 128 po­si­tions, all of which seemed like good op­tions that it would be valuable to have sen­si­ble peo­ple fill. I sus­pect most peo­ple look­ing for AI jobs would find some on here they hadn’t heard of oth­er­wise, though of course for any given per­son many will not be ap­pro­pri­ate. They also have job boards for other EA causes. #Careers

They have a very good podcast; readers might be interested in the episodes from this year.

80k also pro­duced what I now re­gard as my ‘de­fault’ non-text-based EA in­tro link.

Finances

They spent $3,050,000 in 2020 and $3,032,000 in 2021, and plan to spend around $3,600,000 in 2022. They have around $6,600,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 1.8 years of run­way.

If you wanted to donate you can do so here.

AISS: AI Safety Support

AISS is a globally distributed independent AI safety support organisation founded in 2020 by JJ Hepburn and Linda Linsefors, growing out of the AI Safety Camps, which remain a project of the organisation. They aim to provide coaching and support services to people early in the AI safety researcher pipeline.

Finances

They spent $0 in 2020 and $170,000 in 2021, and plan to spend around $650,000 in 2022. They have around $440,000 in cash and pledged fund­ing, sug­gest­ing (on a very naïve calcu­la­tion) around 0.7 years of run­way.

If you wanted to donate you could do so here.

Other News

Google con­tinues to im­prove the perfor­mance of its ASICs.

EA Cambridge built an AI Safety Fundamentals curriculum, which will run in the new year and is accepting applications.

Facebook got a lot of (somewhat misleading) negative press over leaked reports that its products made users unhappy; Nir Eyal points out that this sort of forced openness reduces the incentive for tech companies to try to address such problems: had they never researched the issue in the first place, there would have been nothing to leak.

The EU has a proposal for a big AI regulation, somewhat modelled after GDPR. It focuses on the use of AI in what they perceive to be 'high risk' areas, like biometrics, utility infrastructure, and personal vetting. For these use cases the requirements are quite restrictive, demanding technical documentation and human overrides. However, it does not seem to apply to pre-deployment systems, and it focuses on AI use cases rather than the power of the system, to the extent that AGI systems not used in high-risk applications are explicitly exempted. So it seems that much of e.g. Deepmind would currently be triply exempted: the UK has left the EU, much of their work is pre-market, and AGI is explicitly not a focus. It does ban subliminal messaging, which seems good—we do not want AIs changing people's values—but excludes military AIs. In theory the institutions set up by this could provide infrastructure for further AGI regulation in the future, but in practice EU regulations are often not amended despite clear deficiencies, and political attention may move elsewhere. I've been told that 'it will hurt EU AI companies, slowing down progress and reducing competition' is not the reason policy EAs like it. See also here.

Or­gani­sa­tion Se­cond Preferences

A new strategy I employed this year was to ask each organisation I contacted which organisation, other than themselves, they would be most excited to see receive funding. I figured this could be an efficient way to take advantage of their domain-specific knowledge, including of research directions, strategies and personnel quality. There is, however, a potential bias towards well-known and socially central organisations.

Not ev­ery or­gani­sa­tion was will­ing to name other orgs they preferred fund­ing to go to; I should prob­a­bly have made ex­plicit that I wouldn’t share this info ex­cept in ag­gre­gated (and hence largely anonymised) form. You should prob­a­bly as­sume that or­gani­sa­tions I had bet­ter so­cial bonds with would be more likely to share this info.

The clear win­ner of this was the LTFF; no other or­gani­sa­tion came close. Of course it is pos­si­ble that some of these or­gani­sa­tions may have thought that the LTFF might give them grants, but their ex­pected share of such in­cre­men­tal dol­lars is likely small, and I think most of these re­ports were hon­est rep­re­sen­ta­tions of their views. The LTFF was so far ahead of any other or­gani­sa­tion that this seems like a sig­nifi­cant data point in their favour. A dis­tant sec­ond were non-spe­cific sen­ti­ments along the lines of “fund some­thing that seems un­der­funded”.

Method­olog­i­cal Thoughts

In­side View vs Out­side View

This document is written mainly, but not exclusively, using publicly available information (as well as emailing the organisations with a few simple questions). In the tradition of active management, I hope to synthesise many individually well-known facts into a whole that provides new and useful insight to readers. The advantages of this are that 1) it is relatively unbiased, compared to inside information, which invariably favours those you are close to socially, and 2) most of it is legible and verifiable to readers. The disadvantage is that there are probably many pertinent facts that I am not party to! Wei Dai has written about how much discussion now takes place in private Google documents—for example this Drexler piece, apparently; in most cases I do not have access to these. If you want the inside scoop I am not your guy; all I can supply is exterior scooping.

We focus on papers, rather than outreach or other activities. This is partly because papers are much easier to measure (while there has been a large increase in interest in AI safety over the last year, it's hard to work out whom to credit for this), and partly because I think progress has to come from persuading AI researchers, which happens through technical outreach and publishing good work, not popular/political work.

Or­gani­sa­tions vs Individuals

Many cap­i­tal al­lo­ca­tors seem to op­er­ate un­der a sort of Great Man the­ory of in­vest­ment, whereby the most im­por­tant thing is to iden­tify a guy to in­vest in who is re­ally clever and ‘gets it’. I think there is a lot of merit in this (as ar­gued here for ex­am­ple); how­ever, I think I be­lieve in it less than they do. In par­tic­u­lar, I worry that this ap­proach leads to over-fund­ing skil­led rhetori­ci­ans and those the in­vestor/​donor is so­cially con­nected to. Per­haps as a re­sult of my in­sti­tu­tional in­vest­ment back­ground, I place a lot more weight on his­tor­i­cal re­sults. Also, as a prac­ti­cal mat­ter, it is hard for in­di­vi­d­ual donors to fund in­di­vi­d­ual re­searchers. But as part of a con­ces­sion to the in­di­vi­d­ual-first view I’ve started ask­ing or­gani­sa­tions if any­one sig­nifi­cant has joined or left re­cently, though in prac­tice I think or­gani­sa­tions are far more will­ing to high­light new peo­ple join­ing than old peo­ple leav­ing.

Judg­ing or­gani­sa­tions on their his­tor­i­cal out­put is nat­u­rally go­ing to favour more ma­ture or­gani­sa­tions. A new startup, whose value all lies in the fu­ture, will be dis­ad­van­taged. How­ever, I think that this is the cor­rect ap­proach for donors who are not tightly con­nected to the or­gani­sa­tions in ques­tion. The newer the or­gani­sa­tion, the more fund­ing should come from peo­ple with close knowl­edge. As or­gani­sa­tions ma­ture, and have more eas­ily ver­ifi­able sig­nals of qual­ity, their fund­ing sources can tran­si­tion to larger pools of less ex­pert money. This is how it works for star­tups turn­ing into pub­lic com­pa­nies and I think the same model ap­plies here. (I ac­tu­ally think that even those with close per­sonal knowl­edge should use his­tor­i­cal re­sults more, to help over­come their bi­ases.)

This judge­ment in­volves analysing a large num­ber of pa­pers re­lat­ing to Xrisk that were pro­duced dur­ing 2021. Hope­fully the year-to-year volatility of out­put is suffi­ciently low that this is a rea­son­able met­ric; I have tried to in­di­cate cases where this doesn’t ap­ply. I also at­tempted to in­clude pa­pers dur­ing De­cem­ber 2020, to take into ac­count the fact that I’m miss­ing the last month’s worth of out­put from 2021, but I can’t be sure I did this suc­cess­fully.

Politics

My impression is that policy on most subjects, especially those that are more technical than emotional, is generally made by the government and civil servants in consultation with, and being lobbied by, outside experts and interests. Without expert (e.g. top ML researchers in academia and industry) consensus, no useful policy will be enacted. Pushing directly for policy seems, if anything, likely to hinder expert consensus. Attempts to directly influence the government to regulate AI research seem very adversarial, and risk being pattern-matched to ignorant technophobic opposition to GM foods or other kinds of progress. We don't want the 'us-vs-them' situation that has occurred with climate change to happen here. AI researchers who are dismissive of safety law, regarding it as an imposition and encumbrance to be endured or evaded, will probably be harder to convince of the need to voluntarily be extra-safe—especially as the regulations may actually be totally ineffective.

The only case I can think of where scientists are relatively happy about punitive safety regulations, nuclear power, is one where many of those initially concerned were scientists themselves; those regulations also had the effect of basically ending any progress in nuclear power (at great cost to climate change). Given this, I actually think policy outreach to the general population is probably negative in expectation.

If you’re in­ter­ested in this, I’d recom­mend you read this blog post from a few years back.

Openness

I think there is a strong case to be made that open­ness in AGI ca­pac­ity de­vel­op­ment is bad. As such I do not as­cribe any pos­i­tive value to pro­grams to ‘de­moc­ra­tize AI’ or similar.

One in­ter­est­ing ques­tion is how to eval­u­ate non-pub­lic re­search. For a lot of safety re­search, open­ness is clearly the best strat­egy. But what about safety re­search that has, or po­ten­tially has, ca­pa­bil­ities im­pli­ca­tions, or other in­fo­haz­ards? In this case it seems best if the re­searchers do not pub­lish it. How­ever, this leaves fun­ders in a tough po­si­tion – how can we judge re­searchers if we can­not read their work? Maybe in­stead of do­ing top se­cret valuable re­search they are just slack­ing off. If we donate to peo­ple who say “trust me, it’s very im­por­tant and has to be se­cret” we risk be­ing taken ad­van­tage of by char­latans; but if we re­fuse to fund, we in­cen­tivize peo­ple to re­veal pos­si­ble in­fo­haz­ards for the sake of money. (Is it even a good idea to pub­li­cise that some­one else is do­ing se­cret re­search?)

For similar rea­sons I pre­fer re­search to not be be­hind pay­walls or in­side ex­pen­sive books, but this is a sig­nifi­cantly less im­por­tant is­sue.

More prosaically, organisations should make sure to upload the research they have published to their websites! Having gone to all the trouble of doing useful research, it is a constant shock to me how many organisations don't take this simple step to significantly increase the reach of their work. Additionally, several times I have come across incorrect information on organisations' websites.

Re­search Flywheel

My ba­sic model for AI safety suc­cess is this:

  1. Iden­tify in­ter­est­ing problems

    1. As a byproduct this draws new peo­ple into the field through al­tru­ism, nerd-sniping, ap­par­ent tractability

  2. Solve in­ter­est­ing problems

    1. As a byproduct this draws new peo­ple into the field through cred­i­bil­ity and prestige

  3. Repeat

One ad­van­tage of this model is that it pro­duces both ob­ject-level work and field growth.

Over time, hope­fully an in­creas­ingly large frac­tion of AI re­searchers will be safety con­scious, such that they vol­un­tar­ily choose to adopt safer tech­niques, due to the de­sires of work­ers, man­age­ment and spe­cial­ist in­vestors. This the­ory of change does not op­er­ate via poli­ti­ci­ans, gov­ern­ments or vot­ers. It does have some weak spots, e.g. China.

There is also some value in ar­gu­ing for the im­por­tance of the field (e.g. Bostrom’s Su­per­in­tel­li­gence) or ad­dress­ing crit­i­cisms of the field.

No­tice­ably ab­sent are strate­gic pieces. I find that a lot of these pieces do not add ter­ribly much in­cre­men­tal value. Ad­di­tion­ally, my sus­pi­cion is that strat­egy re­search is, to a cer­tain ex­tent, pro­duced ex­oge­nously by peo­ple who are in­ter­ested /​ tech­ni­cally in­volved in the field. This does not ap­ply to tech­ni­cal strat­egy pieces, about e.g. whether CIRL or Am­plifi­ca­tion is a more promis­ing ap­proach.

There is some­what of a para­dox with tech­ni­cal vs ‘wordy’ pieces how­ever: as a non-ex­pert, it is much eas­ier for me to un­der­stand and eval­u­ate the lat­ter, even though I think the former are much more valuable.

Differ­en­tial AI progress

There are many prob­lems that need to be solved be­fore we have safe gen­eral AI, one of which is not pro­duc­ing un­safe gen­eral AI in the mean­time. If no­body was do­ing non-safety-con­scious re­search there would be lit­tle risk or haste to AGI – though we would be miss­ing out on the po­ten­tial benefits of safe AI.

There are sev­eral con­se­quences of this:

  • To the ex­tent that safety re­search also en­hances ca­pa­bil­ities, it is less valuable.

  • To the ex­tent that ca­pa­bil­ities re­search re-ori­en­tates sub­se­quent re­search by third par­ties into more safety-tractable ar­eas it is more valuable.

  • To the ex­tent that safety re­sults would nat­u­rally be pro­duced as a by-product of ca­pa­bil­ities re­search (e.g. au­tonomous ve­hi­cles) it is less at­trac­tive to fi­nance.

One ap­proach is to re­search things that will make con­tem­po­rary ML sys­tems safer, be­cause you think AGI will be a nat­u­ral out­growth from con­tem­po­rary ML. This has the ad­van­tage of faster feed­back loops, but is also more re­place­able (as per the pre­vi­ous sec­tion).

Another ap­proach is to try to rea­son di­rectly about the sorts of is­sues that will arise with su­per­in­tel­li­gent AI. This work is less likely to be pro­duced ex­oge­nously by un­al­igned re­searchers, but it re­quires much more faith in the­o­ret­i­cal ar­gu­ments, un­moored from em­piri­cal ver­ifi­ca­tion.

Near-term safety AI issues

Capacity building vs tolerating poor epistemics?

Many peo­ple want to con­nect AI ex­is­ten­tial risk is­sues to ‘near-term’ is­sues; I am gen­er­ally scep­ti­cal of this. For ex­am­ple, au­tonomous cars seem to risk only lo­cal­ised tragedies (though if they were hacked and all crashed si­mul­ta­neously that would be much worse), and pri­vate com­pa­nies should have good in­cen­tives here. Unem­ploy­ment con­cerns seem ex­ag­ger­ated to me, as they have been for most of his­tory (new jobs will be cre­ated), at least un­til we have AGI, at which point we have big­ger con­cerns. Similarly, I gen­er­ally think con­cerns about al­gorith­mic bias are es­sen­tially poli­ti­cal—I recom­mend this pre­sen­ta­tion—though there is at least some con­nec­tion to the value learn­ing prob­lem there.

Some people argue that work on these near-term AI issues is worthwhile because it can introduce people to the broader risks around poor AI alignment. It could also lead to the creation of AI governance institutions that could then do useful work later. However, it seems somewhat disingenuous: it risks attracting grifters while putting off people who recognise that these are bad concerns. For example, the paper mentioned above rejects the precautionary principle for AI on the basis of rejecting bad arguments about unemployment—had these pseudo-strawman views not been widespread, it would have been harder to reach this unfortunate conclusion.

It’s also the case many of the poli­cies peo­ple recom­mend as a re­sult of these wor­ries are po­ten­tially very harm­ful. A good ex­am­ple is GDPR and similar pri­vacy reg­u­la­tions (in­clud­ing HIPAA) which have made many good things much more difficult—in­clud­ing de­grad­ing our abil­ity to track the pan­demic.

Some interesting speculation I read is the idea that discussing near-term AI safety issues might be a sort of 'greenwashing' immune response to Xrisk concerns. The ability to respond to long-term AI safety concerns with "yes, we agree AI ethics is very important, and that's why we're working on privacy and decolonising AI" seems like a very rhetorically powerful move.

Fi­nan­cial Reserves

Charities like having financial reserves to provide runway and guarantee that they will be able to keep the lights on for the immediate future. This could be justified if you thought that charities were expensive to create and destroy, and were worried about this occurring by accident due to the whims of donors. Unlike a company, which sells a product and so controls its own revenue, charities depend on donors, so it seems reasonable for them to be more concerned about this.

Donors prefer charities not to hold too much in reserves. Firstly, those reserves are cash that could be spent on outcomes now, by either the specific charity or others. Valuable future activities by charities are supported by future donations; they do not need to be pre-funded. Additionally, holding reserves increases the risk of organisations 'going rogue', because they are insulated from the need to convince donors of their value.

As such, in gen­eral I do not give full cre­dence to char­i­ties say­ing they need more fund­ing be­cause they want much more than 18 months or so of run­way in the bank. If you have a year’s re­serves now, af­ter this De­cem­ber you will have that plus what­ever you raise now, giv­ing you a mar­gin of safety be­fore rais­ing again next year.

I estimated reserves = (cash and grants) / (next year's budget). In general I think of this as something of a measure of urgency. However, despite being prima facie a very simple calculation, there are many issues with this data, so these figures should be considered suggestive only.
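As a sketch, the naive runway calculation used throughout this document looks like this (the function name is mine; the example figures are BERI's from the Finances section above):

```python
def runway_years(cash_and_grants: float, next_year_budget: float) -> float:
    """Naive runway estimate: reserves divided by next year's planned spending."""
    return cash_and_grants / next_year_budget

# BERI: ~$2.4m in cash and pledged funding, ~$2.0m planned 2022 spending.
print(round(runway_years(2_400_000, 2_000_000), 1))  # → 1.2
```

The same calculation reproduces the other runway figures quoted above (e.g. 80,000 Hours: $6.6m / $3.6m ≈ 1.8 years), though as noted the underlying data has many issues.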

Dona­tion Matching

In gen­eral I be­lieve that char­ity-spe­cific dona­tion match­ing schemes are some­what dishon­est, de­spite my hav­ing pro­vided match­ing fund­ing for at least one in the past.

Ob­vi­ously cause-neu­tral dona­tion match­ing is differ­ent and should be ex­ploited. Every­one should max out their cor­po­rate match­ing pro­grams if pos­si­ble, and things like the an­nual Face­book Match con­tinue to be great op­por­tu­ni­ties.

Poor Qual­ity Research

Partly thanks to the efforts of the community, the field of AI safety is considerably better respected and funded than was previously the case, which has attracted a lot of new researchers. While generally good, one side effect of this (perhaps combined with the fact that many low-hanging fruits of the insight tree have been plucked) is that a considerable amount of low-quality work has been produced. For example, there are a lot of papers which can be accurately summarized as asserting "just use ML to learn ethics". Furthermore, the conventional peer review system seems to be extremely bad at dealing with this issue.

The standard approach here is just to ignore low-quality work. This has many advantages: 1) it requires little effort, and 2) it doesn't annoy people. This conspiracy of silence seems to be the strategy adopted by most scientific fields, except in extreme cases like anti-vaxxers.

How­ever, I think there are some down­sides to this strat­egy. A suffi­ciently large mi­lieu of low-qual­ity work might de­grade the rep­u­ta­tion of the field, de­ter­ring po­ten­tially high-qual­ity con­trib­u­tors. While low-qual­ity con­tri­bu­tions might help im­prove Con­crete Prob­lems’ cita­tion count, they may use up scarce fund­ing.

Moreover, it is not clear to me that 'just ignore it' really generalizes as a community strategy. Perhaps you, enlightened reader, can judge that "How to solve AI Ethics: Just use RNNs" is not great. But is it really efficient to require everyone to independently work this out? Furthermore, I suspect that the idea that we can all just ignore the weak stuff is somewhat of an example of the typical mind fallacy. Several times I have come across people I respect according respect to work I found clearly pointless. And several times I have come across people I respect arguing persuasively that work I had previously respected was very bad—but I only learnt they believed this by chance! So I think it is quite possible that many people will waste a lot of time as a result of this strategy, especially if they don't happen to move in the right social circles.

Hav­ing said all that, I am not a fan of unilat­eral ac­tion, and am some­what self­ishly con­flict-averse, so will largely con­tinue to abide by this non-ag­gres­sion con­ven­tion. My only de­vi­a­tion here is to make it ex­plicit. If you’re in­ter­ested in this you might en­joy this by 80,000 Hours.

The Bay Area

Much of the AI and EA com­mu­ni­ties, and es­pe­cially the EA com­mu­nity con­cerned with AI, is lo­cated in the Bay Area, es­pe­cially Berkeley and San Fran­cisco. This is an ex­tremely ex­pen­sive place, and is dys­func­tional both poli­ti­cally and so­cially. It seems to at­tract peo­ple who are ex­tremely weird in so­cially un­de­sir­able ways, in­clud­ing nu­mer­ous cult-like-things (though some are lo­cated el­se­where) – though to be fair the peo­ple who are do­ing use­ful work in AI or­gani­sa­tions seem to be drawn from a bet­ter dis­tri­bu­tion than the broader com­mu­nity. In gen­eral I think the cen­tral­iza­tion is bad, but if there must be cen­tral­iza­tion I would pre­fer it be al­most any­where other than Berkeley. Ad­di­tion­ally, I think that, like VCs, some fun­ders are ge­o­graph­i­cally my­opic, and bi­ased to­wards fund­ing things in the Bay Area. As such, I have a mild prefer­ence to­wards fund­ing non-Bay-Area pro­jects.

Conclusions

The size of the field con­tinues to grow, both in terms of fund­ing and re­searchers. Both make it in­creas­ingly hard for in­di­vi­d­ual donors. I’ve at­tempted to sub­jec­tively weigh the pro­duc­tivity of the differ­ent or­gani­sa­tions against the re­sources they used to gen­er­ate that out­put, and donate ac­cord­ingly.

An increasingly large amount of the best work is being done inside companies: Deepmind, OpenAI, Anthropic etc. While a good development overall—I am certainly very pleased that Deepmind has such a productive team—it means we can't really do much here. And many of the not-for-profits are well funded.

My constant wish is to promote a lively intellect and independent decision-making among readers; hopefully my laying out the facts as I see them above will prove helpful to some readers. Here is my eventual decision, rot13'd so you can come to your own conclusions first (which I strongly recommend):

V jnag gb er-vgrengr gung V unir n ybg bs pbasyvpgf bs va­gr­erfg, fb guvf fub­hyq abg or pbafvqr­erq n ‘erp­bz­zraqngvba’ be nalgu­vat bs gur fbeg. Guvf vf whfg ju­rer V nz qbang­vat guvf lrne.

[Ha?]sbeghangryl n ybg bs gur bet­navfngvbaf gung V gu­vax qb gur orfg erfrnepu qb abg frrz yvxr cnegv­phyneyl nggenpgvir shaq­vat bc­cbeghavgvrf guvf lrne. Znal ner rvgure sbe-ceb­svg pbzc­navrf be ny­ernql unir fge­bat svanap­vat sbe gurve pheerag cy­naf. Bguref fvz­cyl qb abg ce­bivqr gur yriry bs qvfpybfher erd­hverq sbe riny­h­ngvba.

V qb erznva bcgvzvfgvp nobhg gur YGSS. Rira gub­htu gurve choyvp qvfpybfher unf orra jrnx guvf lrne, jung gurl’ir funerq cev­in­gryl unf orra nqrdhngr, naq V gu­vax gurve fhc­cbeg sbe vaq­vivqhny fn­srgl erfrnepuref vf rkgerzryl iny­h­noyr. V jb­hyq yvxr guvf gb or fhssvpvragyl bire-shaqrq gung fhssvpvrayl fxvyyrq cr­b­cyr pna pbasvqrag­nyyl rffragvnyyl znxr n pn­erre bhg YGSS shaq­vat. Fb gung’f ju­rer V’z qbang­vat guvf lrne. OREV ce­bonoyl jb­hyq unir orra zl frp­baq cvpx.

Ohg lbh fub­hyq pbzr gb lbhe bja pbapy­h­fvbaf!
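For when you are ready, ROT13 is just a 13-place rotation of the alphabet, so the same operation both encodes and decodes. A minimal sketch in Python, using the standard library's `codecs` module (the sample string is the final encoded line above):

```python
import codecs

# ROT13 maps each letter 13 places along the alphabet; applying it twice
# returns the original text, so decode() here also serves as encode().
encoded = "Ohg lbh fubhyq pbzr gb lbhe bja pbapyhfvbaf!"
decoded = codecs.decode(encoded, "rot13")
print(decoded)  # -> But you should come to your own conclusions!
```

Any online ROT13 tool will of course do the same job for the longer paragraphs.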

However, I wish to emphasize that all the above organisations seem to be doing good work on the most important issue facing mankind. It is the nature of making decisions under scarcity that we must prioritize some over others, and I hope all organisations will understand that this necessarily involves negative comparisons at times.

Thanks for reading this far; hopefully you found it useful. Apologies to everyone who did valuable work that I excluded!

If you found this post helpful, and especially if it helped inform your donations, please consider letting me, and any organisations you donate to as a result, know.

Disclosures

I have not in general checked the proofs in these papers, and I similarly trust that researchers have honestly reported the results of their simulations.

I have a large number of conflicts of interest that I cannot individually disclose.

I shared drafts of the individual organisation sections with representatives from FHI, Gov.AI, CHAI, MIRI, GCRI, Redwood Research, BERI, Ought, AI Impacts, GPI, ARC, CSET, Lightcone, CLTR/Alpenglow, CLR, OpenPhil, FTX, LTFF, NonLinear, Rethink Priorities, 80k and CSER (and possibly others I forgot).

My eternal gratitude to my anonymous reviewers for their invaluable reviewing. Any remaining mistakes are of course my own. I would also like to thank my wife and daughter for tolerating all the time I have spent/invested/wasted on this.

Looking for a Research Assistant for Next Year

Over time the amount of ground this review needs to cover has increased, while my time has become scarcer. I have been spending more time simply collecting dispersed information and less time being analytical. As such, I think it might make sense to take on a Research Assistant for next year: someone diligent, reliable and interested in AI existential risk, who could email organisations, find information on websites and enter it into the spreadsheet. In the past CEA briefly expressed interest; possibly we could find funding for this.

Sources

This is a list of all the articles cited, each with its own entry.

AI Vignettes Pro­ject − 2021-06-25 - https://​​aiim­pacts.org/​​ai-vi­gnettes-pro­ject/​​

Aiken, Cather­ine—Clas­sify­ing AI Sys­tems − 2021-11-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​clas­sify­ing-ai-sys­tems/​​

Alex, Neel; Lifland, Eli; Tun­stall, Lewis; Thakur, Ab­hishek; Ma­ham, Pe­gah; Riedel, C. Jess; Hine, Em­mie; Ashurst, Carolyn; Sedille, Paul; Car­lier, Alexis; Noe­tel, Michael; Stuh­lmüller, An­dreas—RAFT: A Real-World Few-Shot Text Clas­sifi­ca­tion Bench­mark − 2021-10-28 - https://​​arxiv.org/​​abs/​​2109.14076

An­drus, McKane; Dean, Sarah; Gilbert, Thomas Krendl; Lam­bert, Nathan; Zick, Tom—AI Devel­op­ment for the Public In­ter­est: From Ab­strac­tion Traps to So­ciotech­ni­cal Risks − 2021-02-04 - https://​​arxiv.org/​​abs/​​2102.04255

Arnold, Zachary; Toner, He­len—AI Ac­ci­dents: An Emerg­ing Threat − 2021-07-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​ai-ac­ci­dents-an-emerg­ing-threat/​​

Ashurst, Carolyn; Hine, Em­mie; Sedille, Paul; Car­lier, Alexis—AI Ethics State­ments: Anal­y­sis and les­sons learnt from NeurIPS Broader Im­pact State­ments − 2021-11-02 - https://​​arxiv.org/​​abs/​​2111.01705

Baker, Jamie—Ethics and Ar­tifi­cial In­tel­li­gence: A Poli­cy­maker’s In­tro­duc­tion − 2021-04-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​ethics-and-ar­tifi­cial-in­tel­li­gence/​​

Barnes, Beth; Chris­ti­ano, Paul—De­bate up­date: Obfus­cated ar­gu­ments prob­lem − 2020-12-22 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​PJLABqQ962hZEqhdB/​​de­bate-up­date-obfus­cated-ar­gu­ments-prob­lem#comments

Baum, Seth; de Neufville, Robert; Bar­rett, Tony; Fitzger­ald, McKenna—GCRI State­ment on the Jan­uary 6 US Capi­tol In­sur­rec­tion − 2021-01-15 - https://​​gcrin­sti­tute.org/​​gcri-state­ment-on-the-jan­uary-6-us-capi­tol-in­sur­rec­tion/​​

Baum, Seth; Owe, An­drea—Ar­tifi­cial In­tel­li­gence Needs En­vi­ron­men­tal Ethics − 2021-11-14 - https://​​gcrin­sti­tute.org/​​ar­tifi­cial-in­tel­li­gence-needs-en­vi­ron­men­tal-ethics/​​

Beckstead, Nick; Thomas, Teruji—A paradox for tiny probabilities and enormous values − 2021-07-15 - https://globalprioritiesinstitute.org/nick-beckstead-and-teruji-thomas-a-paradox-for-tiny-probabilities-and-enormous-values/

Brown, Daniel S.; Sch­nei­der, Jor­dan; Dra­gan, Anca D.; Niekum, Scott—Value Align­ment Ver­ifi­ca­tion − 2020-12-02 - https://​​arxiv.org/​​abs/​​2012.01557

Buchanan, Ben; Lohn, An­drew; Musser, Micah; Se­dova, Ka­te­rina—How Lan­guage Models Could Change Dis­in­for­ma­tion − 2021-05-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​truth-lies-and-au­toma­tion/​​

Cam­marata, Nick; Goh, Gabriel; Carter, Shan; Voss, Chel­sea; Schu­bert, Lud­wig; Olah, Chris—Curve Cir­cuits − 2021-01-30 - https://​​dis­till.pub/​​2020/​​cir­cuits/​​curve-cir­cuits/​​

Cave, Stephen; Whit­tle­stone, Jess; Nyrup, Rune; Ó hÉigeartaigh, Seán; Calvo, Ra­fael—Us­ing AI eth­i­cally to tackle covid-19 − 2021-03-16 - https://​​www.bmj.com/​​con­tent/​​372/​​bmj.n364

Ce­bul, Matthew; Dafoe, Allan; Mon­teiro—Co­er­cion and the Cred­i­bil­ity of As­surances − 2021-07-15 - https://​​drive.google.com/​​file/​​d/​​1q-vRP19Izn­fPldB­caO6NglnSkyL7wYaL/​​view

Chatila, Raja; Dignum, Virginia; Fisher, Michael; Gi­an­notti, Fosca; Morik, Katha­rina; Rus­sell, Stu­art; Ye­ung, Karen—Trust­wor­thy AI − 2021-02-06 - https://​​smile.ama­zon.com/​​gp/​​product/​​B08W3XZ1TJ/​​ref=ppx_yo_dt_b_d_asin_ti­tle_o00?ie=UTF8&psc=1

Chen, Mark; Tworek, Jerry; Jun, Hee­woo; Yuan, Qiming; Pinto, Hen­rique Ponde de Oliveira; Ka­plan, Jared; Ed­wards, Harri; Burda, Yuri; Joseph, Ni­cholas; Brock­man, Greg; Ray, Alex; Puri, Raul; Krueger, Gretchen; Petrov, Michael; Kh­laaf, Heidy; Sas­try, Gir­ish; Mishkin, Pamela; Chan, Brooke; Gray, Scott; Ry­der, Nick; Pavlov, Mikhail; Power, Alethea; Kaiser, Lukasz; Bavar­ian, Mo­ham­mad; Win­ter, Cle­mens; Tillet, Philippe; Such, Felipe Pet­roski; Cum­mings, Dave; Plap­pert, Matthias; Chantzis, Fo­tios; Barnes, Eliz­a­beth; Her­bert-Voss, Ariel; Guss, William He­b­gen; Ni­chol, Alex; Paino, Alex; Tezak, Niko­las; Tang, Jie; Babuschkin, Igor; Balaji, Suchir; Jain, Shan­tanu; Saun­ders, William; Hesse, Christo­pher; Carr, An­drew N.; Leike, Jan; Achiam, Josh; Misra, Vedant; Morikawa, Evan; Rad­ford, Alec; Knight, Matthew; Brundage, Miles; Mu­rati, Mira; Mayer, Katie; Welin­der, Peter; McGrew, Bob; Amodei, Dario; McCan­dlish, Sam; Sutskever, Ilya; Zaremba, Wo­j­ciech—Eval­u­at­ing Large Lan­guage Models Trained on Code − 2021-07-07 - https://​​arxiv.org/​​abs/​​2107.03374

Chris­ti­ano, Paul—A naive al­ign­ment strat­egy and op­ti­mism about gen­er­al­iza­tion − 2021-06-09 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​QvtHSsZLFCAHmzes7/​​a-naive-al­ign­ment-strat­egy-and-op­ti­mism-about-generalization

Chris­ti­ano, Paul—Another (outer) al­ign­ment failure story − 2021-04-07 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​AyNHoTWWAJ5eb99ji/​​an­other-outer-al­ign­ment-failure-story

Chris­ti­ano, Paul—Teach­ing ML to an­swer ques­tions hon­estly in­stead of pre­dict­ing hu­man an­swers − 2021-05-28 - https://​​ai-al­ign­ment.com/​​a-prob­lem-and-three-ideas-800b42a14f66

Cihon, Peter; Kleinaltenkamp, Moritz; Schuett, Jonas; Baum, Seth—AI Certification: Advancing Ethical Practice by Reducing Information Asymmetries − 2021-06-02 - https://gcrinstitute.org/ai-certification-advancing-ethical-practice-by-reducing-information-asymmetries/

Cihon, Peter; Schuett, Jonas; Baum, Seth—Corporate Governance of Artificial Intelligence in the Public Interest − 2021-07-05 - https://www.mdpi.com/2078-2489/12/7/275

Clif­ton, Jesse—Col­lab­o­ra­tive game speci­fi­ca­tion: ar­riv­ing at com­mon mod­els in bar­gain­ing − 2021-03-06 - https://​​longtermrisk.org/​​col­lab­o­ra­tive-game-speci­fi­ca­tion/​​

Clif­ton, Jesse—Weak iden­ti­fi­a­bil­ity and its con­se­quences in strate­gic set­tings − 2021-02-15 - https://​​longtermrisk.org/​​weak-iden­ti­fi­a­bil­ity-and-its-con­se­quences-in-strate­gic-set­tings/​​

Co­hen, Michael; Hut­ter, Mar­cus; Nanda, Neel—Fully Gen­eral On­line Imi­ta­tion Learn­ing − 2021-02-17 - https://​​arxiv.org/​​abs/​​2102.08686

Co­tra, Ajeya—The case for al­ign­ing nar­rowly su­per­hu­man mod­els − 2021-05-03 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​PZt­soaoSLpKjjbMqM/​​the-case-for-al­ign­ing-nar­rowly-su­per­hu­man-mod­els#Isn_t_this_not_ne­glected_be­cause_lots_of_peo­ple_want_use­ful_AI_

Crawford, Ali; Wulkan, Ido—Fed­eral Prize Com­pe­ti­tions − 2021-11-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​fed­eral-prize-com­pe­ti­tions/​​

Critch, An­drew—What Mul­tipo­lar Failure Looks Like, and Ro­bust Agent-Ag­nos­tic Pro­cesses (RAAPs) − 2021-03-31 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​LpM3EAak­wYdS6aRKf/​​what-mul­ti­po­lar-failure-looks-like-and-ro­bust-agent-agnostic

Dafoe, Allan; Hatz, Sophia; Zhang, Baobao—Co­er­cion and Provo­ca­tion − 2019-11-14 - https://​​ora.ox.ac.uk/​​ob­jects/​​uuid:fc9c9bd4-1cd1-45c4-9e3e-4cd9826171e4

Dafoe, Allan; Hughes, Ed­ward; Bachrach, Yo­ram; Col­lins, Tan­tum; McKee, Kevin R.; Leibo, Joel Z.; Lar­son, Kate; Grae­pel, Thore—Open Prob­lems in Co­op­er­a­tive AI − 2020-12-15 - https://​​arxiv.org/​​abs/​​2012.08630

Dafoe, Allan; Zwetsloot, Remco; Cebul, Matthew—Reputations for Resolve and Higher-Order Beliefs in Crisis Bargaining − 2021-03-11 - https://journals.sagepub.com/doi/10.1177/0022002721995549

Daniels, Matthew; Mur­phy, Ben—Na­tional Power After AI − 2021-07-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​na­tional-power-af­ter-ai/​​

David­son, Tom—Could Ad­vanced AI Drive Ex­plo­sive Eco­nomic Growth? − 2021-06-25 - https://​​www.open­philan­thropy.org/​​could-ad­vanced-ai-drive-ex­plo­sive-eco­nomic-growth

Davis, Zach—Fea­ture Selec­tion − 2021-10-31 - https://​​www.less­wrong.com/​​posts/​​dYspinGtiba5oDCcv/​​fea­ture-selection

de Neufville, Robert; Baum, Seth—Col­lec­tive Ac­tion on Ar­tifi­cial In­tel­li­gence: A Primer and Re­view − 2021-07-15 - https://​​gcrin­sti­tute.org/​​col­lec­tive-ac­tion-on-ar­tifi­cial-in­tel­li­gence-a-primer-and-re­view/​​

Ding, Jeffrey—China’s Grow­ing In­fluence over the Rules of the Digi­tal Road − 2021-04-15 - https://​​sci-hubtw.hkvisa.net/​​10.1353/​​asp.2021.0015

Ding, Jeffrey; Dafoe, Allan—Eng­ines of Power: Elec­tric­ity, AI, and Gen­eral-Pur­pose Mili­tary Trans­for­ma­tions − 2021-06-08 - https://​​arxiv.org/​​abs/​​2106.04338

Drexler, Eric—QNRs: Toward Lan­guage for In­tel­li­gent Machines − 2021-08-27 - https://​​www.fhi.ox.ac.uk/​​qnrs/​​

Evans, Owain; Cot­ton-Bar­ratt, Owen; Fin­nve­den, Lukas; Bales, Adam; Balwit, Avi­tal; Wills, Peter; Righetti, Luca; Saun­ders, William—Truth­ful AI: Devel­op­ing and gov­ern­ing AI that does not lie − 2021-10-13 - https://​​arxiv.org/​​abs/​​2110.06674

Ever­itt, Tom; Carey, Ryan; Lan­glois, Eric; Ortega, Pe­dro A; Legg, Shane—Agent In­cen­tives: A Causal Per­spec­tive − 2021-02-02 - https://​​arxiv.org/​​abs/​​2102.01685

Feda­siuk, Ryan; Melot, Jen­nifer; Mur­phy, Ben—Har­nessed Light­ning − 2021-10-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​har­nessed-light­ning/​​

Fer­nan­dez, Ronny—How en­ergy effi­cient are hu­man-en­g­ineered flight de­signs rel­a­tive to nat­u­ral ones? − 2020-12-10 - https://​​aiim­pacts.org/​​are-hu­man-en­g­ineered-flight-de­signs-bet­ter-or-worse-than-nat­u­ral-ones/​​

Filan, Daniel—AXRP—the AI X-risk Re­search Pod­cast − 2020-12-23 - https://​​axrp.net/​​

Filan, Daniel; Casper, Stephen; Hod, Shlomi; Wild, Cody; Critch, An­drew; Rus­sell, Stu­art—Cluster­abil­ity in Neu­ral Net­works − 2021-03-04 - https://​​arxiv.org/​​abs/​​2103.03386

Fin­nve­den, Lukas—Ex­trap­o­lat­ing GPT-N perfor­mance − 2020-12-18 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​k2SNji3jXaLGhBeYP/​​ex­trap­o­lat­ing-gpt-n-perfor­mance#comments

Fischer, So­phie-Char­lotte; Le­ung, Jade; An­der­ljung, Markus; O’Keefe, Cul­len; Torges, Ste­fan; Khan, Saif M.; Garfinkel, Ben; Dafoe, Allan—AI Policy Lev­ers: A Re­view of the U.S. Govern­ment’s Tools to Shape AI Re­search, Devel­op­ment, and De­ploy­ment − 2021-03-15 - https://​​www.gov­er­nance.ai/​​re­search-pa­per/​​ai-policy-lev­ers-a-re­view-of-the-u-s-gov­ern­ments-tools-to-shape-ai-re­search-de­vel­op­ment-and-deployment

Fitzgerald, McKenna; Boddy, Aaron; Baum, Seth—2020 Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy − 2020-12-31 - https://gcrinstitute.org/2020-survey-of-artificial-general-intelligence-projects-for-ethics-risk-and-policy/

GAA—Nu­clear Es­pi­onage and AI Gover­nance − 2021-10-04 - https://​​fo­rum.effec­tivealtru­ism.org/​​posts/​​CKfHDw5Lmoo6jahZD/​​nu­clear-es­pi­onage-and-ai-gov­er­nance-1

Gabriel, Ia­son—Towards a The­ory of Jus­tice for Ar­tifi­cial In­tel­li­gence − 2021-10-27 - https://​​arxiv.org/​​abs/​​2110.14419

Galaz, Vic­tor; Cen­teno, Miguel; Cal­la­han, Peter; Cau­se­vic, Amar; Pat­ter­son, Thayer; Brass, Irina; Baum, Seth; Far­ber, Darry; Fischer, Jo­ern; Gar­cia, David; McP­hear­son, Ti­mon; Jimenex, Daniel; King, Brian; Larcey, Paul; Levy, Karen—Ar­tifi­cial In­tel­li­gence, Sys­temic Risks, and Sus­tain­abil­ity − 2021-10-07 - https://​​www.sci­encedi­rect.com/​​sci­ence/​​ar­ti­cle/​​pii/​​S0160791X21002165?via%3Dihub

Garfinkel, Ben—A Tour of Emerg­ing Cryp­to­graphic Tech­nolo­gies − 2021-05-15 - https://​​www.gov­er­nance.ai/​​re­search-pa­per/​​a-tour-of-emerg­ing-cryp­to­graphic-technologies

Garrabrant, Scott—Tem­po­ral In­fer­ence with Finite Fac­tored Sets − 2021-10-23 - https://​​arxiv.org/​​abs/​​2109.11513

Gates, Vael; Cal­l­away, Fred­er­ick; Ho, Mark; Griffiths, Thomas—A ra­tio­nal model of peo­ple’s in­fer­ences about oth­ers’ prefer­ences based on re­sponse times − 2021-03-15 - https://​​psyarxiv.com/​​25zfx/​​

Grace, Katja—Ar­gu­ment for AI x-risk from large im­pacts − 2021-09-29 - https://​​aiim­pacts.org/​​ar­gu­ment-from-large-im­pacts/​​

Grace, Katja—Beyond fire alarms: free­ing the group­struck − 2021-09-26 - https://​​aiim­pacts.org/​​be­yond-fire-alarms-free­ing-the-group­struck/​​

Grace, Katja—Co­her­ence ar­gu­ments im­ply a force for goal-di­rected be­hav­ior − 2021-03-25 - https://​​aiim­pacts.org/​​co­her­ence-ar­gu­ments-im­ply-a-force-for-goal-di­rected-be­hav­ior/​​

Greaves, Hilary; MacAskill, William—The case for strong longter­mism − 2021-06-15 - https://​​globalpri­ori­tiesin­sti­tute.org/​​hilary-greaves-william-macaskill-the-case-for-strong-longter­mism-2/​​

Guter­res, An­tónio—Our Com­mon Agenda − 2021-09-10 - https://​​www.un.org/​​en/​​un75/​​com­mon-agenda

Ham­mond, Lewis; Fox, James; Ever­itt, Tom; Abate, Ales­san­dro; Wooldridge, Michael—Equil­ibrium Refine­ments for Multi-Agent In­fluence Di­a­grams: The­ory and Prac­tice − 2021-02-09 - https://​​arxiv.org/​​abs/​​2102.05008

Hendrycks, Dan; Car­lini, Ni­cholas; Schul­man, John; Stein­hardt, Ja­cob—Un­solved Prob­lems in ML Safety − 2021-09-28 - https://​​arxiv.org/​​abs/​​2109.13916

Hendrycks, Dan; Mazeika, Man­tas; Zou, Andy; Pa­tel, Sahil; Zhu, Chris­tine; Navarro, Je­sus; Song, Dawn; Li, Bo; Stein­hardt, Ja­cob—What Would Jiminy Cricket Do? Towards Agents That Be­have Mo­rally − 2021-10-25 - https://​​arxiv.org/​​abs/​​2110.13136

Hod, Shlomi; Casper, Stephen; Filan, Daniel; Wild, Cody; Critch, An­drew; Rus­sell, Stu­art—De­tect­ing Mo­du­lar­ity in Deep Neu­ral Net­works − 2021-10-13 - https://​​arxiv.org/​​abs/​​2110.08058

Hua, Shin-Shin; Belfield, Haydn—AI & An­titrust: Rec­on­cil­ing Ten­sions Between Com­pe­ti­tion Law and Co­op­er­a­tive AI Devel­op­ment − 2021-11-15 - https://​​yjolt.org/​​ai-an­titrust-rec­on­cil­ing-ten­sions-be­tween-com­pe­ti­tion-law-and-co­op­er­a­tive-ai-development

Imbrie, Andrew; Gelles, Rebecca; Dunham, James; Aiken, Catherine—Contending Frames: Evaluating Rhetorical Dynamics in AI − 2021-05-15 - https://cset.georgetown.edu/publication/contending-frames/

Jiang, Liwei; Hwang, Jena D.; Bha­ga­vat­ula, Chan­dra; Bras, Ro­nan Le; Forbes, Maxwell; Bor­chardt, Jon; Liang, Jenny; Etz­ioni, Oren; Sap, Maarten; Choi, Ye­jin—Delphi: Towards Ma­chine Ethics and Norms − 2021-10-14 - https://​​arxiv.org/​​abs/​​2110.07574

Karnofsky, Holden—All Pos­si­ble Views About Hu­man­ity’s Fu­ture Are Wild − 2021-07-13 - https://​​fo­rum.effec­tivealtru­ism.org/​​s/​​isENJuPdB3fhjWYHd/​​p/​​TwQzyP3QgttmuTHym

Klinova, Katya; Korinek, An­ton—AI and Shared Pros­per­ity − 2021-05-18 - https://​​arxiv.org/​​abs/​​2105.08475

Koch, Jack; Lan­gosco, Lauro; Pfau, Ja­cob; Le, James; Sharkey, Lee—Ob­jec­tive Ro­bust­ness in Deep Re­in­force­ment Learn­ing − 2021-05-28 - https://​​arxiv.org/​​abs/​​2105.14111

Koko­ta­jlo, Daniel—Birds, Brains, Planes, and AI: Against Ap­peals to the Com­plex­ity/​Mys­te­ri­ous­ness/​Effi­ciency of the Brain − 2021-01-18 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​HhWhaSzQr6xmBki8F/​​birds-brains-planes-and-ai-against-ap­peals-to-the-com­plex­ity#comments

Korinek, An­ton; Stiglitz, Joseph—Ar­tifi­cial In­tel­li­gence, Global­iza­tion, and Strate­gies for Eco­nomic Devel­op­ment − 2021-02-04 - https://​​pa­pers.ssrn.com/​​sol3/​​pa­pers.cfm?ab­stract_id=3812820

Laid­law, Cas­sidy; Rus­sell, Stu­art—Uncer­tain De­ci­sions Fa­cil­i­tate Bet­ter Prefer­ence Learn­ing − 2021-01-15 - https://​​pro­ceed­ings.neurips.cc/​​pa­per/​​2021/​​hash/​​7f141cf8e7136ce8701dc6636c2a6fe4-Ab­stract.html

Lee, Kimin; Smith, Laura; Abbeel, Pieter—PEBBLE: Feed­back-Effi­cient In­ter­ac­tive Re­in­force­ment Learn­ing via Re­la­bel­ing Ex­pe­rience and Un­su­per­vised Pre-train­ing − 2021-06-09 - https://​​arxiv.org/​​abs/​​2106.05091

lifelon­glearner; Hase, Peter—Opinions on In­ter­pretable Ma­chine Learn­ing and 70 Sum­maries of Re­cent Papers − 2021-04-09 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​GEPX7jgLMB8vR2qaK/​​opinions-on-in­ter­pretable-ma­chine-learn­ing-and-70-summaries

Lin, Stephanie; Hil­ton, Ja­cob; Evans, Owain—Truth­fulQA: Mea­sur­ing How Models Mimic Hu­man False­hoods − 2021-10-08 - https://​​arxiv.org/​​abs/​​2109.07958

Lind­ner, David; Shah, Ro­hin; Abbeel, Pieter; Dra­gan, Anca—Learn­ing What To Do by Si­mu­lat­ing the Past − 2021-04-08 - https://​​arxiv.org/​​abs/​​2104.03946

Liu, Hin-Yan; Maas, Matthijs - ‘Solv­ing for X?’ Towards a prob­lem-find­ing frame­work to ground long-term gov­er­nance strate­gies for ar­tifi­cial in­tel­li­gence − 2021-02-00 - https://​​www.re­search­gate.net/​​pub­li­ca­tion/​​342774816_%27Solv­ing_for_X%27_Towards_a_prob­lem-find­ing_frame­work_to_ground_long-term_gov­er­nance_strate­gies_for_ar­tifi­cial_intelligence

Maas, Matthijs—AI, Gover­nance Dis­place­ment, and the (De)Frag­men­ta­tion of In­ter­na­tional Law − 2021-03-22 - https://​​www.cser.ac.uk/​​re­sources/​​ai-gov­er­nance-dis­place­ment-and-defrag­men­ta­tion-in­ter­na­tional-law/​​

Maas, Matthijs—Align­ing AI Reg­u­la­tion to So­ciotech­ni­cal Change − 2021-06-23 - https://​​pa­pers.ssrn.com/​​sol3/​​pa­pers.cfm?ab­stract_id=3871635

Maas, Matthijs; Stix, Char­lotte—Bridg­ing the gap: the case for an ‘In­com­pletely The­o­rized Agree­ment’ on AI policy − 2021-01-18 - https://​​www.cser.ac.uk/​​re­sources/​​bridg­ing-gap-case-in­com­pletely-the­o­rized-agree­ment-ai-policy/​​

Ma­clure, Jo­celyn; Rus­sell, Stu­art—AI for Hu­man­ity: The Global Challenges − 2021-02-06 - https://​​smile.ama­zon.com/​​gp/​​product/​​B08W3XZ1TJ/​​ref=ppx_yo_dt_b_d_asin_ti­tle_o00?ie=UTF8&psc=1

Man­heim, David; Sand­berg, An­ders—What is the Up­per Limit of Value? − 2021-01-27 - https://​​philarchive.org/​​rec/​​MANWIT-6

Mit­tel­steadt, Matthew—Mechanisms to En­sure AI Arms Con­trol Com­pli­ance − 2021-02-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​ai-ver­ifi­ca­tion/​​

Mo­gensen, An­dreas—Do not go gen­tle: why the Asym­me­try does not sup­port anti-na­tal­ism − 2021-05-15 - https://​​globalpri­ori­tiesin­sti­tute.org/​​do-not-go-gen­tle-why-the-asym­me­try-does-not-sup­port-anti-na­tal­ism-an­dreas-mo­gensen-global-pri­ori­ties-in­sti­tute-oxford-uni­ver­sity/​​

Mur­phy, Ben—Trans­la­tion: Eth­i­cal Norms for New Gen­er­a­tion Ar­tifi­cial In­tel­li­gence Re­leased − 2021-10-21 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​eth­i­cal-norms-for-new-gen­er­a­tion-ar­tifi­cial-in­tel­li­gence-re­leased/​​

Mur­phy, Ben—Trans­la­tion: White Paper on Trust­wor­thy Ar­tifi­cial In­tel­li­gence − 2021-09-14 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​white-pa­per-on-trust­wor­thy-ar­tifi­cial-in­tel­li­gence/​​

Niconiconi—Whole Brain Emulation: No Progress on C. elegans After 10 Years − 2021-10-01 - https://www.lesswrong.com/posts/mHqQxwKuzZS69CXX5/whole-brain-emulation-no-progress-on-c-elgans-after-10-years

Oester­held, Cas­par; Conitzer, Vin­cent—Safe Pareto Im­prove­ments for Del­e­gated Game Play­ing − 2021-05-03 - https://​​users.cs.duke.edu/​​~conitzer/​​safeAAMAS21.pdf

Ord, Toby; Mercer, An­gus; Dan­nreuther, So­phie; Nel­son, Cas­sidy; Lewis, Gre­gory; Millett, Piers; Whit­tle­stone, Jess; Le­ung, Jade; An­der­ljung, Markus; Hil­ton, Sam; Belfield, Haydn—Fu­ture Proof: The Op­por­tu­nity to Trans­form the UK’s Re­silience to Ex­treme Risks − 2021-06-15 - https://​​www.gov­er­nance.ai/​​re­search-pa­per/​​fu­ture­proof-ar­tifi­cial-in­tel­li­gence-chapter

Owe, An­drea; Baum, Seth—Mo­ral Con­sid­er­a­tion of Non­hu­mans in the Ethics of Ar­tifi­cial In­tel­li­gence − 2021-06-07 - https://​​gcrin­sti­tute.org/​​moral-con­sid­er­a­tion-of-non­hu­mans-in-the-ethics-of-ar­tifi­cial-in­tel­li­gence/​​

Owe, An­drea; Baum, Seth—The Ethics of Sus­tain­abil­ity for Ar­tifi­cial In­tel­li­gence − 2021-11-17 - https://​​gcrin­sti­tute.org/​​the-ethics-of-sus­tain­abil­ity-for-ar­tifi­cial-in­tel­li­gence/​​

Prunkl, Ca­rina; Ashurst, Carolyn; An­der­ljung, Markus; Webb, He­lena; Leike, Jan; Dafoe, Allan—In­sti­tu­tion­al­iz­ing ethics in AI through broader im­pact re­quire­ments − 2021-02-17 - http://​​www.cs.jhu.edu/​​~misha/​​DIRead­ingSem­i­nar/​​Papers/​​Prunkl21.pdf

Ro­man, Char­lotte; Den­nis, Michael; Critch, An­drew; Rus­sell, Stu­art—Ac­cu­mu­lat­ing Risk Cap­i­tal Through In­vest­ing in Co­op­er­a­tion − 2021-01-25 - https://​​arxiv.org/​​abs/​​2101.10305

Rud­ner, Tim; Toner, He­len—Key Con­cepts in AI Safety: An Overview − 2021-03-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​key-con­cepts-in-ai-safety-an-overview/​​

Rud­ner, Tim; Toner, He­len—Key Con­cepts in AI Safety: In­ter­pretabil­ity in Ma­chine Learn­ing − 2021-03-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​key-con­cepts-in-ai-safety-in­ter­pretabil­ity-in-ma­chine-learn­ing/​​

Rud­ner, Tim; Toner, He­len—Key Con­cepts in AI Safety: Ro­bust­ness and Ad­ver­sar­ial Ex­am­ples − 2021-03-15 - https://​​cset.george­town.edu/​​pub­li­ca­tion/​​key-con­cepts-in-ai-safety-ro­bust­ness-and-ad­ver­sar­ial-ex­am­ples/​​

Shah, Ro­hin; Wild, Cody; Wang, Steven H.; Alex, Neel; Houghton, Bran­don; Guss, William; Mo­hanty, Sharada; Kan­ervisto, Anssi; Milani, Stephanie; Topin, Ni­cholay; Abbeel, Pieter; Rus­sell, Stu­art; Dra­gan, Anca—The MineRL BASALT Com­pe­ti­tion on Learn­ing from Hu­man Feed­back − 2021-07-05 - https://​​arxiv.org/​​abs/​​2107.01969

Sh­legeris, Buck—Red­wood Re­search’s cur­rent pro­ject − 2021-09-21 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​k7oxd­bNaGATZbtEg3/​​red­wood-re­search-s-cur­rent-project

Sh­legeris, Buck—The al­ign­ment prob­lem in differ­ent ca­pa­bil­ity regimes − 2021-09-21 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​HHunb8FPn­hWaDAQci/​​the-al­ign­ment-prob­lem-in-differ­ent-ca­pa­bil­ity-regimes

Soares, Nate—Visi­ble Thoughts Pro­ject and Bounty An­nounce­ment − 2021-11-29 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​zRn6cLtxyNo­dudzhw/​​visi­ble-thoughts-pro­ject-and-bounty-announcement

Stastny, Ju­lian; Treut­lein, Jo­hannes; Riché, Maxime; Clif­ton, Jesse—Multi-agent learn­ing in mixed-mo­tive co­or­di­na­tion prob­lems − 2021-03-15 - https://​​longtermrisk.org/​​files/​​stastny_et_al_im­plicit_bar­gain­ing.pdf

Stooke, Adam; Mahajan, Anuj; Barros, Catarina; Deck, Charlie; Bauer, Jakob; Sygnowski, Jakub; Trebacz, Maja; Jaderberg, Max; Mathieu, Michael; McAleese, Nat; Bradley-Schmieg, Nathalie; Wong, Nathaniel; Porcel, Nicolas; Raileanu, Roberta; Hughes-Fitt, Steph; Dalibard, Valentin; Czarnecki, Wojciech Marian—Open-Ended Learning Leads to Generally Capable Agents − 2021-07-27 - https://deepmind.com/research/publications/2021/open-ended-learning-leads-to-generally-capable-agents

Thomas, Teruji—Si­mu­la­tion Ex­pec­ta­tion − 2021-09-15 - https://​​globalpri­ori­tiesin­sti­tute.org/​​simu­la­tion-ex­pec­ta­tion-teruji-thomas-global-pri­ori­ties-in­sti­tute-uni­ver­sity-of-oxford/​​

Thorstad, David—The scope of longter­mism − 2021-06-15 - https://​​globalpri­ori­tiesin­sti­tute.org/​​the-scope-of-longter­mism-david-thorstad-global-pri­ori­ties-in­sti­tute-uni­ver­sity-of-oxford/​​

Tram­mell, Philip; Korinek, An­ton—Eco­nomic Growth Un­der Trans­for­ma­tive AI: A Guide to the Vast Range of Pos­si­bil­ities for Out­put Growth, Wages, and the Labor­share − 2020-02-04 - https://​​www.gov­er­nance.ai/​​re­search-pa­per/​​eco­nomic-growth-un­der-trans­for­ma­tive-ai-a-guide-to-the-vast-range-of-pos­si­bil­ities-for-out­put-growth-wages-and-the-laborshare

Turner, Alex—Satis­ficers Tend To Seek Power: In­stru­men­tal Con­ver­gence Via Re­tar­getabil­ity − 2021-11-17 - https://​​www.less­wrong.com/​​posts/​​nZY8Np759HYFawdjH/​​satis­ficers-tend-to-seek-power-in­stru­men­tal-con­ver­gence-via

Welbl, Jo­hannes; Glaese, Amelia; Ue­sato, Jonathan; Dathathri, Su­manth; Mel­lor, John; Hen­dricks, Lisa Anne; An­der­son, Kirsty; Kohli, Push­meet; Cop­pin, Ben; Huang, Po-Sen—Challenges in De­tox­ify­ing Lan­guage Models − 2021-09-15 - https://​​arxiv.org/​​abs/​​2109.07445

Went­worth, John—How To Get Into In­de­pen­dent Re­search On Align­ment/​Agency − 2021-11-18 - https://​​www.less­wrong.com/​​posts/​​P3Yt66Wh5g7SbkKuT/​​how-to-get-into-in­de­pen­dent-re­search-on-al­ign­ment-agency#Meta

Went­worth, John—Utility Max­i­miza­tion = De­scrip­tion Length Min­i­miza­tion − 2021-02-18 - https://​​www.al­ign­ment­fo­rum.org/​​posts/​​voLHQgNnc­n­jjgAPH7/​​util­ity-max­i­miza­tion-de­scrip­tion-length-minimization

Whit­tle­stone, Jess; Clark, Jack—Why and How Govern­ments Should Mon­i­tor AI Devel­op­ment − 2021-08-31 - https://​​www.cser.ac.uk/​​re­sources/​​why-and-how-gov­ern­ments-should-mon­i­tor-ai-de­vel­op­ment/​​

Woods, Kat—The Non­lin­ear Library − 2021-10-19 - https://​​fo­rum.effec­tivealtru­ism.org/​​posts/​​JTZTBien­qWEAjGDRv/​​listen-to-more-ea-con­tent-with-the-non­lin­ear-library

Yud­kowsky, Eliezer—Dis­cus­sion with Eliezer Yud­kowsky on AGI in­ter­ven­tions − 2021-11-10 - https://​​www.less­wrong.com/​​posts/​​Cpvy­hFy9WvCNsifkY/​​dis­cus­sion-with-eliezer-yud­kowsky-on-agi-interventions

Yud­kowsky, Eliezer—Yud­kowsky and Chris­ti­ano dis­cuss “Take­off Speeds” − 2021-11-22 - https://​​fo­rum.effec­tivealtru­ism.org/​​posts/​​rho5vtxSaEdXxLu3o/​​yud­kowsky-and-chris­ti­ano-dis­cuss-take­off-speeds

Zaidi, Waqar; Dafoe, Allan—In­ter­na­tional Con­trol of Pow­er­ful Tech­nol­ogy: Les­sons from the Baruch Plan for Nu­clear Weapons − 2021-03-15 - https://​​www.gov­er­nance.ai/​​re­search-pa­per/​​in­ter­na­tional-con­trol-of-pow­er­ful-tech­nol­ogy-les­sons-from-the-baruch-plan-for-nu­clear-weapons

Zhang, Baobao; An­der­ljung, Markus; Kahn, Lau­ren; Drek­sler, Noemi; Horow­itz, Michael C.; Dafoe, Allan—Ethics and Gover­nance of Ar­tifi­cial In­tel­li­gence: Ev­i­dence from a Sur­vey of Ma­chine Learn­ing Re­searchers − 2021-08-15 - https://​​jair.org/​​in­dex.php/​​jair/​​ar­ti­cle/​​view/​​12895/​​26701

Zhang, Ti­an­jun; Rashidine­jad, Paria; Jiao, Ji­an­tao; Tian, Yuan­dong; Gon­za­lez, Joseph E.; Rus­sell, Stu­art—MADE: Ex­plo­ra­tion via Max­i­miz­ing De­vi­a­tion from Ex­plored Re­gions − 2021-01-15 - https://​​pro­ceed­ings.neurips.cc/​​pa­per/​​2021/​​hash/​​5011bf6d8a37692913fce3a15a51f070-Ab­stract.html

Zhuang, Si­mon; Had­field-Menell, Dy­lan—Con­se­quences of Misal­igned AI − 2021-02-07 - https://​​arxiv.org/​​abs/​​2102.03896