Preface to EAF’s Research Agenda on Cooperation, Conflict, and TAI

The Effective Altruism Foundation (EAF) is focused on reducing risks of astronomical suffering, or s-risks, from transformative artificial intelligence (TAI). S-risks are defined as events that would bring about suffering on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far[1]. As has been discussed elsewhere, s-risks might arise by malevolence, by accident, or in the course of conflict.

We believe that s-risks arising from conflict are among the most important, tractable, and neglected of these. In particular, strategic threats by powerful AI agents or AI-assisted humans against altruistic values may be among the largest sources of expected suffering. Strategic threats have historically been a source of significant danger to civilization (the Cold War being a prime example). And the potential downsides from such threats, including those involving large amounts of suffering, may increase significantly with the emergence of transformative AI systems. For this reason, our current focus is technical and strategic analysis aimed at addressing these risks.

There are many other important interventions for s-risk reduction which are beyond the scope of this agenda. These include macrostrategy research on questions relating to s-risk; reducing the likelihood of s-risks from hatred, sadism, and other kinds of malevolent intent; and promoting concern for digital minds. EAF has been supporting work in these areas as well, and will continue to do so.

In this sequence of posts, we will present our research agenda on Cooperation, Conflict, and Transformative Artificial Intelligence. It is a standalone document intended to be interesting to people working in AI safety and strategy, with academics working in relevant subfields as a secondary audience. Given its broad focus on issues related to cooperation in the context of powerful AI systems, we think work on the questions raised in the agenda is beneficial under a range of normative views and empirical beliefs about the future course of AI development, even if at EAF we are particularly concerned with s-risks.

The purpose of this sequence is to

  • communicate what we think are the most important, tractable, and neglected technical AI research directions for reducing s-risks;

  • communicate what we think are the most promising directions for reducing downsides from threats more generally;

  • explicate several novel or little-discussed considerations concerning cooperation and AI safety, such as surrogate goals;

  • propose concrete research questions which could be addressed as part of an EAF Fund-supported project, by those interested in working as a full-time researcher at EAF, or by researchers in academia, at other EA organizations, think tanks, or AI labs;

  • contribute to the portfolio of research directions which are of interest to the longtermist EA and AI safety communities broadly.

The agenda is divided into the following sections:

  • AI strategy and governance. What does the strategic landscape at the time of TAI development look like (e.g., unipolar or multipolar, the balance between offensive and defensive capabilities), and what does this imply for cooperation failures? How can we shape the governance of AI so as to reduce the chances of catastrophic cooperation failures?

  • Credibility. What might the nature of credible commitment among TAI systems look like, and what are the implications for improving cooperation? Can we develop new theory (such as open-source game theory) to account for relevant features of AI?

  • Peaceful bargaining mechanisms. Can we further develop bargaining strategies which do not lead to destructive conflict (e.g., by implementing surrogate goals)?

  • Contemporary AI architectures. How can we make progress on reducing cooperation failures using contemporary AI tools — for instance, learning to solve social dilemmas among deep reinforcement learners?

  • Humans in the loop. How do we expect human overseers or operators of AI systems to behave in interactions between humans and AIs? How can human-in-the-loop systems be designed to reduce the chances of conflict?

  • Foundations of rational agency, including bounded decision theory and acausal reasoning.

We plan to post two sections every other day. The next post in the sequence, “Sections 1 & 2: Introduction, Strategy and Governance”, will be posted on Sunday, December 15.


  1. This definition may be updated in the near future.