Acknowledgements & References

This post is part of the sequence version of the Effective Altruism Foundation’s research agenda on Cooperation, Conflict, and Transformative Artificial Intelligence.


As noted in the document, several sections of this agenda drew on writings by Lukas Gloor, Daniel Kokotajlo, Anni Leskelä, Caspar Oesterheld, and Johannes Treutlein. Thank you very much to David Althaus, Tobias Baumann, Alexis Carlier, Alex Cloud, Max Daniel, Michael Dennis, Lukas Gloor, Adrian Hutter, Daniel Kokotajlo, János Kramár, David Krueger, Anni Leskelä, Matthijs Maas, Linh Chi Nguyen, Richard Ngo, Caspar Oesterheld, Mahendra Prasad, Rohin Shah, Carl Shulman, Stefan Torges, Johannes Treutlein, and Jonas Vollmer for comments on drafts of this document. Thank you also to the participants of the Effective Altruism Foundation research retreat and workshops, whose contributions helped to shape this agenda.


Arif Ahmed. Ev­i­dence, de­ci­sion and causal­ity. Cam­bridge Univer­sity Press, 2014.

AI Im­pacts. Like­li­hood of dis­con­tin­u­ous progress around the de­vel­op­ment of agi. https://​​aiim­​​like­li­hood-of-dis­con­tin­u­ous-progress-around-the-de­vel­op­ment-of-agi/​​, 2018. Ac­cessed: July 1 2019.

Riad Akrour, Marc Schoe­nauer, and Michele Se­bag. Prefer­ence-based policy learn­ing. In Joint Euro­pean Con­fer­ence on Ma­chine Learn­ing and Knowl­edge Dis­cov­ery in Databases, pages 12–27. Springer, 2011.

Steffen An­der­sen, Seda Er­taç, Uri Gneezy, Moshe Hoff­man, and John A List. Stakes mat­ter in ul­ti­ma­tum games. Amer­i­can Eco­nomic Re­view, 101(7):3427-39, 2011.

Giu­lia An­drighetto, Daniela Grieco, and Rosaria Conte. Fair­ness and com­pli­ance in the ex­tor­tion game. 2015.

Scott E Atk­in­son, Todd San­dler, and John Tschirhart. Ter­ror­ism in a bar­gain­ing frame­work. The Jour­nal of Law and Eco­nomics, 30(1):1-21, 1987.

Robert Ax­elrod. On six ad­vances in co­op­er­a­tion the­ory. Analyse & Kri­tik, 22(1):130-151, 2000.

Robert Ax­elrod and William D Hamil­ton. The evolu­tion of co­op­er­a­tion. sci­ence, 211 (4489):1390-1396, 1981.

Kyle Bag­well. Com­mit­ment and ob­serv­abil­ity in games. Games and Eco­nomic Be­hav­ior, 8(2):271-280, 1995.

To­bias Bau­mann. Sur­ro­gate goals to deflect threats. http://​​​​us­ing-sur­ro­gate-goals-to-deflect-threats/​​, 2017. Ac­cessed March 6, 2019.

To­bias Bau­mann. Challenges to im­ple­ment­ing sur­ro­gate goals. http://​​​​challenges-to-im­ple­ment­ing-sur­ro­gate-goals/​​, 2018. Ac­cessed March 6, 2019.

To­bias Bau­mann, Thore Grae­pel, and John Shawe-Tay­lor. Adap­tive mechanism de­sign: Learn­ing to pro­mote co­op­er­a­tion. arXiv preprint arXiv:1806.04067, 2018.

Ken Bin­more, Ariel Ru­bin­stein, and Asher Wolin­sky. The nash bar­gain­ing solu­tion in eco­nomic mod­el­ling. The RAND Jour­nal of Eco­nomics, pages 176-188, 1986.

Iris Bohnet, Bruno S Frey, and Steffen Huck. More or­der with less law: On con­tract en­force­ment, trust, and crowd­ing. Amer­i­can Poli­ti­cal Science Re­view, 95(1):131-144, 2001.

Friedel Bolle, Yves Bre­it­moser, and Steffen Sch­lächter. Ex­tor­tion in the lab­o­ra­tory. Jour­nal of Eco­nomic Be­hav­ior & Or­ga­ni­za­tion, 78(3):207-218, 2011.

Gary E Bolton and Axel Ock­en­fels. Erc: A the­ory of equity, re­ciproc­ity, and com­pe­ti­tion. Amer­i­can eco­nomic re­view, 90(1):166-193, 2000.

Nick Bostrom. Eth­i­cal is­sues in ad­vanced ar­tifi­cial in­tel­li­gence. Science Fic­tion and Philos­o­phy: From Time Travel to Su­per­in­tel­li­gence, pages 277-284, 2003.

Nick Bostrom. Su­per­in­tel­li­gence: paths, dan­gers, strate­gies. 2014.

Ro­nen I Braf­man and Moshe Ten­nen­holtz. Effi­cient learn­ing equil­ibrium. In Ad­vances in Neu­ral In­for­ma­tion Pro­cess­ing Sys­tems, pages 1635-1642, 2003.

R. A. Briggs. Nor­ma­tive the­o­ries of ra­tio­nal choice: Ex­pected util­ity. In Ed­ward N. Zalta, ed­i­tor, The Stan­ford En­cy­clo­pe­dia of Philos­o­phy. Me­ta­physics Re­search Lab, Stan­ford Univer­sity, fall 2019 edi­tion, 2019.

Ernst Brit­ting and Hartwig Spitzer. The open skies treaty. Ver­ifi­ca­tion Year­book, pages 221-237, 2002.

Colin Camerer and Teck Hua Ho. Ex­pe­rience-weighted at­trac­tion learn­ing in nor­mal form games. Econo­met­rica, 67(4):827-874, 1999.

Colin F Camerer. Be­havi­oural game the­ory. Springer, 2008.

Colin F Camerer, Teck-Hua Ho, and Juin-Kuan Chong. A cog­ni­tive hi­er­ar­chy model of games. The Quar­terly Jour­nal of Eco­nomics, 119(3):861-898, 2004.

Christo­pher Ch­er­niak. Com­pu­ta­tional com­plex­ity and the uni­ver­sal ac­cep­tance of logic. The Jour­nal of Philos­o­phy, 81(12):739-758, 1984.

Thomas J Christensen and Jack Sny­der. Chain gangs and passed bucks: Pre­dict­ing al­li­ance pat­terns in mul­ti­po­lar­ity. In­ter­na­tional or­ga­ni­za­tion, 44(2):137-168, 1990.

Paul Chris­ti­ano. Ap­proval di­rected agents. https://​​ai-al­ign­​​model-free-de­ci­sions-6e6609f5d99e, 2014. Ac­cessed: March 15 2019.

Paul Chris­ti­ano. Hu­mans con­sult­ing hch. https://​​ai-al­ign­​​hu­mans-con­sult­ing-hch-f893f6051455, 2016a.

Paul Chris­ti­ano. Pro­saic ai al­ign­ment. https://​​ai-al­ign­​​pro­saic-ai-con­trol-b959644d79c2, 2016b. Ac­cessed: March 13 2019.

Paul Chris­ti­ano. Clar­ify­ing “ai al­ign­ment”. https://​​ai-al­ign­​​clar­ify­ing-ai-al­ign­ment-cec47cd69dd6, 2018a. Ac­cessed: Oc­to­ber 10 2019.

Paul Chris­ti­ano. Pre­face to the se­quence on iter­ated am­plifi­ca­tion. https://​​www.less­​​s/​​XshCxPjnBec52EcLB/​​p/​​HCv2uwgDGf5dyX5y6, 2018b. Ac­cessed March 6, 2019.

Paul Chris­ti­ano. Pre­face to the se­quence on iter­ated am­plifi­ca­tion. https://​​www.less­​​posts/​​HCv2uwgDGf5dyX5y6/​​pref­ace-to-the-se­quence-on-iter­ated-am­plifi­ca­tion, 2018c. Ac­cessed: Oc­to­ber 10 2019.

Paul Chris­ti­ano. Tech­niques for op­ti­miz­ing worst-case perfor­mance. https://​​ai-al­ign­​​tech­niques-for-op­ti­miz­ing-worst-case-perfor­mance-39eafec74b99, 2018d. Ac­cessed: June 24, 2019.

Paul Chris­ti­ano. What failure looks like. https://​​www.less­​​posts/​​HBxe6wd­jxK239zajf/​​what-failure-looks-like, 2019. Ac­cessed: July 2 2019.

Paul Chris­ti­ano and Robert Wiblin. Should we leave a helpful mes­sage for fu­ture civ­i­liza­tions, just in case hu­man­ity dies out? https://​​​​pod­cast/​​epi­sodes/​​paul-chris­ti­ano-a-mes­sage-for-the-fu­ture/​​, 2019. Ac­cessed: Septem­ber 25, 2019.

Paul F Chris­ti­ano, Jan Leike, Tom Brown, Mil­jan Mar­tic, Shane Legg, and Dario Amodei. Deep re­in­force­ment learn­ing from hu­man prefer­ences. In Ad­vances in Neu­ral In­for­ma­tion Pro­cess­ing Sys­tems, pages 4299-4307, 2017.

Mark Coeck­elbergh. Can we trust robots? Ethics and in­for­ma­tion tech­nol­ogy, 14(1):53-60, 2012.

EA Con­cepts. Im­por­tance, tractabil­ity, ne­glect­ed­ness frame­work. https://​​con­cepts.effec­tivealtru­​​con­cepts/​​im­por­tance-ne­glect­ed­ness-tractabil­ity/​​, n.d. Ac­cessed: July 1 2019.

Ajeya Co­tra. Iter­ated dis­til­la­tion and am­plifi­ca­tion. https://​​­ign­ment­fo­​​posts/​​HqLxuZ4LhaFh­mAHWk/​​iter­ated-dis­til­la­tion-and-am­plifi­ca­tion, 2018. Ac­cessed: July 25 2019.

Ja­cob W Cran­dall, Mayada Ou­dah, Fa­timah Ishowo-Oloko, Sherief Ab­dal­lah, Jean-François Bon­nefon, Manuel Ce­brian, Azim Shar­iff, Michael A Goodrich, Iyad Rah­wan, et al. Co­op­er­at­ing with ma­chines. Na­ture com­mu­ni­ca­tions, 9(1):233, 2018.

An­drew Critch. A para­met­ric, re­source-bounded gen­er­al­iza­tion of loeb’s the­o­rem, and a ro­bust co­op­er­a­tion crite­rion for open-source game the­ory. The Jour­nal of Sym­bolic Logic, pages 1-15, 2019.

Allan Dafoe. Ai gov­er­nance: A re­search agenda. Gover­nance of AI Pro­gram, Fu­ture of Hu­man­ity In­sti­tute, Univer­sity of Oxford: Oxford, UK, 2018.

Wei Dai. Towards a new de­ci­sion the­ory. https://​​www.less­​​posts/​​de3xjFaACCAk6imzv/​​to­wards-a-new-de­ci­sion-the­ory, 2009. Ac­cessed: March 5 2019.

Wei Dai. The main sources of ai risk. https://​​www.less­​​posts/​​WXvt8bxYn­wBYpy9oT/​​the-main-sources-of-ai-risk, 2019. Ac­cessed: July 2 2019.

Robyn M Dawes. So­cial dilem­mas. An­nual re­view of psy­chol­ogy, 31(1):169-193, 1980.

Karl W Deutsch and J David Singer. Mul­tipo­lar power sys­tems and in­ter­na­tional sta­bil­ity. World Poli­tics, 16(3):390-406, 1964.

Daniel Dewey. My cur­rent thoughts on miri’s “highly re­li­able agent de­sign” work. https://​​fo­rum.effec­tivealtru­​​posts/​​SEL9PW8jozrvLnkb4/​​my-cur­rent-thoughts-on-miri-s-highly-re­li­able-agent-de­sign, 2017. Ac­cessed: Oc­to­ber 6 2019.

Av­inash Dixit. Trade ex­pan­sion and con­tract en­force­ment. Jour­nal of Poli­ti­cal Econ­omy, 111(6):1293-1317, 2003.

Fi­nale Doshi-Velez and Been Kim. Towards a rigor­ous sci­ence of in­ter­pretable ma­chine learn­ing. arXiv preprint arXiv:1702.08608, 2017.

K Eric Drexler. Refram­ing su­per­in­tel­li­gence: Com­pre­hen­sive ai ser­vices as gen­eral in­tel­li­gence, 2019.

Martin Dufwen­berg and Uri Gneezy. Mea­sur­ing be­liefs in an ex­per­i­men­tal lost wallet game. Games and eco­nomic Be­hav­ior, 30(2):163-182, 2000.

Daniel Ells­berg. The the­ory and prac­tice of black­mail. Tech­ni­cal re­port, RAND CORP SANTA MONICA CA, 1968.

Jo­hanna Et­ner, Me­glena Jeleva, and Jean-Marc Tal­lon. De­ci­sion the­ory un­der am­bi­guity. Jour­nal of Eco­nomic Sur­veys, 26(2):234-270, 2012.

Owain Evans, Andreas Stuhlmüller, Chris Cundy, Ryan Carey, Zachary Kenton, Thomas McGrath, and Andrew Schreiber. Predicting human deliberative judgments with machine learning. Technical report, University of Oxford, 2018.

Tom Everitt, Jan Leike, and Marcus Hutter. Sequential extensions of causal and evidential decision theory. In International Conference on Algorithmic Decision Theory, pages 205-221. Springer, 2015.

Tom Ever­itt, Daniel Filan, Mayank Daswani, and Mar­cus Hut­ter. Self-mod­ifi­ca­tion of policy and util­ity func­tion in ra­tio­nal agents. In In­ter­na­tional Con­fer­ence on Ar­tifi­cial Gen­eral In­tel­li­gence, pages 1-11. Springer, 2016.

Tom Ever­itt, Pe­dro A Ortega, Eliz­a­beth Barnes, and Shane Legg. Un­der­stand­ing agent in­cen­tives us­ing causal in­fluence di­a­grams, part i: sin­gle ac­tion set­tings. arXiv preprint arXiv:1902.09980, 2019.

James D Fearon. Ra­tion­al­ist ex­pla­na­tions for war. In­ter­na­tional or­ga­ni­za­tion, 49(3):379-414, 1995.

Ernst Fehr and Klaus M Schmidt. A theory of fairness, competition, and cooperation. The quarterly journal of economics, 114(3):817-868, 1999.

Ernst Fehr, Simon Gächter, and Georg Kirchsteiger. Reciprocity as a contract enforcement device: Experimental evidence. Econometrica, 65:833-860, 1997.

Dan S Felsenthal and Abraham Diskin. The bargaining problem revisited: minimum utility point, restricted monotonicity axiom, and the mean as an estimate of expected utility. Journal of Conflict Resolution, 26(4):664-691, 1982.

Mark Fey and Kristo­pher W Ram­say. Mu­tual op­ti­mism and war. Amer­i­can Jour­nal of Poli­ti­cal Science, 51(4):738-754, 2007.

Mark Fey and Kristopher W Ramsay. Mechanism design goes to war: peaceful outcomes with interdependent and correlated types. Review of Economic Design, 13(3):233-250, 2009.

Mark Fey and Kristo­pher W Ram­say. Uncer­tainty and in­cen­tives in crisis bar­gain­ing: Game-free anal­y­sis of in­ter­na­tional con­flict. Amer­i­can Jour­nal of Poli­ti­cal Science, 55(1):149-169, 2011.

Ben Fisch, Daniel Fre­und, and Moni Naor. Phys­i­cal zero-knowl­edge proofs of phys­i­cal prop­er­ties. In An­nual Cryp­tol­ogy Con­fer­ence, pages 313-336. Springer, 2014.

Jakob Fo­er­ster, Richard Y Chen, Maruan Al-She­di­vat, Shi­mon White­son, Pieter Abbeel, and Igor Mor­datch. Learn­ing with op­po­nent-learn­ing aware­ness. In Pro­ceed­ings of the 17th In­ter­na­tional Con­fer­ence on Au­tonomous Agents and Mul­tiA­gent Sys­tems, pages 122-130. In­ter­na­tional Foun­da­tion for Au­tonomous Agents and Mul­ti­a­gent Sys­tems, 2018.

Lance Fort­now. Pro­gram equil­ibria and dis­counted com­pu­ta­tion time. In Pro­ceed­ings of the 12th Con­fer­ence on The­o­ret­i­cal Aspects of Ra­tion­al­ity and Knowl­edge, pages 128-133. ACM, 2009.

James W Fried­man. A non-co­op­er­a­tive equil­ibrium for su­pergames. The Re­view of Eco­nomic Stud­ies, 38(1):1-12, 1971.

Daniel Gar­ber. Old ev­i­dence and log­i­cal om­ni­science in bayesian con­fir­ma­tion the­ory. 1983.

Ben Garfinkel. Recent developments in cryptography and possible long-run consequences. https://​​​​file/​​d/​​0B0j9LKC65n09aDh4RmEzdlloT00/​​view, 2018. Accessed: November 11 2019.

Ben Garfinkel and Allan Dafoe. How does the offense-defense bal­ance scale? Jour­nal of Strate­gic Stud­ies, 42(6):736-763, 2019.

Scott Garrabrant. Two ma­jor ob­sta­cles for log­i­cal in­duc­tor de­ci­sion the­ory. https://​​agent­foun­da­​​item?id=1399, 2017. Ac­cessed: July 17 2019.

Scott Garrabrant and Abram Dem­ski. Embed­ded agency. https://​​­ign­ment­fo­​​posts/​​i3BTagvt3HbPMx6PN/​​em­bed­ded-agency-full-text-ver­sion, 2018. Ac­cessed March 6, 2019.

Scott Garrabrant, Tsvi Ben­son-Tilsen, An­drew Critch, Nate Soares, and Jes­sica Tay­lor. Log­i­cal in­duc­tion. arXiv preprint arXiv:1609.03543, 2016.

Alexan­dre Gazet. Com­par­a­tive anal­y­sis of var­i­ous ran­somware virii. Jour­nal in com­puter vi­rol­ogy, 6(1):77-90, 2010.

Sa­muel J Ger­sh­man, Eric J Horvitz, and Joshua B Te­nen­baum. Com­pu­ta­tional ra­tio­nal­ity: A con­verg­ing paradigm for in­tel­li­gence in brains, minds, and ma­chines. Science, 349(6245):273-278, 2015.

Allan Gib­bard and William L Harper. Coun­ter­fac­tu­als and two kinds of ex­pected util­ity. In Ifs, pages 153-190. Springer, 1978.

Itzhak Gilboa and David Sch­mei­dler. Maxmin ex­pected util­ity with non-unique prior. Jour­nal of math­e­mat­i­cal eco­nomics, 18(2):141-153, 1989.

Alexan­der Glaser, Boaz Barak, and Robert J Gold­ston. A zero-knowl­edge pro­to­col for nu­clear war­head ver­ifi­ca­tion. Na­ture, 510(7506):497, 2014.

Charles L Glaser. The se­cu­rity dilemma re­vis­ited. World poli­tics, 50(1):171-201, 1997.

Piotr J Gmy­trasiewicz and Prashant Doshi. A frame­work for se­quen­tial plan­ning in multi-agent set­tings. Jour­nal of Ar­tifi­cial In­tel­li­gence Re­search, 24:49-79, 2005.

Oded Goldre­ich and Yair Oren. Defi­ni­tions and prop­er­ties of zero-knowl­edge proof sys­tems. Jour­nal of Cryp­tol­ogy, 7(1):1-32, 1994.

Shafi Gold­wasser, Silvio Mi­cali, and Charles Rack­off. The knowl­edge com­plex­ity of in­ter­ac­tive proof sys­tems. SIAM Jour­nal on com­put­ing, 18(1):186-208, 1989.

Katja Grace, John Sal­vatier, Allan Dafoe, Baobao Zhang, and Owain Evans. When will ai ex­ceed hu­man perfor­mance? ev­i­dence from ai ex­perts. Jour­nal of Ar­tifi­cial In­tel­li­gence Re­search, 62:729-754, 2018.

Hilary Greaves, William MacAskill, Rossa O’Keeffe-O’Dono­van, and Philip Tram­mell. Re­search agenda–web ver­sion a re­search agenda for the global pri­ori­ties in­sti­tute. 2019.

Avner Greif, Paul Mil­grom, and Barry R We­in­gast. Co­or­di­na­tion, com­mit­ment, and en­force­ment: The case of the mer­chant guild. Jour­nal of poli­ti­cal econ­omy, 102(4):745-776, 1994.

Frances S Grodz­in­sky, Keith W Miller, and Marty J Wolf. Devel­op­ing ar­tifi­cial agents wor­thy of trust: “would you buy a used car from this ar­tifi­cial agent?”. Ethics and in­for­ma­tion tech­nol­ogy, 13(1):17-27, 2011.

Werner Güth, Rolf Sch­mit­tberger, and Bernd Sch­warze. An ex­per­i­men­tal anal­y­sis of ul­ti­ma­tum bar­gain­ing. Jour­nal of eco­nomic be­hav­ior & or­ga­ni­za­tion, 3(4):367-388, 1982.

Dy­lan Had­field-Menell, Stu­art J Rus­sell, Pieter Abbeel, and Anca Dra­gan. Co­op­er­a­tive in­verse re­in­force­ment learn­ing. In Ad­vances in neu­ral in­for­ma­tion pro­cess­ing sys­tems, pages 3909-3917, 2016.

Ed­ward H Ha­gen and Peter Ham­mer­stein. Game the­ory and hu­man evolu­tion: A cri­tique of some re­cent in­ter­pre­ta­tions of ex­per­i­men­tal games. The­o­ret­i­cal pop­u­la­tion biol­ogy, 69(3):339-348, 2006.

Joseph Y Halpern and Ra­fael Pass. Game the­ory with translu­cent play­ers. In­ter­na­tional Jour­nal of Game The­ory, 47(3):949-976, 2018.

Lars Peter Hansen and Thomas J Sar­gent. Ro­bust­ness. Prince­ton uni­ver­sity press, 2008.

Lars Peter Hansen, Mas­simo Mari­nacci, et al. Am­bi­guity aver­sion and model mis­speci­fi­ca­tion: An eco­nomic per­spec­tive. Statis­ti­cal Science, 31(4):511-515, 2016.

Gar­rett Hardin. The tragedy of the com­mons. sci­ence, 162(3859):1243-1248, 1968.

Paul Har­ren­stein, Felix Brandt, and Felix Fischer. Com­mit­ment and ex­tor­tion. In Pro­ceed­ings of the 6th in­ter­na­tional joint con­fer­ence on Au­tonomous agents and mul­ti­a­gent sys­tems, page 26. ACM, 2007.

John C Harsanyi and Rein­hard Selten. A gen­er­al­ized nash solu­tion for two-per­son bar­gain­ing games with in­com­plete in­for­ma­tion. Man­age­ment Science, 18(5-part-2): 80-106, 1972.

Joseph Hen­rich, Richard McElreath, Abi­gail Barr, Jean Ens­minger, Clark Bar­rett, Alexan­der Bolyanatz, Juan Camilo Car­de­nas, Michael Gur­ven, Ed­wins Gwako, Natalie Hen­rich, et al. Costly pun­ish­ment across hu­man so­cieties. Science, 312(5781): 1767-1770, 2006.

Jack Hir­sh­leifer. On the emo­tions as guaran­tors of threats and promises. The Dark Side of the Force, pages 198-219, 1987.

Dou­glas R Hofs­tadter. Dilem­mas for su­per­ra­tional thinkers, lead­ing up to a lur­ing lot­tery. Scien­tific Amer­i­can, 6:267-275, 1983.

Ter­ence Hor­gan. Coun­ter­fac­tu­als and new­comb’s prob­lem. The Jour­nal of Philos­o­phy, 78(6):331-356, 1981.

Ed­ward Hughes, Joel Z Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guz­man, An­to­nio Gar­cía Cas­tañeda, Iain Dun­ning, Tina Zhu, Kevin McKee, Raphael Koster, et al. Inequity aver­sion im­proves co­op­er­a­tion in in­tertem­po­ral so­cial dilem­mas. In Ad­vances in neu­ral in­for­ma­tion pro­cess­ing sys­tems, pages 3326-3336, 2018.

Max Jader­berg, Valentin Dal­ibard, Si­mon Os­in­dero, Wo­j­ciech M Czar­necki, Jeff Don­ahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dun­ning, Karen Si­monyan, et al. Pop­u­la­tion based train­ing of neu­ral net­works. arXiv preprint arXiv:1711.09846, 2017.

Robert Jervis. Co­op­er­a­tion un­der the se­cu­rity dilemma. World poli­tics, 30(2):167-214, 1978.

Robert Jervis. Per­cep­tion and Misper­cep­tion in In­ter­na­tional Poli­tics: New Edi­tion. Prince­ton Univer­sity Press, 2017.

Daniel Kah­ne­man, Ilana Ri­tov, David Schkade, Steven J Sher­man, and Hal R Var­ian. Eco­nomic prefer­ences or at­ti­tude ex­pres­sions?: An anal­y­sis of dol­lar re­sponses to pub­lic is­sues. In Elic­i­ta­tion of prefer­ences, pages 203-242. Springer, 1999.

Ehud Kalai. Pro­por­tional solu­tions to bar­gain­ing situ­a­tions: in­ter­per­sonal util­ity com­par­i­sons. Econo­met­rica: Jour­nal of the Econo­met­ric So­ciety, pages 1623-1630, 1977.

Ehud Kalai, Meir Smorod­in­sky, et al. Other solu­tions to nash’s bar­gain­ing prob­lem. Econo­met­rica, 43(3):513-518, 1975.

Fred Ka­plan. The wiz­ards of Ar­maged­don. Stan­ford Univer­sity Press, 1991.

Holden Karnofsky. Some back­ground on our views re­gard­ing ad­vanced ar­tifi­cial in­tel­li­gence. https://​​­philan­​​blog/​​some-back­ground-our-views-re­gard­ing-ad­vanced-ar­tifi­cial-in­tel­li­gence, 2016. Ac­cessed: July 7 2019.

D Marc Kil­gour and Frank C Za­gare. Cred­i­bil­ity, un­cer­tainty, and de­ter­rence. Amer­i­can Jour­nal of Poli­ti­cal Science, 35(2):305-334, 1991.

Stephen Knack and Philip Keefer. In­sti­tu­tions and eco­nomic perfor­mance: cross-coun­try tests us­ing al­ter­na­tive in­sti­tu­tional mea­sures. Eco­nomics & Poli­tics, 7(3): 207-227, 1995.

Daniel Koko­ta­jlo. The “com­mit­ment races” prob­lem. https://​​www.less­​​posts/​​brXr7PJ2W4Na2EW2q/​​the-com­mit­ment-races-prob­lem, 2019a. Ac­cessed: Septem­ber 11 2019.

Daniel Koko­ta­jlo. Cdt agents are ex­ploitable. Un­pub­lished work­ing draft, 2019b.

Peter Kol­lock. So­cial dilem­mas: The anatomy of co­op­er­a­tion. An­nual re­view of so­ciol­ogy, 24(1):183-214, 1998.

Kai A Kon­rad and Ster­gios Skaper­das. Cred­ible threats in ex­tor­tion. Jour­nal of Eco­nomic Be­hav­ior & Or­ga­ni­za­tion, 33(1):23-39, 1997.

David M Kreps and Joel So­bel. Sig­nal­ling. Hand­book of game the­ory with eco­nomic ap­pli­ca­tions, 2:849-867, 1994.

Joshua A Kroll, Solon Baro­cas, Ed­ward W Felten, Joel R Rei­den­berg, David G Robin­son, and Har­lan Yu. Ac­countable al­gorithms. U. Pa. L. Rev., 165:633, 2016.

David Krueger, Te­gan Ma­haraj, Shane Legg, and Jan Leike. Mislead­ing meta-ob­jec­tives and hid­den in­cen­tives for dis­tri­bu­tional shift. Safe Ma­chine Learn­ing work­shop at ICLR, 2019.

An­drew Kydd. Which side are you on? bias, cred­i­bil­ity, and me­di­a­tion. Amer­i­can Jour­nal of Poli­ti­cal Science, 47(4):597-611, 2003.

An­drew H Kydd. Ra­tion­al­ist ap­proaches to con­flict pre­ven­tion and re­s­olu­tion. An­nual Re­view of Poli­ti­cal Science, 13:101-121, 2010.

Marc Lanc­tot, Vini­cius Zam­baldi, Au­drunas Grus­lys, An­geliki Lazari­dou, Karl Tuyls, Julien Pero­lat, David Silver, and Thore Grae­pel. A unified game-the­o­retic ap­proach to mul­ti­a­gent re­in­force­ment learn­ing. In Ad­vances in Neu­ral In­for­ma­tion Pro­cess­ing Sys­tems, pages 4190-4203, 2017.

Daryl Lan­dau and Sy Lan­dau. Con­fi­dence-build­ing mea­sures in me­di­a­tion. Me­di­a­tion Quar­terly, 15(2):97-103, 1997.

Pa­trick LaVic­toire, Benja Fallen­stein, Eliezer Yud­kowsky, Mihaly Barasz, Paul Chris­ti­ano, and Mar­cello Her­reshoff. Pro­gram equil­ibrium in the pris­oner’s dilemma via loeb’s the­o­rem. In Work­shops at the Twenty-Eighth AAAI Con­fer­ence on Ar­tifi­cial In­tel­li­gence, 2014.

Joel Z Leibo, Vini­cius Zam­baldi, Marc Lanc­tot, Janusz Marecki, and Thore Grae­pel. Multi-agent re­in­force­ment learn­ing in se­quen­tial so­cial dilem­mas. In Pro­ceed­ings of the 16th Con­fer­ence on Au­tonomous Agents and Mul­tiA­gent Sys­tems, pages 464-473. In­ter­na­tional Foun­da­tion for Au­tonomous Agents and Mul­ti­a­gent Sys­tems, 2017.

Joel Z Leibo, Ed­ward Hughes, Marc Lanc­tot, and Thore Grae­pel. Au­tocur­ricula and the emer­gence of in­no­va­tion from so­cial in­ter­ac­tion: A man­i­festo for multi-agent in­tel­li­gence re­search. arXiv preprint arXiv:1903.00742, 2019.

Jan Leike, David Krueger, Tom Ever­itt, Mil­jan Mar­tic, Vishal Maini, and Shane Legg. Scal­able agent al­ign­ment via re­ward mod­el­ing: a re­search di­rec­tion. arXiv preprint arXiv:1811.07871, 2018.

Adam Lerer and Alexan­der Peysakhovich. Main­tain­ing co­op­er­a­tion in com­plex so­cial dilem­mas us­ing deep re­in­force­ment learn­ing. arXiv preprint arXiv:1707.01068, 2017.

Anni Leskelä. Simulations as a tool for understanding other civilizations. Unpublished working draft, 2019.

Alis­tair Letcher, Jakob Fo­er­ster, David Bal­duzzi, Tim Rock­täschel, and Shi­mon White­son. Stable op­po­nent shap­ing in differ­en­tiable games. arXiv preprint arXiv:1811.08469, 2018.

David Lewis. Pri­son­ers’ dilemma is a new­comb prob­lem. Philos­o­phy & Public Af­fairs, pages 235-240, 1979.

Xiaomin Lin, Stephen C Adams, and Peter A Bel­ing. Multi-agent in­verse re­in­force­ment learn­ing for cer­tain gen­eral-sum stochas­tic games. Jour­nal of Ar­tifi­cial In­tel­li­gence Re­search, 66:473-502, 2019.

Zachary C Lip­ton. The mythos of model in­ter­pretabil­ity. arXiv preprint arXiv:1606.03490, 2016.

William MacAskill. A cri­tique of func­tional de­ci­sion the­ory. https://​​www.less­​​posts/​​ySLYSsNeFL5CoAQzN/​​a-cri­tique-of-func­tional-de­ci­sion-the­ory, 2019. Ac­cessed: Septem­ber 15 2019.

William MacAskill, Aron Val­lin­der, Cas­par Oester­held, Carl Shul­man, and Jo­hannes Treut­lein. The ev­i­den­tial­ist’s wa­ger. Manuscript, 2019.

Fabio Mac­cheroni, Mas­simo Mari­nacci, and Aldo Rus­ti­chini. Am­bi­guity aver­sion, ro­bust­ness, and the vari­a­tional rep­re­sen­ta­tion of prefer­ences. Econo­met­rica, 74(6): 1447-1498, 2006.

Michael W Macy and Andreas Flache. Learning dynamics in social dilemmas. Proceedings of the National Academy of Sciences, 99(suppl 3):7229-7236, 2002.

Christopher JG Meacham. Binding and its consequences. Philosophical studies, 149(1):49-71, 2010.

Kath­leen L Mosier, Linda J Sk­itka, Su­san Heers, and Mark Bur­dick. Au­toma­tion bias: De­ci­sion mak­ing and perfor­mance in high-tech cock­pits. The In­ter­na­tional jour­nal of avi­a­tion psy­chol­ogy, 8(1):47-63, 1998.

Ab­hi­nay Muthoo. A bar­gain­ing model based on the com­mit­ment tac­tic. Jour­nal of Eco­nomic The­ory, 69:134-152, 1996.

Rose­marie Nagel. Un­rav­el­ing in guess­ing games: An ex­per­i­men­tal study. The Amer­i­can Eco­nomic Re­view, 85(5):1313-1326, 1995.

John Nash. Two-per­son co­op­er­a­tive games. Econo­met­rica, 21:128-140, 1953.

John F Nash. The bar­gain­ing prob­lem. Econo­met­rica: Jour­nal of the Econo­met­ric So­ciety, pages 155-162, 1950.

An­drew Y Ng, Stu­art J Rus­sell, et al. Al­gorithms for in­verse re­in­force­ment learn­ing. In Icml, vol­ume 1, page 2, 2000.

Dou­glass C North. In­sti­tu­tions. Jour­nal of eco­nomic per­spec­tives, 5(1):97-112, 1991.

Robert Noz­ick. New­comb’s prob­lem and two prin­ci­ples of choice. In Es­says in honor of Carl G. Hem­pel, pages 114-146. Springer, 1969.

Cas­par Oester­held. Deep re­in­force­ment learn­ing from hu­man prefer­ences. https://​​cas­paroester­held.files.word­​​2018/​​01/​​rldt.pdf, 2017a.

Cas­par Oester­held. Mul­ti­verse-wide co­op­er­a­tion via cor­re­lated de­ci­sion mak­ing. 2017b.

Cas­par Oester­held. Ro­bust pro­gram equil­ibrium. The­ory and De­ci­sion, pages 1-17, 2019.

Cas­par Oester­held and Vin­cent Conitzer. Ex­tract­ing money from causal de­ci­sion the­o­rists. 2019. Ac­cessed: March 13 2019.

Stephen M Omo­hun­dro. The na­ture of self-im­prov­ing ar­tifi­cial in­tel­li­gence. Sin­gu­lar­ity Sum­mit, 2008, 2007.

Stephen M Omo­hun­dro. The ba­sic ai drives. In AGI, vol­ume 171, pages 483-492, 2008.

OpenAI. Ope­nai char­ter. https://​​ope­​​char­ter/​​, 2018. Ac­cessed: July 7 2019.

Pedro A Ortega and Vishal Maini. Building safe artificial intelligence: specification, robustness, and assurance. https://​​​​@deepmindsafetyresearch/​​building-safe-artificial-intelligence-52f5f75058f1, 2018. Accessed: July 7 2019.

Raja Parasuraman and Dietrich H Manzey. Complacency and bias in human use of automation: An attentional integration. Human factors, 52(3):381-410, 2010.

Judea Pearl. Causality. Cambridge university press, 2009.

Julien Pero­lat, Joel Z Leibo, Vini­cius Zam­baldi, Charles Beat­tie, Karl Tuyls, and Thore Grae­pel. A multi-agent re­in­force­ment learn­ing model of com­mon-pool re­source ap­pro­pri­a­tion. In Ad­vances in Neu­ral In­for­ma­tion Pro­cess­ing Sys­tems, pages 3643-3652, 2017.

Alexan­der Peysakhovich and Adam Lerer. Con­se­quen­tial­ist con­di­tional co­op­er­a­tion in so­cial dilem­mas with im­perfect in­for­ma­tion. arXiv preprint arXiv:1710.06975, 2017.

Robert Pow­ell. Bar­gain­ing the­ory and in­ter­na­tional con­flict. An­nual Re­view of Poli­ti­cal Science, 5(1):1-30, 2002.

Robert Pow­ell. War as a com­mit­ment prob­lem. In­ter­na­tional or­ga­ni­za­tion, 60(1): 169-203, 2006.

Kai Quek. Ra­tion­al­ist ex­per­i­ments on war. Poli­ti­cal Science Re­search and Meth­ods, 5 (1):123-142, 2017.

Matthew Rabin. In­cor­po­rat­ing fair­ness into game the­ory and eco­nomics. The Amer­i­can eco­nomic re­view, pages 1281-1302, 1993.

Neil C Rabinow­itz, Frank Per­bet, H Fran­cis Song, Chiyuan Zhang, SM Es­lami, and Matthew Botv­inick. Ma­chine the­ory of mind. arXiv preprint arXiv:1802.07740, 2018.

Werner Raub. A gen­eral game-the­o­retic model of prefer­ence adap­ta­tions in prob­le­matic so­cial situ­a­tions. Ra­tion­al­ity and So­ciety, 2(1):67-93, 1990.

Robert W Rauch­haus. Asym­met­ric in­for­ma­tion, me­di­a­tion, and con­flict man­age­ment. World Poli­tics, 58(2):207-241, 2006.

Jonathan Ren­shon, Ju­lia J Lee, and Dustin Tin­gley. Emo­tions and the microfoun­da­tions of com­mit­ment prob­lems. In­ter­na­tional Or­ga­ni­za­tion, 71(S1):S189-S218, 2017.

Stephane Ross, Ge­offrey Gor­don, and Drew Bag­nell. A re­duc­tion of imi­ta­tion learn­ing and struc­tured pre­dic­tion to no-re­gret on­line learn­ing. In Pro­ceed­ings of the four­teenth in­ter­na­tional con­fer­ence on ar­tifi­cial in­tel­li­gence and statis­tics, pages 627-635, 2011.

Ariel Ru­bin­stein. Perfect equil­ibrium in a bar­gain­ing model. Econo­met­rica: Jour­nal of the Econo­met­ric So­ciety, pages 97-109, 1982.

Stu­art Rus­sell, Daniel Dewey, and Max Teg­mark. Re­search pri­ori­ties for ro­bust and benefi­cial ar­tifi­cial in­tel­li­gence. Ai Magaz­ine, 36(4):105-114, 2015.

Stu­art J Rus­sell and De­vika Subra­ma­nian. Prov­ably bounded-op­ti­mal agents. Jour­nal of Ar­tifi­cial In­tel­li­gence Re­search, 2:575-609, 1994.

San­ti­ago Sanchez-Pages. Bar­gain­ing and con­flict with in­com­plete in­for­ma­tion. The Oxford Hand­book of the Eco­nomics of Peace and Con­flict. Oxford Univer­sity Press, New York, 2012.

William Saunders. Hch is not just mechanical turk. https://​​­ign­ment­fo­​​posts/​​4JuKoFguzuMrNn6Qr/​​hch-is-not-just-mechanical-turk?_ga=2.41060900.708557547.1562118039-599692079.1556077623, 2019. Accessed: July 2 2019.

Ste­fan Schaal. Is imi­ta­tion learn­ing the route to hu­manoid robots? Trends in cog­ni­tive sci­ences, 3(6):233-242, 1999.

Jonathan Schaf­fer. The meta­physics of cau­sa­tion. In Ed­ward N. Zalta, ed­i­tor, The Stan­ford En­cy­clo­pe­dia of Philos­o­phy. Me­ta­physics Re­search Lab, Stan­ford Univer­sity, fall 2016 edi­tion, 2016.

James A Schel­len­berg. A com­par­a­tive test of three mod­els for solv­ing “the bar­gain­ing prob­lem”. Be­hav­ioral Science, 33(2):81-96, 1988.

Thomas Schel­ling. The Strat­egy of Con­flict. Har­vard Univer­sity Press, 1960.

David Sch­midt, Robert Shupp, James Walker, TK Ahn, and Elinor Ostrom. Dilemma games: game pa­ram­e­ters and match­ing pro­to­cols. Jour­nal of Eco­nomic Be­hav­ior & Or­ga­ni­za­tion, 46(4):357-377, 2001.

Wolf­gang Sch­warz. On func­tional de­ci­sion the­ory.​wo/​2018/​688, 2018. Ac­cessed: Septem­ber 15 2019.

Anja Short­land and Russ Roberts. Short­land on kid­nap. http://​​www.econ­​​anja-short­land-on-kid­nap/​​, 2019. Ac­cessed: July 13 2019.

Carl Shul­man. Omo­hun­dro’s “ba­sic ai drives” and catas­trophic risks. Manuscript, 2010.

Linda J Sk­itka, Kath­leen L Mosier, and Mark Bur­dick. Does au­toma­tion bias de­ci­sion-mak­ing? In­ter­na­tional Jour­nal of Hu­man-Com­puter Stud­ies, 51(5):991–1006, 1999.

Alas­tair Smith and Allan C Stam. Bar­gain­ing and the na­ture of war. Jour­nal of Con­flict Re­s­olu­tion, 48(6):783-813, 2004.

Glenn H Snyder. “prisoner’s dilemma” and “chicken” models in international politics. International Studies Quarterly, 15(1):66-103, 1971.

Nate Soares and Benja Fallen­stein. Toward ideal­ized de­ci­sion the­ory. arXiv preprint arXiv:1507.01986, 2015.

Nate Soares and Benya Fallen­stein. Agent foun­da­tions for al­ign­ing ma­chine in­tel­li­gence with hu­man in­ter­ests: a tech­ni­cal re­search agenda. In The Tech­nolog­i­cal Sin­gu­lar­ity, pages 103-125. Springer, 2017.

Joel So­bel. A the­ory of cred­i­bil­ity. The Re­view of Eco­nomic Stud­ies, 52(4):557-573, 1985.

Ray J Solomonoff. A for­mal the­ory of in­duc­tive in­fer­ence. part i. In­for­ma­tion and con­trol, 7(1):1-22, 1964.

Kaj So­tala. Disjunc­tive sce­nar­ios of catas­trophic ai risk. In Ar­tifi­cial In­tel­li­gence Safety and Se­cu­rity, pages 315-337. Chap­man and Hall/​CRC, 2018.

Tom Florian Sterkenburg. The foundations of solomonoff prediction. Master’s thesis, 2013.

Jo­erg Stoye. Statis­ti­cal de­ci­sions un­der am­bi­guity. The­ory and de­ci­sion, 70(2):129-148, 2011.

Joseph Suarez, Yilun Du, Phillip Isola, and Igor Mor­datch. Neu­ral mmo: A mas­sively mul­ti­a­gent game en­vi­ron­ment for train­ing and eval­u­at­ing in­tel­li­gent agents. arXiv preprint arXiv:1903.00784, 2019.

Chiara Su­perti. Ad­diopizzo: Can a la­bel defeat the mafia? Jour­nal of In­ter­na­tional Policy Solu­tions, 11(4):3-11, 2009.

Richard S Sut­ton and An­drew G Barto. Re­in­force­ment learn­ing: An in­tro­duc­tion. MIT press, 2018.

William Talbott. Bayesian episte­mol­ogy. In Ed­ward N. Zalta, ed­i­tor, The Stan­ford En­cy­clo­pe­dia of Philos­o­phy. Me­ta­physics Re­search Lab, Stan­ford Univer­sity, win­ter 2016 edi­tion, 2016.

Jes­sica Tay­lor. My cur­rent take on the paul-miri dis­agree­ment on al­ignabil­ity of messy ai. https://​​agent­foun­da­​​item?id=1129, 2016. Ac­cessed: Oc­to­ber 6 2019.

Max Teg­mark. Par­allel uni­verses. Scien­tific Amer­i­can, 288(5):40-51, 2003.

Moshe Ten­nen­holtz. Pro­gram equil­ibrium. Games and Eco­nomic Be­hav­ior, 49(2): 363-373, 2004.

Jo­hannes Treut­lein. Model­ing mul­ti­verse-wide su­per­ra­tional­ity. Un­pub­lished work­ing draft., 2019.

Jonathan Ue­sato, Ananya Ku­mar, Cs­aba Szepes­vari, Tom Erez, Avra­ham Ru­d­er­man, Keith An­der­son, Ni­co­las Heess, Push­meet Kohli, et al. Ri­gor­ous agent eval­u­a­tion: An ad­ver­sar­ial ap­proach to un­cover catas­trophic failures. arXiv preprint arXiv:1812.01647, 2018.

Eric Van Damme. The nash bar­gain­ing solu­tion is op­ti­mal. Jour­nal of Eco­nomic The­ory, 38(1):78-100, 1986.

Hal R Var­ian. Com­puter me­di­ated trans­ac­tions. Amer­i­can Eco­nomic Re­view, 100(2): 1-10, 2010.

Hein­rich Von Stack­elberg. Mar­ket struc­ture and equil­ibrium. Springer Science & Busi­ness Me­dia, 2010.

Ken­neth N Waltz. The sta­bil­ity of a bipo­lar world. Daedalus, pages 881-909, 1964.

Weixun Wang, Ji­anye Hao, Yixi Wang, and Matthew Tay­lor. Towards co­op­er­a­tion in se­quen­tial pris­oner’s dilem­mas: a deep mul­ti­a­gent re­in­force­ment learn­ing ap­proach. arXiv preprint arXiv:1803.00162, 2018.

E Roy Wein­traub. Game the­ory and cold war ra­tio­nal­ity: A re­view es­say. Jour­nal of Eco­nomic Liter­a­ture, 55(1):148-61, 2017.

Sylvia Wen­mack­ers and Jan-Willem Romeijn. New the­ory about old ev­i­dence. Syn­these, 193(4):1225-1250, 2016.

Lan­tao Yu, Ji­am­ing Song, and Ste­fano Er­mon. Multi-agent ad­ver­sar­ial in­verse re­in­force­ment learn­ing. arXiv preprint arXiv:1907.13220, 2019.

Eliezer Yud­kowsky. In­gre­di­ents of time­less de­ci­sion the­ory. https://​​www.less­​​posts/​​szfxvS8nsxTgJLBHs/​​in­gre­di­ents-of-time­less-de­ci­sion-the­ory, 2009. Ac­cessed: March 14 2019.

Eliezer Yudkowsky. Intelligence explosion microeconomics. Machine Intelligence Research Institute, 2013. Accessed: October 23 2015.

Eliezer Yud­kowsky. Model­ing dis­tant su­per­in­tel­li­gences. https://​​ar­​​p/​​dis­tant_SIs/​​, n.d. Ac­cessed: Feb. 6 2019.

Eliezer Yud­kowsky and Nate Soares. Func­tional de­ci­sion the­ory: A new the­ory of in­stru­men­tal ra­tio­nal­ity. arXiv preprint arXiv:1710.05060, 2017.

Claire Za­bel and Luke Muehlhauser. In­for­ma­tion se­cu­rity ca­reers for gcr re­duc­tion. https://​​fo­rum.effec­tivealtru­​​posts/​​ZJiCfwTy5dC4CoxqA/​​in­for­ma­tion-se­cu­rity-ca­reers-for-gcr-re­duc­tion, 2019. Ac­cessed: July 17 2019.

Chongjie Zhang and Vic­tor Lesser. Multi-agent learn­ing with policy pre­dic­tion. In Twenty-Fourth AAAI Con­fer­ence on Ar­tifi­cial In­tel­li­gence, 2010.
