Section 7: Foundations of Rational Agency

This post is part of the sequence version of the Effective Altruism Foundation's research agenda on Cooperation, Conflict, and Transformative Artificial Intelligence.

7 Foundations of rational agency

We think that the effort to ensure cooperative outcomes among TAI systems will likely benefit from thorough conceptual clarity about the nature of rational agency. Certain foundational achievements (probability theory, the theory of computation, algorithmic information theory, decision theory, and game theory, to name some of the most profound) have been instrumental both in providing a powerful conceptual apparatus for thinking about rational agency and in the development of concrete tools in artificial intelligence, statistics, cognitive science, and so on. Likewise, there are a number of outstanding foundational questions surrounding the nature of rational agency which we expect to yield additional clarity about interactions between TAI-enabled systems. Broadly, we want to answer:

  • What are the implications of computational boundedness (Russell and Subramanian, 1994; Cherniak, 1984; Gershman et al., 2015) for normative decision theory, in particular as applied to interactions among TAI systems?

  • How should agents handle, in their own decisions, non-causal dependences between their decision-making and that of other agents?

We acknowledge, however, the limitations of the agenda of foundational questions we present here. First, it is plausible that the formal tools we develop will be of limited use in understanding the TAI systems that are actually developed. This may be true of black-box machine learning systems, for instance [1]. Second, there is plenty of potentially relevant foundational inquiry scattered across epistemology, decision theory, game theory, mathematics, philosophy of probability, philosophy of science, etc. which we do not prioritize in our agenda [2]. This does not necessarily reflect a considered judgement about all relevant areas. However, it is plausible to us that the research directions listed here are among the most important, tractable, and neglected (Concepts, n.d.) directions for improving our theoretical picture of TAI.

7.1 Bounded decision theory [3]

Bayesianism (Talbott, 2016) is the standard idealized model of reasoning under empirical uncertainty. Bayesian agents maintain probabilities over hypotheses; update these probabilities by conditionalization in light of new evidence; and make decisions according to some version of expected utility decision theory (Briggs, 2019).
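As a minimal concrete sketch of this idealized picture (our own toy illustration, not any particular proposal from the cited works), the Python snippet below maintains a distribution over a handful of hypotheses, conditionalizes on an observation, and then chooses the action with the highest expected utility. The hypotheses, likelihoods, and utilities are invented for the example.

```python
# Toy Bayesian agent: discrete hypotheses, conditionalization, expected-utility choice.
# All hypotheses, likelihoods, and utilities are illustrative assumptions.

def conditionalize(prior, likelihood, evidence):
    """Update P(h) to P(h | evidence) by Bayes' rule over a finite hypothesis set."""
    unnormalized = {h: prior[h] * likelihood[h][evidence] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

def expected_utility(action, beliefs, utility):
    """Expected utility of `action` under the agent's current beliefs."""
    return sum(beliefs[h] * utility[h][action] for h in beliefs)

prior = {"h1": 0.5, "h2": 0.5}
likelihood = {"h1": {"obs_a": 0.8, "obs_b": 0.2},
              "h2": {"obs_a": 0.3, "obs_b": 0.7}}
utility = {"h1": {"act_x": 10.0, "act_y": 0.0},
           "h2": {"act_x": 0.0, "act_y": 5.0}}

posterior = conditionalize(prior, likelihood, "obs_a")
best_action = max(["act_x", "act_y"],
                  key=lambda a: expected_utility(a, posterior, utility))
print(posterior, best_action)  # posterior favors h1, so the agent picks act_x
```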

But Bayesianism faces a number of limitations when applied to computationally bounded agents. Examples include:

  • Unlike Bayesian agents, computationally bounded agents are logically uncertain. That is, they are not aware of all the logical implications of their hypotheses and evidence (Garber, 1983) [4]. Logical uncertainty may be particularly relevant to developing a satisfactory open-source game theory (Section 3.2), since open-source game theory requires agents to make decisions on the basis of the outputs of their counterparts' source code (which are logical facts). In complex settings, agents are unlikely to be certain about the outputs of all of the relevant programs. Garrabrant et al. (2016) present a theory for assigning logical credences, but it has flaws when applied to decision-making (Garrabrant, 2017). Thus one research direction we are interested in is a theoretically sound and computationally realistic approach to decision-making under logical uncertainty (the first sketch following this list gives a toy illustration of bounded reasoning about a counterpart's program).

  • Unlike Bayesian agents, computationally bounded agents cannot reason over the space of all possible hypotheses. Using the terminology of statistical modeling (e.g., Hansen et al. 2016), we will call this situation model misspecification [5]. The development of a decision theory for agents with misspecified world-models seems particularly important for our understanding of commitment in multi-agent settings. Rational agents may sometimes want to bind themselves to certain policies in order to, for example, reduce their vulnerability to exploitation by other agents (e.g., Schelling (1960); Meacham (2010); Kokotajlo (2019a); see also Section 3 and the discussion of commitment races in Section 2). Intuitively, however, a rational agent may be hesitant to bind itself to a policy by planning with a model which it suspects is misspecified. The analysis of games of incomplete information may also be quite sensitive to model misspecification [6]. To develop a better theory of reasoning under model misspecification, one might start with the literatures on decision theory under ambiguity (Gilboa and Schmeidler, 1989; Maccheroni et al., 2006; Stoye, 2011; Etner et al., 2012) and robust control theory (Hansen and Sargent, 2008); the second sketch following this list illustrates one simple rule from the ambiguity literature.
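To make the first point above concrete, here is a minimal Python sketch, entirely our own illustration rather than anything from the cited works, of an agent that must act on the output of a counterpart's program but can only run that program for a bounded number of steps. When the budget runs out, it falls back on a prior credence over the program's possible outputs; the counterpart program, step budget, payoffs, and fallback credence are all invented for illustration.

```python
# Bounded reasoning about a counterpart program's output (a "logical uncertainty" toy).
# The agent tries to determine whether the counterpart's program outputs "C" or "D"
# by stepping through it under a computation budget; if the budget is exhausted,
# it falls back on a prior credence. Budgets, payoffs, and credences are assumptions.

def counterpart_program():
    """The counterpart's source code: decides after a (possibly long) computation."""
    x = 0
    for i in range(10_000):        # stands in for an expensive logical fact
        x = (x + i) % 97
        yield None                 # one "step" of computation
    yield "C" if x % 2 == 0 else "D"

def bounded_output(program, step_budget):
    """Run `program` for at most `step_budget` steps; return its output or None."""
    for steps, result in enumerate(program()):
        if result is not None:
            return result
        if steps >= step_budget:
            return None
    return None

def credence_counterpart_cooperates(step_budget, prior=0.5):
    out = bounded_output(counterpart_program, step_budget)
    if out is None:                # logically uncertain: fall back on the prior credence
        return prior
    return 1.0 if out == "C" else 0.0

# Expected payoffs in a one-shot prisoner's dilemma, given the bounded credence.
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
p = credence_counterpart_cooperates(step_budget=100)
for my_action in ("C", "D"):
    eu = p * payoff[(my_action, "C")] + (1 - p) * payoff[(my_action, "D")]
    print(my_action, round(eu, 2))
```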
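On the second point, one natural starting point from the ambiguity literature cited above is the maxmin expected utility rule of Gilboa and Schmeidler (1989): rather than trusting a single, possibly misspecified model, the agent entertains a set of candidate models and chooses the action whose worst-case expected utility across that set is highest. The sketch below is a minimal illustration of that rule with invented models and payoffs; it is not intended as an analysis of commitment itself.

```python
# Maxmin expected utility (in the spirit of Gilboa and Schmeidler) over a small set
# of candidate models, as one way of acting when the agent suspects its model is
# misspecified. Models, outcome probabilities, and utilities are illustrative.

candidate_models = {
    "model_1": {"good": 0.8, "bad": 0.2},
    "model_2": {"good": 0.4, "bad": 0.6},   # the agent is unsure which model is right
}
utility = {
    "commit":        {"good": 10.0, "bad": -20.0},  # binding policy: great if the model is right
    "stay_flexible": {"good": 4.0,  "bad": 0.0},    # keeps options open
}

def expected_utility(action, model):
    return sum(p * utility[action][outcome] for outcome, p in model.items())

def maxmin_choice(actions, models):
    """Pick the action with the best worst-case expected utility across models."""
    return max(actions,
               key=lambda a: min(expected_utility(a, m) for m in models.values()))

print(maxmin_choice(utility.keys(), candidate_models))  # -> "stay_flexible"
```

Here the binding policy looks best under one model but misfires badly under the other, so the maxmin rule recommends staying flexible, matching the intuition above that an agent suspicious of its own model may hesitate to commit.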

7.2 Acausal reasoning [7]

Newcomb's problem [8] (Nozick, 1969) showed that classical decision theory bifurcates into two conflicting principles of choice in cases where outcomes depend on agents' predictions of each other's behavior. Since then, considerable philosophical work has gone towards identifying additional problem cases for decision theory and towards developing new decision theories to address them. As with Newcomb's problem, many decision-theoretic puzzles involve dependences between the choices of several agents. For instance, Lewis (1979) argues that Newcomb's problem is equivalent to a prisoner's dilemma played by agents with highly correlated decision-making procedures, and Soares and Fallenstein (2015) give several examples in which artificial agents implementing certain decision theories are vulnerable to blackmail.

In discussing the decision theory implemented by an agent, we will assume that the agent maximizes some form of expected utility. Following Gibbard and Harper (1978), we write the expected utility given an action $a$ for a single-stage decision problem in context $c$ as

$$EU(a) = \sum_{j} u(o_j)\, P(a \leadsto o_j),$$

where the $o_j$ are the possible outcomes; $u$ is the agent's utility function; and $\leadsto$ stands for a given notion of dependence of outcomes on actions. The dependence concept an agent uses for $\leadsto$ in part determines its decision theory.

The philosophical literature has largely been concerned with causal decision theory (CDT) (Gibbard and Harper, 1978) and evidential decision theory (EDT) (Horgan, 1981), which are distinguished by their handling of dependence.

Causal conditional expectations account only for the causal effects of an agent's actions; in the formalism of Pearl (2009)'s do-calculus, for instance, the relevant notion of expected utility conditional on an action $a$ is $\sum_j u(o_j)\, P(o_j \mid do(a))$. EDT, on the other hand, takes into account non-causal dependencies between the agent's actions and the outcome. In particular, it takes into account the evidence that taking the action provides about the actions taken by other agents in the environment with whose decisions the decision-maker's own decision is correlated. Thus the evidential expected utility is the classical conditional expectation $\sum_j u(o_j)\, P(o_j \mid a)$.
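As a worked illustration of the difference, take Newcomb's problem (footnote 8) and assume, for concreteness, that the agent's credence in the predictor being correct is 0.99:

$$
\begin{aligned}
\text{EDT:}\quad & \textstyle\sum_j u(o_j)\,P(o_j \mid \text{one-box}) = 0.99 \times \$1{,}000{,}000 = \$990{,}000,\\
& \textstyle\sum_j u(o_j)\,P(o_j \mid \text{two-box}) = 0.01 \times \$1{,}000{,}000 + \$1{,}000 = \$11{,}000;\\
\text{CDT:}\quad & \textstyle\sum_j u(o_j)\,P(o_j \mid do(\text{one-box})) = p \times \$1{,}000{,}000,\\
& \textstyle\sum_j u(o_j)\,P(o_j \mid do(\text{two-box})) = p \times \$1{,}000{,}000 + \$1{,}000,
\end{aligned}
$$

where $p$ is the agent's credence that the money is already in the opaque box, which intervening on the action does not change. EDT therefore recommends one-boxing, while CDT recommends two-boxing for any value of $p$.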

Finally, researchers in the AI safety community have more recently developed what we will refer to as logical decision theories, which employ a third class of dependence for evaluating actions (Dai, 2009; Yudkowsky, 2009; Yudkowsky and Soares, 2017). One such theory is functional decision theory (FDT) [9], which uses what Yudkowsky and Soares (2017) refer to as subjunctive dependence. They explain this by stating that "When two physical systems are computing the same function, we will say that their behaviors 'subjunctively depend' upon that function" (p. 6). Thus, in FDT, the expected utility given an action $a$ is computed by determining what the outcome of the decision problem would be if all relevant instances of the agent's decision-making algorithm output $a$.
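As a toy illustration of subjunctive dependence (our own sketch, with standard illustrative prisoner's-dilemma payoffs), consider a "twin" prisoner's dilemma in which both players run the same decision function. An FDT-style evaluation asks what would happen if that shared function output a given action, so both instances move together:

```python
# FDT-style evaluation in a twin prisoner's dilemma: both players' behavior
# subjunctively depends on the same decision function, so we evaluate an action
# by supposing that every instance of that function outputs it.
# Payoffs are standard illustrative prisoner's-dilemma values.

payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
          ("D", "C"): (4, 0), ("D", "D"): (1, 1)}

def play_game(decision_function):
    """Both players run the same decision function (they are 'twins')."""
    a1, a2 = decision_function(), decision_function()
    return payoff[(a1, a2)]

def fdt_value(action):
    """Value to player 1 if all instances of the shared algorithm output `action`."""
    my_payoff, _ = play_game(lambda: action)
    return my_payoff

best = max(["C", "D"], key=fdt_value)
print(best)  # -> "C": if the shared algorithm outputs C, both instances cooperate
```

A CDT-style evaluation would instead hold the counterpart's action fixed while varying only player 1's action, and so would recommend "D".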

In this section, we will assume an acausal stance on decision theory, that is, one other than CDT. There are several motivations for using a decision theory other than CDT:

  • Intuitions about the appropriate decisions in thought experiments such as Newcomb's problem, as well as defenses of acausal decision theories in cases where they appear to fail (in particular, the "tickle defense" of evidential decision theory in the so-called smoking lesion case; see Ahmed (2014) for extensive discussion);

  • Conceptual difficulties with causality (Schaffer, 2016);

  • Demonstrations that agents using CDT are exploitable in various ways (Kokotajlo, 2019b; Oesterheld and Conitzer, 2019);

  • The evidentialist wager (MacAskill et al., 2019), which goes roughly as follows: In a large world (more below), we can have a far greater influence if we account for the acausal evidence our actions provide for the actions of others. So, under decision-theoretic uncertainty, we should wager in favor of decision theories which account for such acausal evidence.

We consider these sufficient motivation to study the implications of acausal decision theory for the reasoning of consequentialist agents. In particular, in this section we take up various possibilities for acausal trade between TAI systems. If we account for the evidence that one's choices provide for the choices of causally disconnected agents, this opens up both qualitatively new possibilities for interaction and quantitatively many more agents to interact with. Crucially, due to the potential scale of value that could be gained or lost via acausal interaction with vast numbers of distant agents, ensuring that TAI agents handle decision-theoretic problems correctly may be even more important than ensuring that they have the correct goals.

Agents using an acausal decision theory may coordinate in the absence of causal interaction. A concrete illustration is provided in Example 7.2.1, reproduced from Oesterheld (2017b) and itself based on an example in Hofstadter (1983).


Example 7.2.1 (Hofstadter's evidential cooperation game)

Hofstadter sends 20 participants the same letter, asking them to respond with a single letter 'C' (for cooperate) or 'D' (for defect) without communicating with each other. Hofstadter explains that by sending in 'C', a participant can increase everyone else's payoff by $2. By sending in 'D', participants can increase their own payoff by $5. The letter ends by informing the participants that they were all chosen for their high levels of rationality and correct decision-making in weird scenarios like this. Note that every participant only cares about the balance of her own bank account and not about Hofstadter's or the other 19 participants'. Should you, as a participant, respond with 'C' or 'D'?

An acausal argument in favor of 'C' is: If I play 'C', this gives me evidence that the other participants also chose 'C'. Therefore, even though I cannot cause others to play 'C' (and therefore, on a CDT analysis, should play 'D'), the conditional expectation of my payoff given that I play 'C' is higher than my conditional expectation given that I play 'D'.
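To make the argument concrete with assumed numbers: suppose that, conditional on my playing 'C', I expect each of the other 19 participants to play 'C' with probability 0.9, while conditional on my playing 'D' I expect each to play 'C' with probability only 0.5 (these conditional probabilities are illustrative assumptions, not part of Hofstadter's setup). Then

$$
\mathbb{E}[\text{payoff} \mid C] = 19 \times 0.9 \times \$2 = \$34.20,
\qquad
\mathbb{E}[\text{payoff} \mid D] = \$5 + 19 \times 0.5 \times \$2 = \$24.00,
$$

so the evidential expected payoff of 'C' is higher, even though sending in 'D' always causally adds $5 to my own payoff.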


We will call this mode of coordination evidential cooperation.

For a satisfactory theory of evidential cooperation, we will need to make precise what it means for agents to be evidentially (but not causally) dependent. There are at least three possibilities.

  1. Agents may tend to make the same decisions on some reference class of decision problems. (That is, for some probability distribution over decision contexts $C$, the probability $P\big(A_1(C) = A_2(C)\big)$ that the two agents choose the same action is high; the sketch following this list illustrates estimating such a correlation.)

  2. An agent’s tak­ing ac­tion A in con­text C may provide ev­i­dence about the num­ber of agents in the world who take ac­tions like A in con­texts like C.

  3. If agents have similar source code, their decisions provide logical evidence for their counterparts' decisions. (In turn, we would like a rigorous account of the notion of "source code similarity".)
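As referenced in the first item above, here is a minimal sketch of one way that notion might be operationalized: sample decision contexts from some reference distribution and estimate the frequency with which two agents choose the same action. The two agents and the context distribution below are invented for illustration.

```python
# Crude estimate of evidential dependence between two agents: the frequency with
# which they choose the same action across a reference class of decision contexts.
# The agents and the context distribution are invented for illustration.
import random

def agent_1(context):
    threshold, noise = context
    return "C" if threshold + noise > 0.5 else "D"

def agent_2(context):
    threshold, noise = context
    # A similar but not identical decision procedure.
    return "C" if threshold > 0.45 else "D"

def estimate_correlation(n_samples=100_000, seed=0):
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_samples):
        context = (rng.random(), rng.gauss(0, 0.05))
        matches += agent_1(context) == agent_2(context)
    return matches / n_samples

print(estimate_correlation())  # close to 1 for highly similar decision procedures
```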

It is plausible that we live in an infinite universe with infinitely many agents (Tegmark, 2003). In principle, evidential cooperation between agents in distant regions of the universe is possible; we may call this evidential cooperation in large worlds (ECL) [10]. If ECL is feasible, it may allow agents to reap large amounts of value via acausal coordination. Treutlein (2019) develops a bargaining model of ECL and lists a number of open questions facing his formalism. Leskela (2019) addresses fundamental limitations on simulations as a tool for learning about distant agents, which may be required to gain from ECL and other forms of "acausal trade". Finally, Yudkowsky (n.d.) lists potential downsides to which agents may be exposed by reasoning about distant agents. The issues discussed by these authors, and perhaps many more, will need to be addressed in order to establish ECL and acausal trade as serious possibilities. Nevertheless, the stakes strike us as great enough to warrant further study.

Acknowledgements & References


  1. Cf. discussion of the Machine Intelligence Research Institute's foundational research and its applicability to machine-learning-driven systems (Taylor, 2016; Dewey, 2017). ↩︎

  2. For other proposals for foundational research motivated by a concern with improving the long-term future, see for instance the research agendas of the Global Priorities Institute (Greaves et al., 2019) (especially Sections 2.1 and 2.2 and Appendix B) and the Machine Intelligence Research Institute (Soares and Fallenstein, 2017; Garrabrant and Demski, 2018). ↩︎

  3. This subsection was developed from an early-stage draft by Caspar Oesterheld and Johannes Treutlein. ↩︎

  4. Consider, for instance, that most of us are uncertain about the value of a distant digit of π, despite the fact that its value logically follows from what we know about mathematics. ↩︎

  5. This problem has been addressed in two ways. The first is simply to posit that the agent reasons over an extremely rich class of hypotheses, perhaps one rich enough to capture all of the important possibilities. An example of such a theory is Solomonoff induction (Solomonoff, 1964; Sterkenburg, 2013), in which evidence takes the form of a data stream received via the agent's sensors, and the hypotheses correspond to all possible "lower semi-computable" generators of such data streams. But Solomonoff induction is incomputable, and its computable approximations are still intractable. The other approach is to allow agents to have incomplete sets of hypotheses and to introduce an additional rule by which hypotheses may be added to the hypothesis space (Wenmackers and Romeijn, 2016). This sort of strategy seems to be the way forward for an adequate theory of bounded rationality in the spirit of Bayesianism. However, to our knowledge, there is no decision theory which accounts for possible amendments to the agent's hypothesis space. ↩︎

  6. See Section 4.1 for discussion of games of incomplete information and possible limitations of Bayesian games. ↩︎

  7. This subsection was developed from an early-stage draft by Daniel Kokotajlo and Johannes Treutlein. ↩︎

  8. In Newcomb's problem, a player is faced with two boxes: a clear box which contains $1,000, and an opaque box which contains either $0 or $1 million. They are given a choice between choosing both boxes (Two-Boxing) or choosing only the opaque box (One-Boxing). They are told that, before they were presented with this choice, a highly reliable predictor placed $1 million in the opaque box if it predicted that the player would One-Box, and put $0 in the opaque box if it predicted that the player would Two-Box. There are two standard lines of argument about what the player should do. The first is a causal dominance argument: because the player cannot cause money to be placed in the opaque box, they will always get at least as much money by taking both boxes as by taking one. The second is a conditional expectation argument: because the predictor is highly reliable, One-Boxing provides strong evidence that there is $1 million in the opaque box, and therefore the player should One-Box on the grounds that the conditional expected payoff given One-Boxing is higher than that given Two-Boxing. These are examples of causal and evidential decision-theoretic reasoning, respectively. ↩︎

  9. Note that what little public discussion of FDT there has been by academic philosophers has been largely critical (Schwarz, 2018; MacAskill, 2019). ↩︎

  10. Oesterheld (2017b), who introduced the idea, calls this "multiverse-wide superrationality", following Hofstadter (1983)'s use of "superrational" to describe agents who coordinate acausally. ↩︎