Anthropic decision theory for selfish agents

Consider Nick Bostrom's Incubator Gedankenexperiment, phrased as a decision problem. In my mind, this provides the purest and simplest example of a non-trivial anthropic decision problem. In an otherwise empty world, the Incubator flips a coin. If the coin comes up heads, it creates one human, while if the coin comes up tails, it creates two humans. Each created human is put into one of two indistinguishable cells, and there's no way for created humans to tell whether another human has been created or not. Each created human is offered the possibility to buy a lottery ticket which pays 1$ if the coin has shown tails. What is the maximal price that you would pay for such a lottery ticket? (Utility is proportional to dollars.) The two traditional answers are 1/2$ and 2/3$.
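As a quick orientation (my own sketch, not part of Bostrom's setup): a bettor who assigns credence p to tails breaks even exactly at the price x = p, so the two traditional answers simply correspond to the two candidate credences 1/2 and 2/3.

```python
# Minimal sketch (illustration only): expected gain from buying a ticket
# at price x when the bettor's credence in tails is p.
def expected_gain(p_tails, price):
    return p_tails * (1 - price) + (1 - p_tails) * (-price)

for p in (1/2, 2/3):
    assert abs(expected_gain(p, p)) < 1e-12  # break-even exactly at price = credence
```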

We can try to answer this question for agents with different utility functions: total utilitarians; average utilitarians; and selfish agents. UDT's answer is that total utilitarians should pay up to 2/3$, while average utilitarians should pay up to 1/2$; see Stuart Armstrong's paper and Wei Dai's comment. There are some heuristic ways to arrive at UDT prescriptions, such as asking "What would I have precommitted to?" or arguing based on reflective consistency. For example, a CDT agent that expects to face Counterfactual Mugging-like situations in the future (with predictions also made in the future) will self-modify to become a UDT agent, i.e., one that pays the counterfactual mugger.

Now, these kinds of heuristics are not applicable to the Incubator case. It is meaningless to ask "What maximal price should I have precommitted to?" or "At what odds should I bet on coin flips of this kind in the future?", since the very point of the Gedankenexperiment is that the agent's existence is contingent upon the outcome of the coin flip. Can we come up with a different heuristic that leads to the correct answer? Imagine that the Incubator's subroutine that is responsible for creating the humans is completely benevolent towards them (let's call this the "Benevolent Creator"). (We assume here that the humans' goals are identical, such that the notion of benevolence towards all humans is completely unproblematic.) The Benevolent Creator has the power to program a certain maximal price the humans pay for the lottery tickets into them. A moment's thought shows that this indeed leads to UDT's answers for average and total utilitarians. For example, consider the case of total utilitarians. If the humans pay x$ for the lottery tickets, the expected utility is 1/2*(-x) + 1/2*2*(1-x). So indeed, the break-even price is reached for x = 2/3.
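Here is a short numerical check of this calculation, for both utilitarian utility functions (a sketch only; utility is taken to be linear in dollars, as assumed above):

```python
# Sketch of the Benevolent Creator calculation for a programmed price x.
# Heads (prob. 1/2): one human pays x and gets nothing.
# Tails (prob. 1/2): two humans each pay x and each receive 1$.
def eu_total(x):
    return 0.5 * (-x) + 0.5 * 2 * (1 - x)   # total utilitarian

def eu_average(x):
    return 0.5 * (-x) + 0.5 * (1 - x)       # average utilitarian

assert abs(eu_total(2/3)) < 1e-12    # break-even at 2/3$
assert abs(eu_average(1/2)) < 1e-12  # break-even at 1/2$
```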

But what about selfish agents? For them, the Benevolent Creator heuristic is no longer applicable. Since the humans' goals do not align, the Creator cannot share them. As Wei Dai writes, the notion of selfish values does not fit well with UDT. In Anthropic decision theory, Stuart Armstrong argues that selfish agents should pay up to 1/2$ (Sec. 3.3.3). His argument is based on an alleged isomorphism between the average utilitarian and the selfish case. (For instance, donating 1$ to each human increases utility by 1 for both average utilitarian and selfish agents, while it increases utility by 2 for total utilitarians in the tails world.) Here, I want to argue that this is incorrect and that selfish agents should pay up to 2/3$ for the lottery tickets.

(Needless to say, all the bold statements I'm about to make are based on an "inside view". An "outside view" tells me that Stuart Armstrong has thought much more carefully about these issues than I have, and has discussed them with a lot of smart people, which I haven't, so chances are my arguments are flawed somehow.)

In order to make my argument, I want to introduce yet another heuristic, which I call the Submissive Gnome. Suppose each cell contains a gnome which is already present before the coin is flipped. As soon as it sees a human in its cell, it instantly adopts the human's goal. From the gnome's perspective, SIA odds are clearly correct: since a human is twice as likely to appear in the gnome's cell if the coin shows tails, Bayes' Theorem implies that the probability of tails is 2/3 from the gnome's perspective once it has seen a human. Therefore, the gnome would advise the selfish human to pay up to 2/3$ for a lottery ticket that pays 1$ in the tails world. I don't see any reason why the selfish agent shouldn't follow the gnome's advice. From the gnome's perspective, the problem is not even "anthropic" in any sense; there's just straightforward Bayesian updating.
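Spelled out, the gnome's update is just ordinary Bayesian conditioning (a sketch of that standard calculation):

```python
# The gnome's update after seeing a human in its cell (minimal sketch).
# Prior: heads and tails equally likely.
# Likelihood of the gnome's cell being occupied: 1/2 under heads (only one
# of the two cells gets the single human), 1 under tails (both cells do).
prior_tails, prior_heads = 0.5, 0.5
lik_tails, lik_heads = 1.0, 0.5

posterior_tails = (prior_tails * lik_tails) / (
    prior_tails * lik_tails + prior_heads * lik_heads)

print(posterior_tails)  # 0.666..., i.e. SIA odds of 2/3
```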

Suppose we want to use the Submissive Gnome heuristic to solve the problem for utilitarian agents. (ETA: Total/average utilitarianism includes the well-being and population of humans only, not of gnomes.) The gnome reasons as follows: "With probability 2/3, the coin has shown tails. For an average utilitarian, the expected utility after paying x$ for a ticket is 1/3*(-x) + 2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x) + 2/3*2*(1-x). Average and total utilitarians should thus pay up to 2/3$ and 4/5$, respectively." The gnome's advice disagrees with UDT and the solution based on the Benevolent Creator. Something has gone terribly wrong here, but what? The mistake in the gnome's reasoning here is in fact perfectly isomorphic to the mistake in the reasoning leading to the "yea" answer in Psy-Kosh's non-anthropic problem.
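Just to confirm the quoted break-even prices, here is the gnome's mistaken post-update calculation reproduced numerically; the numbers are right, the reasoning behind them is what goes wrong:

```python
# The gnome's flawed post-update calculation (reproduced as a sketch only
# to verify the 2/3$ and 4/5$ break-even figures quoted above).
def eu_average_gnome(x):
    return (1/3) * (-x) + (2/3) * (1 - x)

def eu_total_gnome(x):
    return (1/3) * (-x) + (2/3) * 2 * (1 - x)

assert abs(eu_average_gnome(2/3)) < 1e-12  # break-even at 2/3$
assert abs(eu_total_gnome(4/5)) < 1e-12    # break-even at 4/5$
```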

Things become clear if we look at the problem from the gnome's perspective before the coin is flipped. Assume, for simplicity, that there are only two cells and two gnomes, 1 and 2. If the coin shows heads, the single human is placed in cell 1 and cell 2 is left empty. Since the humans don't know which cell they are in, neither should the gnomes know. So from each gnome's perspective, there are four equiprobable "worlds": it can be in cell 1 or 2, and the coin flip can result in heads or tails. We assume, of course, that the two gnomes are, like the humans, sufficiently similar that their decisions are "linked".

We can assume that the gnomes already know what utility functions the humans are going to have. If the humans will be (total/average) utilitarians, we can then even assume that the gnomes already are so, too, since the well-being of each human is as important as that of any other. Crucially, then, for both utilitarian utility functions, the question whether the gnome is in cell 1 or 2 is irrelevant. There is just one "gnome advice" that is given identically to all (one or two) humans. Whether this advice is given by one gnome or the other or by both of them is irrelevant from both gnomes' perspective. The alignment of the humans' goals leads to alignment of the gnomes' goals. The expected utility of some advice can simply be calculated by taking probability 1/2 for both heads and tails, and introducing a factor of 2 in the total utilitarian case, leading to the answers 1/2$ and 2/3$, in accordance with UDT and the Benevolent Creator.
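In code, the pre-flip calculation from a gnome's perspective looks roughly as follows (a sketch enumerating the four equiprobable "worlds"; for utilitarian values the cell dimension simply drops out):

```python
# Pre-flip expected utility of the advice "pay up to x", summed over the
# four equiprobable worlds (cell 1 or 2, heads or tails).  For utilitarian
# values the utility of a world depends only on the coin, not on the cell.
def world_utility(coin, x, total):
    if coin == "heads":
        return -x                           # the single human pays x, no payout
    payoff = 1 - x                          # tails: each human gains 1 - x
    return 2 * payoff if total else payoff  # total utilitarians count both humans

def eu(x, total):
    worlds = [(cell, coin) for cell in (1, 2) for coin in ("heads", "tails")]
    return sum(0.25 * world_utility(coin, x, total) for _cell, coin in worlds)

assert abs(eu(1/2, total=False)) < 1e-12  # average utilitarian: 1/2$
assert abs(eu(2/3, total=True)) < 1e-12   # total utilitarian: 2/3$
```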

The situation looks different if the humans are selfish. We can no longer assume that the gnomes already have a utility function. The gnome cannot yet care about the human that may appear in its cell, since with probability 1/4 (if the gnome is in cell 2 and the coin shows heads) there will not be a human to care for. (By contrast, it is already possible to care about the average utility of all humans there will be, which is where the alleged isomorphism between the two cases breaks down.) It is still true that there is just one "gnome advice" that is given identically to all (one or two) humans, but the method for calculating the optimal advice now differs. In three of the four equiprobable "worlds" the gnome can live in, a human will appear in its cell after the coin flip. Two out of these three are tails worlds, so the gnome decides to advise paying up to 2/3$ for the lottery ticket if a human appears in its cell.
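The same enumeration as above, restricted to the selfish case (again only a sketch):

```python
# Selfish case, from a gnome's pre-flip perspective (sketch).  The advice
# only matters in the worlds where a human appears in the gnome's own cell,
# which rules out (cell 2, heads) and leaves three equiprobable worlds,
# two of which are tails worlds.
worlds = [(cell, coin) for cell in (1, 2) for coin in ("heads", "tails")]
occupied = [(cell, coin) for cell, coin in worlds
            if not (cell == 2 and coin == "heads")]

p_tails_given_human = sum(coin == "tails" for _cell, coin in occupied) / len(occupied)
print(p_tails_given_human)  # 0.666... -> advise paying up to 2/3$
```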

There is a way to restore the equivalence between the average utilitarian and the selfish case. If the humans will be selfish, we can say that the gnome cares about the average well-being of the three humans which will appear in its cell with equal likelihood: the human created after heads, the first human created after tails, and the second human created after tails. The gnome expects to adopt each of these three humans' selfish utility function with probability 1/4. It thus makes sense to say that the gnome cares about the average well-being of these three humans. This is the correct correspondence between selfish and average utilitarian values, and it leads, again, to the conclusion that the correct advice is to pay up to 2/3$ for the lottery ticket.
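Numerically, this corrected correspondence gives the same break-even price (a sketch):

```python
# The corrected correspondence (sketch): with probability 1/4 each, the
# gnome adopts the heads-human, the first tails-human, or the second
# tails-human (and with probability 1/4 no human appears at all).
def eu_corrected(x):
    u_heads_human   = -x      # pays x, coin shows heads, no payout
    u_tails_human_1 = 1 - x   # pays x, receives 1$
    u_tails_human_2 = 1 - x
    return 0.25 * (u_heads_human + u_tails_human_1 + u_tails_human_2)

assert abs(eu_corrected(2/3)) < 1e-12  # break-even again at 2/3$
```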

In Anthropic Bias, Nick Bostrom argues that each human should assign probability 1/2 to the coin having shown tails ("SSA odds"). He also introduces the possible answer 2/3 ("SSA+SIA", nowadays usually simply called "SIA") and refutes it. SIA odds have been defended by Olum. The main argument against SIA is the Presumptuous Philosopher. The main arguments for SIA and against SSA odds are that SIA avoids the Doomsday Argument¹, which most people feel has to be wrong, that SSA odds depend on whom you consider to be part of your "reference class", and furthermore, as pointed out by Bostrom himself, that SSA odds allow for acausal superpowers.

The consensus view on LW seems to be that much of the SSA vs. SIA debate is confused and due to discussing probabilities detached from decision problems of agents with specific utility functions. (ETA: At least this was the impression I got. Two commenters have expressed scepticism about whether this really is the consensus view.) I think that "What are the odds at which a selfish agent should bet on tails?" is the most sensible translation of "What is the probability that the coin has shown tails?" into a decision problem. Since I've argued that selfish agents should take bets following SIA odds, one can employ the Presumptuous Philosopher argument against my conclusion: it seems to imply that selfish agents, like total but unlike average utilitarians, should bet at extreme odds on living in an extremely large universe, even if there's no empirical evidence in favor of this. I don't think this counterargument is very strong. However, since this post is already quite lengthy, I'll elaborate more on this if I get encouraging feedback for this post.

¹ At least its standard version. SIA comes with its own Doomsday conclusions, cf. Katja Grace's thesis Anthropic Reasoning in the Great Filter.