Outlawing Anthropics: An Updateless Dilemma

Let us start with a (non-quantum) logical coinflip—say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka “heads”), we’ll create 18 of you in green rooms and 2 of you in red rooms, and if the result is “tails” (0), we’ll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe—what is your posterior probability—that the logical coin came up “heads”?

There are exactly two tenable answers that I can see, “50%” and “90%”.
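(A minimal sketch of where the two numbers come from, assuming that “90%” is the standard anthropic update that treats you as a random sample of the twenty copies, and that “50%” simply keeps the prior on the logical coin:)

```python
# Sketch of the anthropic update behind the "90%" answer.
# Assumption: you treat yourself as a random sample of the 20 copies.
p_heads = 0.5                  # prior on the logical coinflip
p_green_if_heads = 18 / 20     # fraction of copies in green rooms if heads
p_green_if_tails = 2 / 20      # fraction of copies in green rooms if tails

p_green = p_heads * p_green_if_heads + (1 - p_heads) * p_green_if_tails
p_heads_given_green = p_heads * p_green_if_heads / p_green
print(p_heads_given_green)     # 0.9; refuse the update and you stay at 0.5
```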

Suppose you reply 90%.

And suppose you also happen to be “altruistic” enough to care about what happens to all the copies of yourself. (If your current system cares about yourself and your future, but doesn’t care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.)

Then I attempt to force a reflective inconsistency in your decision system, as follows:

I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room. If they all reply “Yes”, I will do so.

(It will be understood, of course, that $1 represents 1 utilon, with actual monetary amounts rescaled as necessary to make this happen. Very little rescaling should be necessary.)

(Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves. Classical causal decision agents, to reply “Yes” as a group, will need to somehow work out that other copies of themselves reply “Yes”, and then reply “Yes” themselves. We can try to help out the causal decision agents on their coordination problem by supplying rules such as “If conflicting answers are delivered, everyone loses $50”. If causal decision agents can win on the problem “If everyone says ‘Yes’ you all get $10, if everyone says ‘No’ you all lose $5, if there are conflicting answers you all lose $50” then they can presumably handle this. If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.)

Suppose that you wake up in a green room. You reason, “With 90% probability, there are 18 of me in green rooms and 2 of me in red rooms; with 10% probability, there are 2 of me in green rooms and 18 of me in red rooms. Since I’m altruistic enough to at least care about my xerox-siblings, I calculate the expected utility of replying ‘Yes’ as (90% * ((18 * +$1) + (2 * -$3))) + (10% * ((18 * -$3) + (2 * +$1))) = +$5.60.” You reply yes.

However, before the experiment, you calculate the general utility of the conditional strategy “Reply ‘Yes’ to the question if you wake up in a green room” as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20. You want your future selves to reply ‘No’ under these conditions.
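(The two calculations side by side, as a quick check; the payoffs and probabilities are the ones in the text, and only the variable names are mine:)

```python
# The same bet, evaluated from inside a green room vs. before the experiment.
def bet_payout(n_green, n_red):
    """Net utilons if the bet is taken: +$1 per green-room copy, -$3 per red-room copy."""
    return n_green * 1 + n_red * (-3)

payout_heads = bet_payout(18, 2)   # +12
payout_tails = bet_payout(2, 18)   # -52

# Inside a green room, with the anthropic-updated 90/10 posterior:
print(0.9 * payout_heads + 0.1 * payout_tails)   # +5.6 (up to float rounding) -> reply "Yes"

# Before the experiment, with the 50/50 prior on the logical coin:
print(0.5 * payout_heads + 0.5 * payout_tails)   # -20.0 -> want future selves to say "No"
```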

This is a dynamic inconsistency—different answers at different times—which argues that decision systems which update on anthropic evidence will self-modify not to update probabilities on anthropic evidence.

I originally thought, on first formulating this problem, that it had to do with double-counting the utilons gained by your variable numbers of green friends, and the probability of being one of your green friends.

However, the problem also works if we care about paperclips. No selfishness, no altruism, just paperclips.

Let the dilemma be, “I will ask all people who wake up in green rooms if they are willing to take the bet ‘Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails’. (Should they disagree on their answers, I will destroy 5 paperclips.)” Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet. But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% * +1 paperclip) + (10% * -3 paperclips)) = +0.6 paperclips.
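(The same contrast in paperclip terms; the pre-experiment figure isn’t stated above, but it follows from the same 50/50 calculation:)

```python
# Paperclip version of the same inconsistency.
clips_if_heads = +1   # create 1 paperclip
clips_if_tails = -3   # destroy 3 paperclips

# A green-room paperclipper that updates on anthropic evidence (90/10):
print(0.9 * clips_if_heads + 0.1 * clips_if_tails)   # ~+0.6 -> take the bet

# The same paperclipper before the experiment (50/50 on the logical coin):
print(0.5 * clips_if_heads + 0.5 * clips_if_tails)   # -1.0 -> refuse the bet
```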

This argues that, in general, decision systems—whether they start out selfish, or start out caring about paperclips—will not want their future versions to update on anthropic “evidence”.

Well, that’s not too disturbing, is it? I mean, the whole anthropic thing seemed very confused to begin with—full of notions about “consciousness” and “reality” and “identity” and “reference classes” and other poorly defined terms. Just throw out anthropic reasoning, and you won’t have to bother.

When I explained this problem to Marcello, he said, “Well, we don’t want to build conscious AIs, so of course we don’t want them to use anthropic reasoning”, which is a fascinating sort of reply. And I responded, “But when you have a problem this confusing, and you find yourself wanting to build an AI that just doesn’t use anthropic reasoning to begin with, maybe that implies that the correct resolution involves us not using anthropic reasoning either.”

So we can just throw out anthropic reasoning, and relax, and conclude that we are Boltzmann brains. QED.

In general, I find the sort of argument given here—that a certain type of decision system is not reflectively consistent—to be pretty damned compelling. But I also find the Boltzmann conclusion to be, ahem, more than ordinarily unpalatable.

In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update—i.e., the paperclip maximizer would have to reason, “If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips.” I confess that my initial reaction to this suggestion was “Ewwww”, but I’m not exactly comfortable concluding I’m a Boltzmann brain, either.
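(Working that suggestion through numerically, under my assumption that each green-room copy divides the paperclip change by the number of green-room copies in that branch and then applies the 90/10 posterior as before:)

```python
# Bostrom's division-of-responsibility suggestion, worked out per green-room copy.
p_heads_posterior = 0.9

# Heads: 18 green-room copies share credit for +1 paperclip -> 1/18 each.
# Tails:  2 green-room copies share blame for -3 paperclips -> 1/2 each.
eu_per_copy = (p_heads_posterior * (1 / 18) * (+1)
               + (1 - p_heads_posterior) * (1 / 2) * (-3))
print(eu_per_copy)   # ~-0.1 -> refuse the bet, matching the pre-experiment answer
```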

EDIT: On further reflection, I also wouldn’t want to build an AI that concluded it was a Boltzmann brain! Is there a form of inference which rejects this conclusion without relying on any reasoning about subjectivity?

EDIT2: Psy-Kosh has converted this into a non-anthropic problem!