# A Problem About Bargaining and Logical Uncertainty

Suppose you wake up as a paperclip maximizer. Omega says “I calculated the millionth digit of pi, and it’s odd. If it had been even, I would have made the universe capable of producing either 10^20 paperclips or 10^10 staples, and given control of it to a staples maximizer. But since it was odd, I made the universe capable of producing 10^10 paperclips or 10^20 staples, and gave you control.” You double-check Omega’s pi computation and your internal calculator gives the same answer.

Then a staples maximizer comes to you and says, “You should give me control of the universe, because before you knew the millionth digit of pi, you would have wanted to pre-commit to a deal where each of us would give the other control of the universe, since that gives you a 1/2 probability of 10^20 paperclips instead of a 1/2 probability of 10^10 paperclips.”
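The staples maximizer’s arithmetic can be checked with a minimal sketch (the 1/2 “logical” probability and the payoff figures are the ones from the post; the variable names are my own):

```python
# Expected paperclips under a 1/2 "logical" prior on the parity of the
# millionth digit of pi, using the payoffs stated in the post.

P_ODD = 0.5  # pre-computation "logical" probability that the digit is odd

# Without the deal: paperclips only in the odd branch, where you control a
# universe capable of 10^10 paperclips.
ev_no_deal = P_ODD * 10**10

# With the precommitted swap: paperclips only in the even branch, where the
# staples maximizer hands you a universe capable of 10^20 paperclips.
ev_deal = (1 - P_ODD) * 10**20

assert ev_deal > ev_no_deal  # before knowing the digit, the deal looks better
```

Whether this pre-computation expected value should still bind you once the digit is known is, of course, exactly what the post goes on to ask.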

Is the staples maximizer right? If so, the general principle seems to be that we should act as if we had precommitted to a deal we would have made in ignorance of logical facts we actually possess. But how far are we supposed to push this? What deal would you have made if you didn’t know that the first digit of pi was odd, or if you didn’t know that 1+1=2?

On the other hand, suppose the staples maximizer is wrong. Does that mean you also shouldn’t agree to exchange control of the universe before you knew the millionth digit of pi?

To make this more relevant to real life, consider two humans negotiating over the goal system of an AI they’re jointly building. They have a lot of ignorance about the relevant logical facts, like how smart/powerful the AI will turn out to be and how efficient it will be in implementing each of their goals. They could negotiate a solution now in the form of a weighted average of their utility functions, but the weights they choose now will likely turn out to be “wrong” in full view of the relevant logical facts (e.g., the actual shape of the utility-possibility frontier). Or they could program their utility functions into the AI separately, and let the AI determine the weights later using some formal bargaining solution when it has more knowledge about the relevant logical facts. Which is the right thing to do? Or should they follow the staples maximizer’s reasoning and bargain under the pretense that they know even less than they actually do?

• To make this more relevant to real life, consider two humans negotiating over the goal system of an AI they’re jointly building.

“To give a practical down-to-earth example, …”

• Perhaps a more down-to-earth example would be value conflict within an individual. Without this problem with logical uncertainty, your conflicting selves should just merge into one agent with a weighted average of their utility functions. This problem suggests that maybe you should keep those conflicting selves around until you know more logical facts.

• Right. But this is also the default safety option: you don’t throw away information if you don’t have a precise understanding of its irrelevance (given that it’s not that costly to keep), and we didn’t have such understanding.

• Is Omega even necessary to this problem?

I would consider transferring control to staply if and only if I were sure that staply would make the same decision were our positions reversed (in this way it’s reminiscent of the prisoner’s dilemma). If I were so convinced, then shouldn’t I consider staply’s argument even in a situation without Omega?

If staply is in fact using the same decision algorithms I am, then he shouldn’t even have to voice the offer. I should arrive at the conclusion that he should control the universe as soon as I find out that it can produce more staples than paperclips, whether it’s a revelation from Omega or the result of cosmological research.

My intuition rebels at this conclusion, but I think it’s being misled by heuristics. A human could not convince me of this proposal, but that’s because I can’t know we share decision algorithms (i.e., that s/he would definitely do the same in my place).

This looks to me like a prisoner’s dilemma problem where expected utility depends on a logical uncertainty. I think I would cooperate with prisoners who have different utility functions as long as they share my decision theory.

(Disclaimers: I have read most of the relevant LW posts on these topics, but have never jumped into discussion on them and claim no expertise. I would appreciate corrections if I misunderstand anything.)

• Perhaps I am missing something, but if my utility function is based on paperclips, how do I ever arrive at the conclusion that Staply should be in charge? I get no utility from it, unless my utility function places an even higher value on allowing entities with utility functions that create a larger output than mine to take precedence over my own utility on paperclips.

• (I’ll review some motivations for decision theories in the context of Counterfactual Mugging, leading to the answer.)

Precommitment in the past, where it’s allowed, was a CDT-style solution to problems like this. You’d try making the most general possible precommitment as far in the past as possible, one that would respond to any possible future observations. This had two severe problems: it’s not always possible to be far enough in the past to make precommitments that would coordinate all relevant future events, and you have to plan every possible detail of future events in advance.

TDT partially resolves such problems by implementing coordinated decisions among the instances of the agent within the agent’s current worlds (permitted by observations so far) that share the same epistemic state (or its aspects relevant to the decision); each decides for all of them together, so they arrive at the same decision. (It makes sense for the decision to be a strategy that can then take into account additional information differentiating the instances of the agent.) This is enough for Newcomb’s problem and (some versions of) Prisoner’s Dilemma, but where coordination of agents in mutually exclusive counterfactuals is concerned, some of the tools break down.

Counterfactual Mugging both concerns agents located in mutually exclusive counterfactuals and explicitly forbids the agent to be present in the past to make a precommitment, so TDT fails to apply. In this case, UDT (not relying on causal graphs) can define a common decision problem shared by the agents from different counterfactuals, if these agents can first be reduced to a shared epistemic state, so that all of them arrive at the same decision (which takes the form of a strategy); the strategy is then given each agent’s particular additional knowledge that differentiates it from the other agents within the group that makes the coordinated decision.

In the most general case, where we attempt to coordinate among all UDT agents, these agents arrive, without using any knowledge other than what can be generated by pure inference (assumed common among these agents), at a single global strategy that specifies the moves of all agents (depending on each agent’s particular knowledge and observations). However, when applied to a simple situation like Counterfactual Mugging, an agent only needs to purge itself of one bit of knowledge (identifying an agent) and select a simple coordinated strategy (for both agents) that takes that bit back as input to produce a concrete action.

So this takes us the whole circle, from deciding in the moment, to deciding (on a precommitment) in advance, and to deciding (on a coordinated strategy) in the present (of each instance). However, the condition for producing a coordinated strategy in the present is different from that for producing a precommitment in the past: all we need is a shared state of knowledge among the to-be-coordinated agents, and not the state of knowledge they could’ve shared in the past, had they attempted a precommitment.

So for this problem, in coordinating with the other player (which let’s assume abstractly exists, even if with measure 0), you can use your knowledge of the millionth digit of pi, since both players share it. And using this shared knowledge, the strategy you both arrive at would favor the world that’s permitted by that value, in this case the paperclip world; the other world doesn’t matter, contrary to what would be the case with a coin toss instead of the accessible abstract fact. And since the other player has nothing of value to offer, you take the whole pie.

• Suppose you’re currently running a decision theory that would “take the whole pie” in this situation. Now what if Omega first informed you of the setup without telling you what the millionth digit of pi is, and gave you a chance to self-modify? And suppose you don’t have enough computing power to compute the digit yourself at this point. Doesn’t it seem right to self-modify into someone who would give control of the universe to the staples maximizer, since that gives you a 1/2 “logical” probability of 10^20 paperclips instead of a 1/2 “logical” probability of 10^10 paperclips? What is wrong with this reasoning? And if it is wrong, both UDT1 and UDT2 are wrong, since UDT1 would self-modify and UDT2 would give control to the staples maximizer without having to self-modify; so what’s the right decision theory?

• And suppose you don’t have enough computing power to compute the digit yourself at this point. Doesn’t it seem right to self-modify into someone who would give control of the universe to the staples maximizer, since that gives you a 1/2 “logical” probability of 10^20 paperclips instead of a 1/2 “logical” probability of 10^10 paperclips?

Do you mean that I won’t have enough computing power also later, after the staple maximizer’s proposal is stated, or that there isn’t enough computing power just during the thought experiment? (In the latter case, I make the decision to think long enough to compute the digit of pi before making a decision.)

What does it mean to self-modify if no action is being performed, that is, any decision regarding that action could be computed later without any preceding precommitments?

(One way in which a “self-modification” might be useful is when you won’t have enough computational power in the future to waste what computational power you have currently, and so you must make decisions continuously that take away some options from the future (perhaps by changing instrumental priority rather than permanently arresting the opportunity to reconsider) and thereby simplify the future decision-making at the cost of making it less optimal. Another is where you have to signal precommitment to other players who wouldn’t be able to follow your more complicated future reasoning.)

• Do you mean that I won’t have enough computing power also later, after the staple maximizer’s proposal is stated, or that there isn’t enough computing power just during the thought experiment?

You will have enough computing power later.

What does it mean to self-modify if no action is being performed, that is, any decision regarding that action could be computed later without any preceding precommitments?

I mean suppose Omega gives you the option (now, when you don’t have enough computing power to compute the millionth digit of pi) of replacing yourself with another AI that has a different decision theory, one that would later give control of the universe to the staples maximizer. Should you take this option? If not, what decision theory would refuse it? (Again, from your current perspective, taking the option gives you a 1/2 “logical” probability of 10^20 paperclips instead of a 1/2 “logical” probability of 10^10 paperclips. How do you justify refusing this?)

• (continuing from here)

I’ve changed my mind back. The 10^20 is only on the table for the loser, and can be given by the winner. When winner/loser status is unknown, a winner might cooperate, since it allows the possibility of being a loser and receiving the prize. But if the winner knows its own status, it can’t receive that prize, and the loser has no leverage. So there is nothing problematic about the 10^20 becoming inaccessible: it is only potentially accessible to the loser when the winner is weak (doesn’t know its own status), while an informed winner won’t give it away, so that doesn’t happen. Resolving logical uncertainty makes the winner stronger, makes the loser weaker, and so the prize for the loser becomes smaller.

• Edit: Nope, I changed my mind back.

You’ve succeeded in convincing me that I’m confused about this problem, and don’t know how to make decisions in problems like this.

There are two types of players in this game: those that win the logical lottery and those that lose (here, the paperclip maximizer is a winner, and the staple maximizer is a loser). A winner can either cooperate or defect against its loser opponent, with cooperation giving the winner 0 and the loser 10^20, and defection giving the winner 10^10 and the loser 0.

If a player doesn’t know whether it’s a loser or a winner, coordinating cooperation with its opponent has higher expected utility than coordinating defection, with mixed strategies presenting options for bargaining (the best coordinated strategy for a given player is to defect, with the opponent cooperating). Thus, we have a full-fledged Prisoner’s Dilemma.

On the other hand, obtaining information about your identity (loser or winner) transforms the problem into one where you seemingly have only the choice between 0 and 10^10 (if you’re a winner), or always 0 with no ability to bargain for more (if you’re a loser). Thus, it looks like knowledge of a fact turns a problem into one of lower expected utility, irrespective of what the fact turns out to be, and takes away the incentives that would’ve made a higher win (10^20) possible. This doesn’t sound right; there should be a way of making the 10^20 accessible.
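The payoff structure described in this comment can be written out as a small sketch (payoffs are from the comment; the 1/2 identity probability is an illustrative assumption):

```python
# Winner's options against its loser opponent, as stated above:
# cooperate -> (winner 0, loser 10^20); defect -> (winner 10^10, loser 0).
payoff = {
    "cooperate": {"winner": 0, "loser": 10**20},
    "defect": {"winner": 10**10, "loser": 0},
}

def ev_coordinated(action, p_winner=0.5):
    """Expected utility of both players coordinating on `action` while still
    ignorant of which role (winner or loser) they occupy."""
    return (p_winner * payoff[action]["winner"]
            + (1 - p_winner) * payoff[action]["loser"])

# Under ignorance, coordinated cooperation beats coordinated defection...
assert ev_coordinated("cooperate") > ev_coordinated("defect")
# ...but an informed winner prefers defection (10^10 > 0), hence the dilemma.
```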

• It’s like an instance of the problem involves not two, but four agents that should coordinate: a possible winner/loser pair, and a corresponding impossible pair. The impossible pair has the bizarre property that they know themselves to be impossible, like self-defeating theories PA+NOT(Con(PA)) (except that we’re talking about agent-provability and not provability), which doesn’t make them unable to reason. These four agents could form a coordinated decision, where the coordinated decision problem is obtained by throwing away the knowledge that’s not common between these four agents, in particular the digit of pi and winner/loser identity. After the decision is made, they plug back their particular information.

• You’ve convinced me that I’m confused. I don’t know what is the correct decision in this situation anymore, or how to think about such decisions.

If you cooperate in such situations, this makes the value of the outcome of such thought experiments higher, and that applies to all individual instances of the thought experiments as well. The problem has an ASP-ish feel to it: you’re punished for taking too much information into account, even though from the point of view of having taken that information into account, your resulting decision seems correct.

• I don’t know what is the correct decision in this situation anymore, or how to think about such decisions.

Good, I’m in a similar state. :)

The problem has an ASP-ish feel to it: you’re punished for taking too much information into account, even though from the point of view of having taken that information into account, your resulting decision seems correct.

Yes, I noticed the similarity as well, except in the ASP case it seems clearer what the right thing to do is.

• (Grandparent was my comment, deleted while I was trying to come up with a clearer statement of my confusion, before I saw the reply. The new version is here.)

• So you would also keep the money in Counterfactual Mugging with a logical coin? I don’t see how that can be right. About half of logical coins fall heads, so given a reasonable prior over Omegas, it makes more sense for the agent to always pay up, both in Counterfactual Mugging and in Wei’s problem. But of course using a prior over Omegas is cheating...

• Then you’d be coordinating with players of other CM setups, not just with your own counterfactual opponent; you’d be breaking out of your thought experiment, and that’s against the rules! (Whatever a “logical coin” is, the primary condition is for it to be shared among and accessible to all coordinating agents. If that’s so, as here, then I keep the money, assuming the thought experiment doesn’t leak control.)

• assuming the thought experiment doesn’t leak control

:/ The whole point of thought experiments is that they leak control. ;P

“I seem to have found myself in a trolley problem! This is fantastically unlikely. I’m probably in some weird moral philosophy thought experiment and my actions are likely mostly going to be used as propaganda supporting the ‘obvious’ conclusions of one side or the other… oh, and if I try to find a clever third option I’ll probably make myself counterfactual in most contexts. Does the fact that I’m thinking these thoughts affect what contexts I’m in? /brainasplodes”

• This is exactly what my downscale copy thinks the first 3–5 times I try to run any thought experiment. Often it’s followed by “**, I’m going to die!”

I don’t run thought experiments containing myself at any level of detail if I can avoid it any more.

• I’m still not sure. You can look at it as cooperating with players of other CM setups, or as trying to solve the meta-question “what decision theory would be good at solving problems like this one?” Saying “50% of logical coins fall heads” seems to capture the intent of the problem class quite well, no?

• The decision algorithm that takes the whole pie is good at solving problems like this one: for each specific pie, it gets it whole. Making the same action is not good for solving the different problem of dividing all possible pies simultaneously, but then the difference is reflected in the problem statement, and so the reasons that make it decide correctly for individual problems won’t make it decide incorrectly for the joint problem.

I think it’s right to cooperate in this thought experiment only to the extent that we accept the impossibility of isolating this thought experiment from its other possible instances, but then it should just motivate restating the thought experiment so as to make its expected actual scope explicit.

• I think it’s right to cooperate in this thought experiment only to the extent that we accept the impossibility of isolating this thought experiment from its other possible instances, but then it should just motivate restating the thought experiment so as to make its expected actual scope explicit.

Agreed.

• Here’s an argument I made in a chat with Wei. (The problem is equivalent to Counterfactual Mugging with a logical coin, so I talk about that instead.)

1) A good decision theory should always do what it would have precommitted to doing.

2) Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.

3) Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin (e.g., the parity of the millionth digit of pi). The resulting utility goes to the AI. The AI writing P doesn’t have enough resources to compute the coin’s outcome, but P is allowed to use as many resources as needed.

4) Writing P is equivalent to supplying only one bit: should P pay up if asked?

5) Supplying that bit is equivalent to accepting or declining the bet “win $10000 if the millionth digit of pi is even, lose $100 if it’s odd”.

6) So if your AI can make bets about the digits of pi (which means it represents logical uncertainty as probabilities), it should also pay up in Counterfactual Mugging with a logical coin, even if it already has enough resources to compute the coin’s outcome. The AI’s initial state of logical uncertainty should be “frozen” into its utility function, just like all other kinds of uncertainty (the U in UDT means “updateless”).

Maybe this argument only shows that representing logical uncertainty as probabilities is weird. Everyone is welcome to try and figure out a better way :-)
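For concreteness, here is the bet from step 5 evaluated under a 1/2 “logical” probability (the dollar figures are from the argument; the uniform prior over the digit’s parity is exactly the assumption being debated):

```python
# Step 5's bet: win $10000 if the millionth digit of pi is even,
# lose $100 if it's odd. Treat the logical coin as a 1/2 probability.
p_even = 0.5
ev_accept = p_even * 10_000 + (1 - p_even) * (-100)

assert ev_accept > 0  # an AI that prices digits of pi this way accepts the bet
```

So an agent that represents logical uncertainty as probabilities accepts the bet, which by the equivalence in step 4 means P pays up.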

• 1) A good decision theory should always do what it would have precommitted to doing.

It’s dangerous to phrase it this way, since coordination (which is what really happens) allows using more knowledge than was available at the time of a possible precommitment, as I described here.

4) Writing P is equivalent to supplying only one bit: should P pay up if asked?

Not if the correct decision depends on an abstract fact that you can’t access, but can reference. In that case, P should implement a strategy of acting depending on the value of that fact (computing and observing that value to feed to the strategy). That is, abstract facts that will only be accessible in the future play the same role as observations that will only be accessible in the future, and a strategy can be written conditionally on either.

The difference between abstract facts and observations, however, is that observations may tell you where you are without telling you what exists and what doesn’t (both counterfactuals exist and have equal value; you’re in one of them), while abstract facts can tell you what exists and what doesn’t (the other logical counterfactual doesn’t exist and has zero value).

• 4) Writing P is equivalent to supplying only one bit: should P pay up if asked?

Not if the correct decision depends on an abstract fact that you can’t access, but can reference.

In general, the distinction is important. But, for this puzzle, the proposition “asked” is equivalent to the relevant “abstract fact”. The agent is asked iff the millionth digit of pi is odd. So point (4) already provides as much of a conditional strategy as is possible.

• It’s assumed that the agent doesn’t know if the digit is odd (and whether it’ll be in the situation described in the post) at this point. The proposal to self-modify is a separate event that precedes the thought experiment.

• It’s assumed that the agent doesn’t know if the digit is odd (and whether it’ll be in the situation described in the post) at this point.

Yes. Similarly, it doesn’t know whether it will be asked (rather than do the asking) at this point.

• I see, so there’s indeed just one bit, and it should be “don’t cooperate”.

This is interesting in that UDT likes to ignore the epistemic significance of observations, but here we have an observation that implies something about the world, and doesn’t just tell the agent where it is. How does one reason about strategies if different branches of those strategies tell something about the value of the other branches?..

• Not if the correct decision depends on an abstract fact that you can’t access, but can reference.

Good point, thanks. I think it kills my argument.

ETA: no, it doesn’t.

• As Tyrrell points out, it’s not that simple. When you’re considering the strategy of what to do if you’re on the giving side of the counterfactual (“Should P pay up if asked?”), the fact that you’re in that situation already implies all you wanted to know about the digit of pi, so the strategy is not to play conditionally on the digit of pi, but just to either pay up or not, one bit as you said. But the value of the decision on that branch of the strategy follows from the logical implications of being on that branch, which is something new for UDT!

• That is a really, really weird dilemma to be in.

By the way, you can abbreviate paperclip/staple maximizer as clippy/staply (uncapitalized).

• By the way, you can abbreviate paperclip/staple maximizer as clippy/staply (uncapitalized).

That seems to be a violation of standard English conventions. If I see people use ‘clippy’ or ‘staply’ uncapitalized, I treat it the same as any other error in capitalization.

• Do you capitalize ‘human’?

• Do you capitalize ‘human’?

No. I do capitalize names. ‘Clippy’ and ‘staply’ would both be unnatural terms for a species, were the two to be given slang species names.

If people use ‘Clippy’ or ‘Staply’ they are making a reference to a personified instance of the respective classes of maximiser agent.

• I took the uncapitalized “staply” to be the name of a class, one individual in which might be named “Staply”.

• Exactly, good inference. You’re a good human.

Kill Staply though.

• I never suspected that you were Wei Dai, but five minutes is an awfully fast response time!

• I’m not User:Wei_Dai. Although if I were, you would probably expect that I would say that.

• In the old counterfactual mugging problem, agents who precommit are trading utilities across possible worlds, each world having a utility-gain called a prior that expresses how much the agent wants its utilities to lie in those worlds instead of silly ones. From that perspective, it’s true that nothing in reality will be different as a result of the agent’s decision, just because of determinism, but the agent is still deciding what reality (across all possible worlds) will look like, just like in Newcomb’s problem.

So when I read in Nesov’s post that “Direct prediction of your actions can’t include the part where you observe that the digit is even, because the digit is odd”, what I’m really seeing is someone saying, “I give zero weight to possible worlds in which math doesn’t work sensibly, and tiny weights to worlds in which math does work, but my confusion or the conspiring of a malicious / improbable / senseless / invaluable universe causes me to think it does not.”

One of the reasons why I think possible worlds of the first kind (different causal / programmatic histories but the same underlying ontology-stuff) are valuable / real is that we sort of know how to calculate their properties using causal networks or timeless networks or whatever kind of networks you get when you combine the not-quite-specified mathematical machinery in TDT with UDT. Our ability to calculate their properties reifies them, opens them up to interacting with this world even more directly via simulation.

The next step seems to be to ask, “for agents that do care about those impossible possible worlds, how would they act?” If Omega is choosing in a way that can be computed in our world, using our math (and somehow that other universe and our calculations don’t explode when it gets to the contradiction (or it does! I suppose you can care about worlds where math explodes, even if I can’t visualize them)), then we can simulate his reasoning in all respects save the identity of the logical fact in question, and use that to calculate which behaviour maximizes the utility across possible worlds via their dependence on our decision.

So in the example problem, if a valuer of contradictory worlds has roughly equal priors for both the world we’re examining and the other world in which she finds herself where the digit was even (the impossible one, which isn’t impossible for her, because it wasn’t assigned zero prior weight), then sure, she can go ahead and give up control. That’s of course assuming that she expects the staple maximizer to reciprocate in the impossible world, which you didn’t spell out in your post, but that dependence on decisions is standard for counterfactual mugging problems. Please correct me if that’s not the intended setup.

As an aside, this comment feels silly and wrong; an example of diseased thoughts unconnected with reality. It reminds me a bit of Greg Egan’s short story Dark Integers. I would really love to see a more sensible interpretation than this.

• While I haven’t given it much thought outside the context of fiction, one could adopt the point of view/vocabulary of this being “the level 5 Tegmark multiverse”.

Now, if that is true in any sense, it’s probably a much less literal one, and not based on the same reasoning as the other four, but it might still be a useful heuristic for humans.

Another interesting note: by default my brain seems to assume utility is linear in paperclips when considering, say, different Everett branches, but logarithmic in paperclips when considering logical uncertainty. That’s kinda odd and unjustified, but the intuition might have some point about humans’ utility functions.

• Sorry, I couldn’t follow.

• That’s okay, there’s no formalized theory behind it. But for the sake of conversation:

It seems you once agreed that multiple agents in the same epistemic state in different possible worlds can define strategies over their future observations in a way that looks like trading utilities: http://lesswrong.com/lw/102/indexical_uncertainty_and_the_axiom_of/sht

When I treat priors as a kind of utility, that’s interpretation #4 from this Wei Dai post: http://lesswrong.com/lw/1iy/what_are_probabilities_anyway/

Really the only things that seem in any way novel here are the idea that the space of possible worlds might include worlds that work by different mathematical rules, and that possibility being contingent on the agent’s priors. I don’t know how to characterize how math works in a different world, other than by saying explicitly what the outcome of a given computation will be. You can think of that as forcing the structural equation that would normally compute “1+1” to output “5”, where the graph setup would somehow keep that logical fact from colliding with proofs that “3-1=2” (for worlds that don’t explode) (which is what I thought Eliezer meant by creating a factored DAG of mathematics here). That’s for a very limited case of illogical calculation where our reasoning process produces results close enough to their analogues in the target world that we’re even able to make some valid deductions. Maybe other worlds don’t have a big book of platonic truths (ambiguity or instability) and cross-world utility calculations just don’t work. In that case, I can’t think of any sensible course of action.

I don’t think this is totally worthless speculation, even if you don’t agree that “a world with different math” makes sense, because an AI with faulty hardware/reasoning will still need to reason about mathematics that works differently from its mistaken inferences, and that probably requires a partial correspondence between how the agent reasons and how the world works, just like the partial correspondence between worlds with different mathematical rules allows some limited deductions with cross-world or other-world validity.

• What deal would you have made if you didn’t know that the first digit of pi was odd, or if you didn’t know that 1+1=2?

It seems the important factor is how Omega makes its choice of which digit of pi (or other logical fact) to check. If Omega uses a quantum coin to select a number N between 1 and 84, and then checks the N-th digit of pi, then you should cooperate. If Omega searches for a roundish number N such that the N-th digit of pi is odd, then the answer appears to further depend on Omega’s motivation. If Omega made the choice of “seeking ‘odd’ roundish numbers” vs. “‘even’ roundish numbers” by tossing a quantum coin, then you should cooperate. Otherwise… etc.

• By the way, I notice that the thought experiment, as phrased, doesn’t quite require knowledge of the digit of pi: Omega states that the situation is predicated on its oddness, it even states that it’s odd, and there is no counterfactual version of yourself, so computation of the digit becomes a ritual of cognition without purpose. There is likely a restatement that retains everyone’s interpretation, but in the current phrasing the role of logical uncertainty in it seems to be lost.

• Do I know that Staply would have decided as I decide, had he been given control of the universe and been told by Omega and his calculator that the millionth digit of pi is even?

• I took it as being implied, yes. If Staply is an unknown algorithm, there’s no point in trading.

• Then it does seem that if Clippy had in fact been built to maximize paperclips by an agent with the right computational limitations, then Clippy would have been built to take the deal.