Precommitting to paying Omega.

Re­lated to: Coun­ter­fac­tual Mug­ging, The Least Con­ve­nient Pos­si­ble World

MBlume said:

“What would you do in situation X?” and “What would you like to pre-commit to doing, should you ever encounter situation X?” should, to a rational agent, be one and the same question.

Ap­plied to Vladimir Nesov’s coun­ter­fac­tual mug­ging, the rea­son­ing is then:

Precommitting to paying $100 to Omega has expected utility of $4950 × p(Omega appears). Not precommitting has strictly less utility; therefore I should precommit to paying. Therefore I should, in fact, pay $100 in the event (Omega appears, coin is tails).
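The $4950 figure is not derived in this post. A short derivation, assuming the payoffs of the counterfactual mugging as it is usually stated (a fair coin, $10,000 paid on heads to anyone Omega predicts would pay, and $100 requested on tails), would look roughly like this:

```latex
\begin{align*}
E[U \mid \text{precommit}]
  &= p(\text{Omega appears}) \left( \tfrac{1}{2}(+10000) + \tfrac{1}{2}(-100) \right) \\
  &= 4950 \, p(\text{Omega appears}), \\
E[U \mid \text{no precommit}]
  &= p(\text{Omega appears}) \left( \tfrac{1}{2}(0) + \tfrac{1}{2}(0) \right) = 0 .
\end{align*}
```

So precommitting comes out strictly ahead whenever p(Omega appears) > 0.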

To com­bat the ar­gu­ment that it is more likely that one is in­sane than that Omega has ap­peared, Eliezer said:

So imag­ine your­self in the most in­con­ve­nient pos­si­ble world where Omega is a known fea­ture of the en­vi­ron­ment and has long been seen to fol­low through on promises of this type; it does not par­tic­u­larly oc­cur to you or any­one that be­liev­ing this fact makes you in­sane.

My first re­ac­tion was that it is sim­ply not ra­tio­nal to give $100 away when noth­ing can pos­si­bly hap­pen in con­se­quence. I still be­lieve that, with a small mod­ifi­ca­tion: I be­lieve, with mod­er­ately high prob­a­bil­ity, that it will not be in­stru­men­tally ra­tio­nal for my fu­ture self to do so. Read on for the ex­pla­na­tion.

Sup­pose we lived in Eliezer’s most in­con­ve­nient pos­si­ble world:

  • Omega ex­ists.

  • Omega has never been found un­trust­wor­thy.

  • Direct brain simu­la­tion has ver­ified that Omega has a 100% suc­cess rate in pre­dict­ing the re­sponse to its prob­lem, thus far.

  • Omega claims that no other Omega-like be­ings ex­ist (so no per­verse Omegas that can­cel out Omega’s ac­tions!).

  • Omega never speaks to anyone except when it is asking them for payment. It never meets anyone more than once.

  • Omega claims that ac­tual de­ci­sions never have any con­se­quences. It is only what you would have de­cided that can ever af­fect its ac­tions.

Did you see a trap? Direct brain simu­la­tion in­stan­ti­ates pre­cisely what Omega says does not ex­ist, a “you” whose de­ci­sion has con­se­quences. So for­get that. Sup­pose Omega pri­vately performs some ac­tion for you (for in­stance, a hy­per­com­pu­ta­tion) that is not simu­la­ble. Then di­rect brain simu­la­tion of this cir­cum­stance can­not oc­cur. So just as­sume that you find Omega trust­wor­thy in this world, and as­sume it does not it­self simu­late you to make its de­ci­sions. Other ob­jec­tions ex­ist: nu­mer­ous ones, ac­tu­ally. For­get them. If you find that a cer­tain set of cir­cum­stances makes it eas­ier for you to de­cide not to pay the $100, or to pay it, change the cir­cum­stances. For my­self, I had to imag­ine know­ing that the Teg­mark en­sem­ble didn’t ex­ist*. If, un­der the MWI of quan­tum me­chan­ics, you find rea­sons (not) to pay, then as­sume MWI is dis­proven. If the con­verse, then as­sume MWI is true. If you find that both sup­po­si­tions give you rea­sons (not) to pay, then as­sume some miss­ing ar­gu­ment in­val­i­dates those rea­sons.

Un­der these cir­cum­stances, should ev­ery­one pay the $100?

No. Well, it de­pends what you mean by “should”.

Suppose I live in the Omega world. Then prior to the coin flip, I assign equal value to my future self in the event that it is heads, and my future self in the event that it is tails. My utility function is, very roughly, the expected utility function of my future self, weighted by the probabilities I assign that I will actually become some given future self. Therefore if I can precommit to paying $100, my utility function will possess the term $4950 × p(Omega appears), and if I can only partially precommit, in other words if I can arrange that with probability q I will pay $100, then my utility function will possess the term $4950 × q × p(Omega appears). Since this term is increasing in q, the dominant strategy is to precommit with probability one. I can in fact do this if Omega guarantees to contact me via email, or a trusted intermediary, and to take instructions thereby received as “my response”, but I may have a slight difficulty if Omega chooses to appear to me in bed late one night.
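As a rough numerical sketch of the partial-precommitment term, the snippet below evaluates $4950 × q × p(Omega appears) over a grid of q values. The $10,000 heads payoff, the fair coin, and the names `expected_utility` and `best_q` are assumptions made for illustration, not part of the argument above.

```python
from fractions import Fraction

HEADS_PAYOUT = 10_000  # assumed payoff on heads (not stated in this post)
TAILS_COST = 100       # the $100 Omega asks for on tails


def expected_utility(q, p_omega):
    """Expected dollars if I end up paying on tails with probability q.

    p_omega is my probability that Omega appears at all; the coin is
    assumed fair, so each branch carries weight 1/2.
    """
    half = Fraction(1, 2)
    return p_omega * q * (half * HEADS_PAYOUT - half * TAILS_COST)


def best_q(p_omega, steps=10):
    """Return the q on an evenly spaced grid that maximises expected utility."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return max(grid, key=lambda q: expected_utility(q, p_omega))


if __name__ == "__main__":
    print(expected_utility(1, 1))       # full precommitment, Omega certain
    print(best_q(Fraction(1, 10)))      # best q on the grid when p = 1/10
```

Because the term is linear in q with a positive coefficient, the grid search lands on q = 1 for any positive p(Omega appears).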

On the prin­ci­ple of the least con­ve­nient world, I’m go­ing to sup­pose that is in fact how Omega chooses to ap­pear to me. I’m also go­ing to sup­pose that I have no tools available to me in Omega world that I do not in fact pos­sess right now. Here comes Omega:

Hello Nathan. Tails, I’m afraid. Care to pay up?

“Be­fore I make my de­ci­sion: Tell me the short­est proof that P = NP, or the con­verse.”

Omega obliges (it will not, of course, let me re­mem­ber this proof—but I knew that when I asked).

“Do you have any way of prov­ing that you can hy­per­com­pute to me?”

Yes. (Omega proves it.)

“So, you’re re­ally Omega. And my choice will have no other con­se­quences?”

None. Had heads ap­peared, I would have pre­dicted pre­cisely this cur­rent se­quence of events and used it to make a de­ci­sion. But heads has not ap­peared. No con­se­quences will en­sue.

“So you would have simu­lated my brain perform­ing these ac­tions? No, you don’t do that, do you? Can you prove that’s pos­si­ble?”

Yes. (Omega proves it.)

“Right. No, I don’t want to give you $100.”

What the hell just hap­pened? Be­fore Omega ap­peared, I wanted this se­quence of events to play out quite differ­ently. In fact this was my wish right up to the ‘t’ of “tails”. But now I’ve de­cided to keep the $100 af­ter all!

The answer is that there is no equivalence between my utility function at time t, where \(t < \text{time}_{\text{Omega}}\), and my utility function at time T, where \(\text{time}_{\text{Omega}} < T\). Before \(\text{time}_{\text{Omega}}\), my utility function contains terms from states of the world where Omega appears and the coin turns up heads; after, it doesn’t. Add to that the fact that my utility function is increasing in money possessed, and my preferred action at time T changes (predictably so) at \(\text{time}_{\text{Omega}}\). To formalise:
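As a concrete warm-up to the notation that follows, here is a minimal sketch of the reversal. It assumes a fair coin and a $10,000 heads payoff (neither is restated in this post), and the function names are purely illustrative.

```python
HEADS_PAYOUT = 10_000  # assumed payoff on heads
TAILS_COST = 100       # the $100 requested on tails


def utility_before(policy_pays: bool) -> float:
    """Value of a policy judged before the coin is known.

    Both branches of the fair coin still contribute.
    """
    heads = HEADS_PAYOUT if policy_pays else 0
    tails = -TAILS_COST if policy_pays else 0
    return 0.5 * heads + 0.5 * tails


def utility_after_tails(pays: bool) -> float:
    """Value of an action judged after tails is announced.

    The heads branch no longer appears; only the money in hand matters.
    """
    return -TAILS_COST if pays else 0


def preferred(utility) -> bool:
    """Return True if paying is preferred under the given utility, else False."""
    return max([True, False], key=utility)


if __name__ == "__main__":
    print(preferred(utility_before))       # preference before the coin is known
    print(preferred(utility_after_tails))  # preference once tails is announced
```

The same agent, with the same taste for money, prefers paying under the first function and refusing under the second; the only difference is which world-states still carry weight.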

Suppose we index possible worlds with a time, t, and a state, S: a world state is then (S,t). Now let the utility function of ‘myself’ at time t and in world state S be denoted \(U_{S,t} : A_S \to \mathbb{R}\), where \(A_S\) is my set of actions and \(\mathbb{R}\) the real numbers. Then in the limit of a small time differential Δt, we can use the Bellman equation to pick an optimal policy \(\pi^* : S \to A_S\) such that we maximise \(U_{S,t}\) as \(U_{S,t}(\pi^*(S))\).
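For readers who want the equation itself, the standard discrete-time Bellman optimality equation is sketched below. The symbols R (per-step reward), γ (discount factor), P (transition probability) and V* (optimal value) are the usual textbook notation and are assumptions here, not notation used in this post; the utility of an action in a state corresponds roughly to the bracketed quantity.

```latex
\begin{align*}
V^*(S) &= \max_{a \in A_S} \Big[ R(S,a) + \gamma \sum_{S'} P(S' \mid S, a)\, V^*(S') \Big], \\
\pi^*(S) &= \arg\max_{a \in A_S} \Big[ R(S,a) + \gamma \sum_{S'} P(S' \mid S, a)\, V^*(S') \Big].
\end{align*}
```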

Before Omega appears, I am in (S,t). Suppose that the action “paying $100 to Omega if tails appears” is denoted \(a_{100}\). Then, obviously, \(a_{100}\) is not in my action set \(A_S\). Let “not paying $100 to Omega if tails appears” be denoted \(a_0\). \(a_0\) isn’t in \(A_S\) either. If we suppose Omega is guaranteed to appear shortly before time T (not a particularly restricting assumption for our purposes), then precommitting to paying is represented in our formalism by taking an action \(a_p\) at (S,t) such that either:

  1. The probability of being in a state § in which tails has appeared and for which \(a_0 \in A_{§}\) at time T is 0, or

  2. For all states § with tails having appeared, with \(a_0 \in A_{§}\) and with non-zero probability at time T, \(U_{§,T}(a_0) < U_{§,T}(a_{100})\), so that \(\pi^*(§) = a_{100}\). Note that a ‘world state’ S includes my brain.
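The two conditions above can be sketched as a check over the possible tails-states at time T. The `TailsState` record and the `precommitment_condition` helper are hypothetical names introduced only for this illustration, and the utilities fed in are made-up numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TailsState:
    """One possible world-state at time T in which tails has appeared."""
    prob: float        # probability of ending up in this state
    can_refuse: bool   # whether "not paying" is in the action set
    u_refuse: float    # utility of not paying in this state
    u_pay: float       # utility of paying $100 in this state


def precommitment_condition(states: Iterable[TailsState]) -> Optional[int]:
    """Return 1 or 2 for whichever condition holds, or None if neither does."""
    # States that are actually reachable and in which refusing is possible.
    live = [s for s in states if s.prob > 0 and s.can_refuse]
    if not live:
        return 1  # refusing is never available with non-zero probability
    if all(s.u_refuse < s.u_pay for s in live):
        return 2  # wherever refusing is possible, paying is strictly better
    return None


if __name__ == "__main__":
    no_choice = [TailsState(1.0, False, 0.0, -100.0)]
    money_lover = [TailsState(1.0, True, 0.0, -100.0)]
    print(precommitment_condition(no_choice))
    print(precommitment_condition(money_lover))
```

A state in which refusal has been made impossible satisfies condition 1, while an unmodified utility that simply increases in money satisfies neither.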

Then if Omega uses a trusted intermediary, I can easily carry out an action \(a_p\) = “give bank account access to intermediary and tell intermediary to pay $100 from my account to Omega under all circumstances”. This counts as taking option 1 above. But suppose that option 1 is closed to us. Suppose we must take an action such that 2 is satisfied. What does such an action look like?

Firstly, brain hacks. If my utility function in state § at time T is increasing in money, then \(U_{§,T}(a_0) > U_{§,T}(a_{100})\), contra the desired property of \(a_p\). Therefore I must arrange for my brain in world-state § to be such that my utility function is not so fashioned. But by supposition my utility function cannot “change”; it is simply a mapping from world-states × possible actions to real numbers. In fact the function itself is an abstraction describing the behaviour of a particular brain in a particular world state**. If, in addition, we desire that the Bellman equation actually holds, then we cannot simply abolish the process of determining an optimal policy at some arbitrary point in time T. I propose one more desired property: the general principle of more money being better than less should not cease to operate due to \(a_p\), as this is sure to decrease \(U_{S,t}(a_p)\) below optimum (would we really lose less than $4950?). So the modification I make to my brain should be minimal in some sense. This is, after all, a highly exceptional circumstance. What one could do is arrange for my brain to experience strong reward for a short time period after taking action \(a_{100}\). The actual amount chosen should be such that the reward outweighs the time-discounted future loss in utility from surrendering the $100 (it follows that the shorter the duration of reward, the stronger its magnitude must be). I must also guarantee that I am not simply attaching a label called “reward” to something that does not actually represent reward as defined in the Bellman equation. This would, I believe, require some pretty deep knowledge of the nature of my brain which I do not possess. Add to that the fact that I do not know how to hack my brain, and in a least convenient world, this option is closed to me also***.
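The trade-off between duration and magnitude of the reward can be made concrete under one simplifying assumption: the hack delivers a constant reward r on each of n consecutive steps, discounted geometrically by a factor γ. The discounted loss from surrendering the $100 is taken as a given number, and the function name is hypothetical.

```python
def min_reward_per_step(loss: float, gamma: float, n_steps: int) -> float:
    """Smallest constant per-step reward r whose discounted sum covers `loss`.

    Solves r * (1 + gamma + ... + gamma**(n_steps - 1)) = loss for r.
    """
    if n_steps < 1:
        raise ValueError("need at least one step of reward")
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    if gamma == 1:
        return loss / n_steps  # no discounting: spread the loss evenly
    # Closed form of the geometric series 1 + gamma + ... + gamma**(n-1).
    return loss * (1 - gamma) / (1 - gamma ** n_steps)


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, min_reward_per_step(100.0, 0.9, n))
```

The required per-step reward falls as n grows, which is the parenthetical claim above read in the other direction: the shorter the reward lasts, the stronger it has to be.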

It’s looking pretty grim for my expected utility. But wait: we do not simply have to increase \(U_{§,T}(a_{100})\). We can also decrease \(U_{§,T}(a_0)\). Now we could implement a brain hack for this also, but the same arguments against it apply. A simple solution might be to use a trusted intermediary for another purpose: give him $1000, and tell him not to give it back unless I do \(a_{100}\). This would, in fact, motivate me, but it reintroduces the factor of how probable it is Omega will appear, which we were previously able to neglect, by altering the utility from time t to \(\text{time}_{\text{Omega}}\). Suppose we give the intermediary our account details instead. This solves the probability issue, but there is a potential either for myself to frustrate him, a solvable problem, or for Omega to frustrate him in order to satisfy the “no further consequences” requirement. And so on: the requirements of the problem are such that only our own utility function is sacrosanct to Omega. It is through that mechanism only that we can win.

This is my real difficulty: that the problem appears to require cognitive understanding and technology that we do not possess. Eliezer may very well give $100 whenever he meets this problem; so may Cameron; but I probably wouldn’t. It wouldn’t be instrumentally rational for me, given my utility function under those circumstances, at least not unless something happens that can put the concepts they carry around with them into my head, and stop me from removing those concepts after Omega appears (or rather, make removing them instrumentally irrational for me, in the sense of being part of a suboptimal policy).

How­ever, on the off-chance that Omega~, a slightly less in­con­ve­nient ver­sion of Omega, ap­pears be­fore me: I hereby pledge one beer to ev­ery mem­ber of Less Wrong, if I fail to sur­ren­der my $100 when asked. Take that, ob­nox­ious om­ni­scient be­ing!

*It’s faintly amus­ing, though only faintly, that de­spite know­ing full well that I was sup­posed to con­sider the least con­ve­nient pos­si­ble world, I ne­glected to think of my least con­ve­nient pos­si­ble world when I first tried to tackle the prob­lem. Ask your­self the ques­tion.

**There are is­sues with iden­ti­fy­ing what it means for a brain/​agent to per­sist from one world-state to an­other, but if such a per­sist­ing agent can­not be iden­ti­fied, then the whole prob­lem is non­sense. It is more in­con­ve­nient for the prob­lem to be co­her­ent, as we must then an­swer it. I’ve also de­cided to use the Bel­l­man equa­tions with dis­crete time steps, rather than the time-con­tin­u­ous HJB equa­tion, sim­ply be­cause I’ve never used the lat­ter and don’t trust my­self to ex­plain it cor­rectly.

***There is the question: would one not simply dehack after Omega arrives announcing ‘tails’? One would, if that were of higher utility than the alternatives; but then we must have defined “reward” inappropriately while making the hack, as the reward for being in each state, together with the discounting factor, serves to fully determine the utility function in the Bellman equation.

(I’ve made a few small post-sub­mis­sion ed­its, the largest to clar­ify my con­clu­sion)