Decisions are not about changing the world, they are about learning what world you live in

Cross-posted from my blog.

Epistemic status: Probably discussed to death in multiple places, but people still make this mistake all the time. I am not well versed in UDT, but it seems along the same lines. Or maybe I am reinventing some aspects of Game Theory.

We know that physics does not support the idea of metaphysical free will. By metaphysical free will I mean the magical ability of agents to change the world just by making a decision to do so. To the best of our knowledge, we are all (probabilistic) automatons who think of themselves as agents with free choices. A model compatible with the known laws of physics is that what we think of as modeling, predicting and making choices is actually learning which one of the possible worlds we live in. Think of it as being a passenger in a car and seeing new landscapes all the time. The main difference is that the car is invisible to us and we constantly update the map of the expected landscape based on what we see. We have a sophisticated updating and predicting algorithm inside, and it often produces accurate guesses. We experience those as choices made, as if we were the ones in the driver's seat, not just the passengers.

Realizing that decisions are nothing but updates, that making a decision is a subjective experience of discovering which of the possible worlds is the actual one, immediately adds clarity to a number of decision theory problems. For example, if you accept that you have no way to change the world, only to learn which of the possible worlds you live in, then Newcomb's problem with a perfect predictor becomes trivial: there is no possible world where a two-boxer wins. There are only two possible worlds, one where you are a one-boxer who wins, and one where you are a two-boxer who loses. Making a decision to either one-box or two-box is a subjective experience of learning what kind of person you are, i.e. what world you live in.
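As a minimal sketch of this world-counting (the payoff numbers are the conventional ones for Newcomb's problem, assumed here since the post does not restate them):

```python
# The two possible worlds for Newcomb's problem with a perfect predictor.
# A perfect predictor leaves no consistent world where a two-boxer finds
# box B full, so each disposition corresponds to exactly one world.
WORLDS = {
    "one-boxer": 1_000_000,  # box B is full, agent takes only box B
    "two-boxer": 1_000,      # box B is empty, agent takes both boxes
}

def richest_world(worlds):
    """Pick the disposition whose single possible world pays the most."""
    return max(worlds, key=worlds.get)

print(richest_world(WORLDS))  # one-boxer
```

The "decision" to one-box is then just the discovery of which entry of the world map is actual.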

This description, while fitting the observations perfectly, is extremely uncomfortable emotionally. After all, what's the point of making decisions if you are just a passenger spinning a fake steering wheel not attached to any actual wheels? The answer is the usual compatibilist one: we are compelled to behave as if we were making decisions by our built-in algorithm. The classic quote from Ambrose Bierce applies:

"There's no free will," says the philosopher; "To hang is most unjust."
"There is no free will," assents the officer; "We hang because we must."

So, while uncomfortable emotionally, this model lets us make better decisions (the irony is not lost on me, but since "making a decision" is nothing but an emotionally comfortable version of "learning what possible world is actual", there is no contradiction).

An aside on quantum mechanics. It follows from the unitary evolution of the quantum state, coupled with the Born rule for observation, that the world is only predictable probabilistically at the quantum level, which, in our model of learning about the world we live in, puts limits on how accurate the world model can be. Otherwise the quantum nature of the universe (or multiverse) has no bearing on the perception of free will.

Let's go through the examples, some of which are listed as the numbered dilemmas in a recent paper by Eliezer Yudkowsky and Nate Soares, Functional decision theory: A new theory of instrumental rationality. From here on out we will refer to this paper as EYNS.

Psychological Twin Prisoner's Dilemma

An agent and her twin must both choose to either "cooperate" or "defect." If both cooperate, they each receive $1,000,000. If both defect, they each receive $1,000. If one cooperates and the other defects, the defector gets $1,001,000 and the cooperator gets nothing. The agent and the twin know that they reason the same way, using the same considerations to come to their conclusions. However, their decisions are causally independent, made in separate rooms without communication. Should the agent cooperate with her twin?

First we enumerate all the possible worlds, which in this case are just two, once we ignore the meaningless verbal fluff like "their decisions are causally independent, made in separate rooms without communication." This sentence adds zero information, because the "agent and the twin know that they reason the same way", so there is no way for them to make different decisions. These worlds are:

  1. Cooperate world: $1,000,000

  2. Defect world: $1,000

There is no possible world, factually or counterfactually, where one twin cooperates and the other defects, no more than there are possible worlds where 1 = 2. Well, we can imagine worlds where math is broken, but they do not usefully map onto observations. The twins would probably be smart enough to cooperate, at least after reading this post. Or maybe they are not smart enough and will defect. Or maybe they hate each other and would rather defect than cooperate, because it gives them more utility than money. If this were a real situation, we would wait and see which possible world they live in, the one where they cooperate, or the one where they defect. At the same time, subjectively, the twins in the setup would feel like they are making decisions and changing their future.
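The same enumeration in code (a sketch; the dollar payoffs are taken from the dilemma statement above):

```python
# Possible worlds for the Psychological Twin Prisoner's Dilemma.  The twins
# reason identically, so only the symmetric outcomes are possible worlds;
# the mixed outcomes simply do not appear in the map at all.
worlds = {
    ("cooperate", "cooperate"): 1_000_000,  # each twin's payoff
    ("defect", "defect"): 1_000,
}

# A world where the twins act differently is not in the enumeration:
assert ("cooperate", "defect") not in worlds

best = max(worlds, key=worlds.get)
print(best)  # ('cooperate', 'cooperate')
```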

The Absent-Minded Driver Problem

An absent-minded driver starts driving at START in Figure 1. At X he can either EXIT and get to A (for a payoff of 0) or CONTINUE to Y. At Y he can either EXIT and get to B (payoff 4), or CONTINUE to C (payoff 1). The essential assumption is that he cannot distinguish between intersections X and Y, and cannot remember whether he has already gone through one of them.

There are three possible worlds here, A, B and C, with utilities 0, 4 and 1 respectively, and by observing the driver "making a decision" we learn which world they live in. If the driver is a classic CDT agent, they would turn and end up at A, despite it being the lowest-utility action. Sucks to be them, but that's their world.
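A sketch of the payoff bookkeeping. The mixed strategy at the end is the standard textbook treatment of this dilemma, added here for context only; it is not part of the post's argument:

```python
def expected_payoff(p: float) -> float:
    """Expected payoff for a driver who CONTINUEs with probability p at any
    intersection he cannot tell apart: exit at X -> A (payoff 0),
    exit at Y -> B (payoff 4), continue at both -> C (payoff 1)."""
    return (1 - p) * 0 + p * ((1 - p) * 4 + p * 1)

# Deterministic drivers can only reach worlds A or C:
assert expected_payoff(0.0) == 0.0  # always exit -> world A
assert expected_payoff(1.0) == 1.0  # always continue -> world C

# A grid search recovers the classic result: continue with p = 2/3
# for an expected payoff of 4/3.
best_p = max((i / 1000 for i in range(1001)), key=expected_payoff)
print(best_p)  # 0.667
```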

The Smoking Lesion Problem

An agent is debating whether or not to smoke. She knows that smoking is correlated with an invariably fatal variety of lung cancer, but the correlation is (in this imaginary world) entirely due to a common cause: an arterial lesion that causes those afflicted with it to love smoking and also (99% of the time) causes them to develop lung cancer. There is no direct causal link between smoking and lung cancer. Agents without this lesion contract lung cancer only 1% of the time, and an agent can neither directly observe, nor control whether she suffers from the lesion. The agent gains utility equivalent to $1,000 by smoking (regardless of whether she dies soon), and gains utility equivalent to $1,000,000 if she doesn't die of cancer. Should she smoke, or refrain?

The problem does not specify this explicitly, but it seems reasonable to assume that the agents without the lesion do not enjoy smoking and get 0 utility from it.

There are 8 possible worlds here, with different utilities and probabilities.

An agent who "decides" to smoke has higher expected utility than one who decides not to, and this "decision" lets us learn which of the 4 possible worlds could be actual; eventually, when she gets the test results, we learn which one is the actual world.
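A sketch of the utility bookkeeping, holding the lesion fixed since the choice cannot affect it (the 0-utility-without-lesion assumption is the one made above):

```python
# Utilities for the Smoking Lesion worlds, conditioned on lesion status.
# Smoking cannot change whether the lesion (and hence the cancer risk)
# is present, so we compare the smoking and non-smoking worlds within
# each lesion status separately.
P_CANCER = {True: 0.99, False: 0.01}  # P(cancer | lesion present?)

def expected_utility(smokes: bool, lesion: bool) -> float:
    enjoyment = 1_000 if (smokes and lesion) else 0  # only the lesioned enjoy it
    survival = (1 - P_CANCER[lesion]) * 1_000_000    # utility of not dying
    return enjoyment + survival

# Whatever the lesion status, the smoking world is worth at least as much:
for lesion in (True, False):
    assert expected_utility(True, lesion) >= expected_utility(False, lesion)
print(expected_utility(True, True), expected_utility(False, True))
```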

Note that the analysis would be exactly the same if there were a "direct causal link between desire for smoking and lung cancer", without any "arterial lesion". In the problem as stated there is no way to distinguish between the two, since there are no other observable consequences of the lesion. There is a 99% correlation between the desire to smoke and cancer, and that is the only thing that matters. Whether there is a "common cause", or cancer causes the desire to smoke, or the desire to smoke causes cancer is irrelevant in this setup. It may become relevant if there were a way to affect this correlation, say, by curing the lesion, but there is not in the problem as stated. Some decision theorists tend to get confused over this because they think of this magical thing they call "causality," the qualia of your decisions being yours and free, causing the world to change upon your metaphysical command. They draw fancy causal graphs instead of listing and evaluating possible worlds.

Parfit's Hitchhiker Problem

An agent is dying in the desert. A driver comes along who offers to give the agent a ride into the city, but only if the agent will agree to visit an ATM once they arrive and give the driver $1,000.
The driver will have no way to enforce this after they arrive, but she does have an extraordinary ability to detect lies with 99% accuracy. Being left to die causes the agent to lose the equivalent of $1,000,000. In the case where the agent gets to the city, should she proceed to visit the ATM and pay the driver?

We note a missing piece in the problem statement: what are the odds of the agent lying about not paying and the driver detecting the lie and giving her a ride anyway? It can be, for example, 0% (the driver does not bother to use her lie detector in this case) or the same 99% accuracy as in the case where the agent lies about paying. We assume the first case for this problem, as it makes more intuitive sense.

As usual, we draw the possible worlds, partitioned by the "decision" made by the hitchhiker, and note the utility of each possible world. We do not know which world would be the actual one for the hitchhiker until we observe it ("we" in this case might denote the agent themselves, even though they feel like they are making a decision).

So, while the highest utility world is the one where the agent does not pay and the driver believes they would, the odds of this possible world being actual are very low, and the agent who will end up paying after the trip has higher expected utility before the trip. This is pretty confusing, because the intuitive CDT approach would be to promise to pay, yet refuse after. This is effectively thwarted by the driver's lie detector. Note that if the lie detector were perfect, then there would be just two possible worlds:

  1. pay and survive,

  2. do not pay and die.

Once the possible worlds are written down, it becomes clear that the problem is essentially isomorphic to Newcomb's.
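A sketch of the expected utilities, with one labelled assumption the problem statement does not pin down: the 99%-accurate detector also misreads a truthful promise as a lie 1% of the time, leaving the honest agent behind.

```python
# Expected utilities for the two hitchhiker dispositions, in dollars.
# Being left to die costs $1,000,000; paying after the ride costs $1,000.
def expected_utility(will_pay: bool) -> float:
    if will_pay:
        # 99%: truthful promise believed -> ride, pay $1,000
        #  1%: promise misread as a lie  -> left in the desert
        return 0.99 * -1_000 + 0.01 * -1_000_000
    # 99%: the lie is caught       -> left in the desert
    #  1%: the lie slips through   -> free ride, pay nothing
    return 0.99 * -1_000_000 + 0.01 * 0

print(expected_utility(True))   # -10990.0
print(expected_utility(False))  # -990000.0
```

The agent who will pay loses far less in expectation, even though the single best world belongs to the undetected liar.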

Another problem that is isomorphic to it is

The Transparent Newcomb Problem

Events transpire as they do in Newcomb's problem, except that this time both boxes are transparent — so the agent can see exactly what decision the predictor made before making her own decision. The predictor placed $1,000,000 in box B iff she predicted that the agent would leave behind box A (which contains $1,000) upon seeing that both boxes are full. In the case where the agent faces two full boxes, should she leave the $1,000 behind?

Once you are used to enumerating possible worlds, whether the boxes are transparent or not does not matter. The decision whether to take one box or two is already made before the boxes are presented, transparent or not. The analysis of the conceivable worlds is identical to the original Newcomb's problem. To clarify, if you are in the world where you see two full boxes, wouldn't it make sense to two-box? Well, yes, it would, but if this is what you "decide" to do (and all decisions are made in advance, as far as the predictor is concerned, even if the agent is not aware of this), you will never (or very rarely, if the predictor is almost, but not fully, infallible) find yourself in this world. Conversely, if you one-box even when you see two full boxes, that situation always, or almost always, happens.

If you think you pre-committed to one-boxing but then are capable of two-boxing, congratulations! You are in the rare world where you have successfully fooled the predictor!

From this analysis it becomes clear that the word "transparent" is yet another superfluous stipulation, as it contains no new information. Two-boxers will two-box, one-boxers will one-box, transparency or not.

At this point it is worth pointing out the difference between world counting and EDT, CDT and FDT. The latter three tend to get mired in reasoning about their own reasoning, instead of reasoning about the problem they are trying to decide. In contrast, we mindlessly evaluate probability-weighted utilities, unconcerned with the pitfalls of causality, retro-causality, counterfactuals, counter-possibilities, subjunctive dependence and other hypothetical epicycles. There are only recursion-free possible worlds of different probabilities and utilities, and a single actual world observed after everything is said and done. While reasoning about reasoning is clearly extremely important in the field of AI research, the dilemmas presented in EYNS do not require anything as involved. Simple counting does the trick better.

The next problem is rather confusing in its original presentation.

The Cosmic Ray Problem

An agent must choose whether to take $1 or $100. With vanishingly small probability, a cosmic ray will cause her to do the opposite of what she would have done otherwise. If she learns that she has been affected by a cosmic ray in this way, she will need to go to the hospital and pay $1,000 for a check-up. Should she take the $1, or the $100?

A bit of clarification is in order before we proceed. What does "do the opposite of what she would have done otherwise" mean, operationally? Here let us interpret it in the following way:

Deciding and attempting to do X, but ending up doing the opposite of X and realizing it after the fact.

Something like "OK, let me take $100… Oops, how come I took $1 instead? I must have been struck by a cosmic ray, gotta do the $1,000 check-up!"

Another point is that here again there are two probabilities in play, the odds of taking $1 while intending to take $100 and the odds of taking $100 while intending to take $1. We assume these are the same, and denote the (small) probability of a cosmic ray strike as p.

The analysis of the dilemma is boringly similar to the previous ones.

Thus attempting to take $100 has a higher payoff as long as the "vanishingly small" probability of the cosmic ray strike is under 50%. Again, this is just a calculation of expected utilities, though an agent believing in metaphysical free will may take it as a recommendation to act a certain way.
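The expected payoffs, as a sketch (payoffs in dollars; p is the strike probability, assumed equal in both flip directions as above):

```python
# Expected payoff of attempting to take a given amount, when a cosmic ray
# strike (probability p) flips the action and forces a $1,000 check-up.
def expected_payoff(attempt: int, p: float) -> float:
    other = 1 if attempt == 100 else 100
    return (1 - p) * attempt + p * (other - 1_000)

# Attempting $100 beats attempting $1 exactly while p is under 50%:
for p in (0.0, 0.1, 0.49):
    assert expected_payoff(100, p) > expected_payoff(1, p)
assert expected_payoff(100, 0.5) == expected_payoff(1, 0.5)
```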

The following setup and analysis is slightly more tricky, but not by much.

The XOR Blackmail

An agent has been alerted to a rumor that her house has a terrible termite infestation that would cost her $1,000,000 in damages. She doesn't know whether this rumor is true. A greedy predictor with a strong reputation for honesty learns whether or not it's true, and drafts a letter:
I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.
The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?

The problem is called "blackmail" because those susceptible to paying the ransom receive the letter when their house doesn't have termites, while those who are not susceptible do not. The predictor has no influence on the infestation, only on who receives the letter. So, by pre-committing to not paying, one avoids the blackmail, and if they receive the letter, it is basically an advance notification of the infestation, nothing more. EYNS states "the rational move is to refuse to pay" assuming the agent receives the letter. This tentatively assumes that the agent has a choice in the matter once the letter is received. This turns the problem on its head and gives the agent a counterintuitive option of having to decide whether to pay after the letter has been received, as opposed to analyzing the problem in advance (and precommitting to not paying, thus preventing the letter from being sent, if you are the sort of person who believes in choice).

The possible worlds analysis of the problem is as follows. Let's assume that the probability of having termites is p, the greedy predictor is perfect, and the letter is sent to everyone "eligible", i.e. to everyone with an infestation who would not pay, and to everyone without the infestation who would pay upon receiving the letter. We further assume that there are no paranoid agents, those who would pay "just in case" even when not receiving the letter. In general, this case would have to be considered as a separate world.

Now the analysis is quite routine.

Thus not paying is, not surprisingly, always better than paying, by the "blackmail amount" 1,000(1-p).
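The routine analysis in code form (a sketch with a perfect predictor, utilities in dollars):

```python
# Expected utilities for the two agent types in XOR Blackmail, given
# termite probability p and a perfect predictor.
def expected_utility(pays_on_letter: bool, p: float) -> float:
    if pays_on_letter:
        # termites -> neither (i) nor (ii) holds, no letter, house lost anyway;
        # no termites -> the letter arrives and this agent pays $1,000.
        return p * -1_000_000 + (1 - p) * -1_000
    # termites -> the letter is just advance notice, house lost anyway;
    # no termites -> no letter, nothing happens.
    return p * -1_000_000 + (1 - p) * 0

p = 0.1
gap = expected_utility(False, p) - expected_utility(True, p)
assert abs(gap - 1_000 * (1 - p)) < 1e-6  # the "blackmail amount" 1,000(1-p)
```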

One thing to note is that the case where the would-pay agent has termites but does not receive a letter is easy to overlook, since it does not include receiving a letter from the predictor. However, this is a possible world contributing to the overall utility, even if it is not explicitly stated in the problem.

Other dilemmas that yield to a straightforward analysis by world enumeration are Death in Damascus, regular and with a random coin, the Mechanical Blackmail and the Psychopath Button.

One final point that I would like to address is that treating the apparent decision making as a self- and world-discovery process, not as an attempt to change the world, helps one analyze adversarial setups that stump the decision theories that assume free will.

Immunity from Adversarial Predictors

EYNS states in Section 9:

"There is no perfect decision theory for all possible scenarios, but there may be a general-purpose decision theory that matches or outperforms all rivals in fair dilemmas, if a satisfactory notion of "fairness" can be formalized." And later: "There are some immediate technical obstacles to precisely articulating this notion of fairness. Imagine I have a copy of Fiona, and I punish anyone who takes the same action as the copy. Fiona will always lose at this game, whereas Carl and Eve might win. Intuitively, this problem is unfair to Fiona, and we should compare her performance to Carl's not on the "act differently from Fiona" game, but on the analogous "act differently from Carl" game. It remains unclear how to transform a problem that's unfair to one decision theory into an analogous one that is unfair to a different one (if an analog exists) in a reasonably principled and general way."

I note here that simply enumerating possible worlds evades this problem, as far as I can tell.

Let's consider a simple "unfair" problem: if the agent is predicted to use a certain decision theory DT1, she gets nothing, and if she is predicted to use some other approach (DT2), she gets $100. There are two possible worlds here, one where the agent uses DT1, and the other where she uses DT2.

So a principled agent who always uses DT1 is penalized. Suppose another time the agent might face the opposite situation, where she is punished for following DT2 instead of DT1. What is the poor agent to do, being stuck between Scylla and Charybdis? There are 4 possible worlds in this case:

  1. Agent uses DT1 always

  2. Agent uses DT2 always

  3. Agent uses DT1 when rewarded for using DT1 and DT2 when rewarded for using DT2

  4. Agent uses DT1 when punished for using DT1 and DT2 when punished for using DT2

World number 3 is where the agent wins, regardless of how adversarial or "unfair" the predictor is trying to be to her. Enumerating possible worlds lets us crystallize the type of agent that would always get the maximum possible payoff, no matter what. Such an agent would subjectively feel that they are excellent at making decisions, whereas they simply live in the world where they happen to win.
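The four agent types above, scored in code (a sketch; the rewarded theory paying $100 and the punished one $0 is an assumed normalization of the problem above):

```python
# Score each agent type on both games.  A game is labelled by which
# decision theory the predictor rewards; the agent's (predicted) choice
# earns $100 if it matches the rewarded theory, else $0.
def payoff(rewarded: str, agent) -> int:
    return 100 if agent(rewarded) == rewarded else 0

agents = {
    "1. always DT1": lambda rewarded: "DT1",
    "2. always DT2": lambda rewarded: "DT2",
    "3. match the reward": lambda rewarded: rewarded,
    "4. match the punishment": lambda r: "DT2" if r == "DT1" else "DT1",
}

totals = {name: payoff("DT1", a) + payoff("DT2", a) for name, a in agents.items()}
assert totals["3. match the reward"] == 200  # wins both "unfair" games
assert all(t <= 100 for n, t in totals.items() if n != "3. match the reward")
```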
