# Why conditioning on “the agent takes action a” isn’t enough

This post expands a bit on a point that I didn’t have enough space to make in the paper “Toward Idealized Decision Theory”.

Say we have a description of an agent program A, a description of a universe program U, a set of actions Acts, and a Bayesian probability distribution P over propositions about the world. Say further that for each action a in Acts we can form the proposition “the agent takes action a”.

Part of the problem with EDT is that we can’t, in fact, use this to evaluate E[U | “the agent takes action a”]. Why not? Because the probability that the agent takes action a may be zero (if the agent does not in fact take action a), and so evaluating the above might require conditioning on an event of probability zero.
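To make the failure concrete, here is a toy sketch (the two-action world, the utilities, and the function names are all made up for illustration; this is not from the paper). EDT’s conditional expectation simply has no value when the conditioning event has probability zero:

```python
from fractions import Fraction

# Hypothetical joint distribution over (action, utility) atoms.
# The agent deterministically takes "a", so "b" has probability zero.
world = {
    ("a", 10): Fraction(1),
    ("b", 100): Fraction(0),
}

def edt_value(action):
    """E[U | "the agent takes `action`"]; undefined when P(action) = 0."""
    p_action = sum(p for (act, _), p in world.items() if act == action)
    if p_action == 0:
        raise ZeroDivisionError(
            f"cannot condition on probability-zero action {action!r}")
    return sum(u * p for (act, u), p in world.items() if act == action) / p_action

print(edt_value("a"))  # 10
# edt_value("b") raises ZeroDivisionError: the conditional is undefined.
```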

There are two common reflexive responses. One is to modify the agent so that there is no action which will definitely not be taken (say, by adding code to the agent which iterates over each action, checks whether the probability of executing that action is zero, and then executes the action if it is definitely not going to be executed). The second response is to say “Yeah, but no Bayesian would be certain that an action won’t be taken, in reality. There’s always some chance of cosmic rays, and so on. So these events will never actually have probability zero.”
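The first response is sometimes called the “chicken rule”. A minimal sketch of the wrapper it describes, with a stubbed-out `proves_not_taken` oracle standing in for a real proof search (the function names here are hypothetical, not from the paper):

```python
def with_chicken_rule(decide, actions, proves_not_taken):
    """Wrap a decision procedure so no action provably goes untaken.

    `proves_not_taken(a)` stands in for a proof search that tries to show
    "the agent does not take action a". If it ever succeeds, the wrapped
    agent immediately takes that action, contradicting the proof; so a
    sound proof searcher can never establish that any action is untaken.
    """
    def step():
        for a in actions:
            if proves_not_taken(a):
                return a  # diagonalize against your own predictions
        return decide()
    return step

# With an oracle that proves nothing, the wrapper defers to the base policy:
agent = with_chicken_rule(lambda: "red", ["red", "green"], lambda a: False)
print(agent())  # red
```

If the oracle ever did “prove” that green goes untaken, the wrapped agent would take green, which is exactly why the proof can’t go through for a sound searcher.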

But while both of these objections work, in the sense that in most realistic universes E[U | “the agent takes action a”] will be defined for all actions a, they do not fix the problem. You’ll be able to get a value for each action a, perhaps, but this value will not necessarily correspond to the utility that the agent would get if it did take that action.

Why not? Because conditioning on unlikely events can put you into very strange parts of the probability space.

Consider a universe where the agent first has to choose between a red box (worth $1) and a green box (worth $100), and then must decide whether or not to pay $1000 to meticulously go through its hardware and correct for bits flipped by cosmic rays. Say that this agent reasons according to EDT. It may be the case that this agent has extremely high probability mass on choosing “red” but nonzero mass on choosing “green” (because it might get hit by cosmic rays). But conditional on choosing green, it expects that it would notice that this only happens when it’s been hit by cosmic rays, and so would pay $1000 to get its hardware checked. That is, E[U | “the agent takes the red box”] = $1, while E[U | “the agent takes the green box”] = −$900.

What went wrong? In brief, “green” having nonzero probability does not imply that conditioning on “the agent takes the green box” is the same as the counterfactual assumption that the agent takes the green box. The conditional probability distribution may be very different from the unconditioned probability distribution (as in the example above, where conditioned on “the agent takes the green box”, the agent would expect that it had been hit by cosmic rays). More generally, conditioning the distribution on “the agent takes the green box” may introduce spurious correlations with explanations for the action (e.g., cosmic rays), and therefore does not measure the counterfactual value that the agent would get if it did take the green box “of its own volition” / “for good reasons”.

Roughly speaking, evidential decision theory has us look at the probability distribution where the agent does in fact take a particular action, whereas (when doing decision theory) we want the probability distribution over what would happen if the agent did take the action. Forcing the event “the agent takes action a” to have positive probability does not make the former distribution look like the latter: indeed, if the event has positive probability for strange reasons (cosmic rays, a small probability that reality is a hallucination, or because you played chicken with your distribution), then it’s quite unlikely that the conditional distribution will look like the desired counterfactual distribution.

We don’t want to ask “tell me about the (potentially crazy) corner of the probability distribution where the agent actually does take action a”; we want to ask “tell me about the probability distribution that is as close as possible to the current world model, except imagining that the agent takes action a.”
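One way to make the contrast concrete is to borrow a Pearl-style intervention as a stand-in for the (still unformalized) counterfactual. The sketch below is a toy version of the red/green box universe; the cosmic-ray probability `EPS`, the utilities, and all the function names are made-up illustrations, not a claim about how the counterfactual should actually be defined:

```python
from fractions import Fraction

EPS = Fraction(1, 10**6)  # hypothetical probability of a cosmic-ray hit

def atoms():
    """(probability, ray, choice) atoms of the agent's default policy:
    it picks red unless a cosmic ray flips its decision to green."""
    yield (1 - EPS, False, "red")
    yield (EPS, True, "green")

def utility(ray, choice):
    box = {"red": 1, "green": 100}[choice]
    return box - (1000 if ray else 0)  # pay $1000 for a hardware check iff hit

def condition(choice):
    """E[U | "the agent takes `choice`"]: restrict to worlds where it happens."""
    rows = [(p, ray) for p, ray, c in atoms() if c == choice]
    z = sum(p for p, _ in rows)
    return sum(p * utility(ray, choice) for p, ray in rows) / z

def intervene(choice):
    """E[U | do(choice)]: force the choice but leave P(ray) untouched."""
    return sum(p * utility(ray, choice) for p, ray, _ in atoms())

print(float(condition("red")), float(condition("green")))  # 1.0 -900.0
print(float(intervene("green")))  # 99.999: the ray stays at probability EPS
```

Conditioning drags the cosmic ray along with the action, while intervening keeps the ray at its prior probability, which is much closer to “what would happen if the agent took the green box for good reasons.”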

The latter thing is still vague and underspecified, of course; figuring out how to formalize it is pretty much our goal in studying decision theory.

• UDT has this same problem, though. In UDT, model uncertainty is being exploited instead of environmental uncertainty, but conditioning on “Agent takes action A” introduces spurious correlations with features of the model where it takes action A.

In particular, only one of the actions will happen in the models where Con(PA) is true, so the rest of the actions occur in models where Con(PA) is false, and this causes problems as detailed in “The Odd Counterfactuals of Playing Chicken” and the comments on “An Informal Conjecture on Proof Length and Logical Counterfactuals”.

I suspect this may also be relevant to non-optimality when the environment is proving things about the agent. The heart of doing well on those sorts of problems seems to be the agent trusting that the predictor will correctly predict its decision, but of course, a PA-based version of UDT can’t know that a PA- or ZFC-based proof searcher will be sound regarding its own actions.