A Rationality Condition for CDT Is That It Equal EDT (Part 1)

[Epistemic Status: this series of two posts gives some arguments which, in my eyes, make it difficult to maintain a position other than CDT=EDT, but not impossible. As I explain at the end of the second post, it is still quite tenable to suppose that CDT and EDT end up taking different actions.]

Previously, I argued that fair comparisons of CDT and EDT (in which the same problem representation is given to both decision theories) will conclude that CDT=EDT, under what I see as reasonable assumptions. Recently, Paul Christiano wrote a post arguing that, all things considered, the evidence strongly favors EDT. Jessica Taylor pointed out that Paul didn’t address the problem of conditioning on probability-zero events, but she came up with a novel way of addressing that problem by taking the limit of small probabilities: COEDT.

Here, I provide further arguments that rationality constraints point in the direction of COEDT-like solutions.

Note that I argue for the conclusion that CDT=EDT, which is somewhat different from arguing directly for EDT; my line of reasoning suggests some additional structure which could be missed by advocating EDT in isolation (or CDT in isolation). Paul’s post described CDT as a very special case of EDT, in which our action is independent of other things we care about. This is true, but we can also accurately describe EDT as a very special case of CDT, where all probabilistic relationships which remain after conditioning on what we know turn out to also be causal relationships. I more often think in the second way, because CDT can have all sorts of counterfactuals based on how causation works; EDT claims that these are only correct when they agree with the conditional probabilities.

(ETA: When I say “CDT”, I’m pointing at some kind of steel-man of CDT which uses logical counterfactuals rather than physical counterfactuals. TDT is a CDT in this sense, whereas UDT could be either CDT or EDT.)

This post will be full of conjectural sketches, and mainly serves to convey my intuitions about how COEDT could fit into the larger picture.

Hyperreal Probability

Initially, thinking about COEDT, I was concerned that although something important had been accomplished, the construction via limits didn’t seem fundamental enough that it should belong in our basic notion of rationality. Then, I recalled how hyperreal numbers (which can be thought of as sequences of real numbers) are a natural generalization of the real numbers. This generalization crops up in several different forms in different areas of Bayesian foundations, but most critically for the current discussion, in the question of how to condition on probability-zero events. Quoting an earlier post of mine:

In What Conditional Probabilities Could Not Be, Alan Hajek argues that conditional probability cannot possibly be defined by Bayes’ famous formula, due primarily to its inadequacy when conditioning on events of probability zero. He also takes issue with other proposed definitions, arguing that conditional probability should instead be taken as primitive.
The most popular way of doing this is Popper’s axioms of conditional probability. In Learning the Impossible (Vann McGee, 1994), it’s shown that conditional probability functions following Popper’s axioms and nonstandard-real probability functions with conditionals defined according to Bayes’ theorem are inter-translatable. Hajek doesn’t like the infinitesimal approach because of the resulting non-uniqueness of representation; but, for those who don’t see this as a problem but who put some stock in Hajek’s other arguments, this would be another point in favor of infinitesimal probability.

In other words, there is an axiomatization of probability—Popper’s axioms—which takes conditional probability to be fundamental rather than derived. This approach is relatively unknown outside philosophy, but often advocated by philosophers as a superior notion of probability, largely because it allows one to condition on probability-zero events. Popper’s axioms are in some sense equivalent to allowing hyperreal probabilities, which also means (with a little mathematical hand-waving; I haven’t worked this out in detail) that we can think of them as a limit of a sequence of strictly nonzero probability distributions.
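To make the limit idea concrete, here is a minimal sketch (with made-up numbers, not drawn from any of the cited papers) of defining a conditional on a probability-zero event as the limit of conditionals in a family of distributions that give the event shrinking but nonzero probability:

```python
from fractions import Fraction

def conditional_at(eps):
    """P_eps(A | B) in a hypothetical family of distributions where
    the event B has probability eps > 0.  Both P_eps(A and B) and
    P_eps(B) vanish as eps -> 0, but their ratio does not."""
    p_a_and_b = Fraction(1, 3) * eps  # made-up joint probability
    p_b = eps                         # made-up event probability
    return p_a_and_b / p_b

# The ratio is constant along the sequence, so the limit as eps -> 0
# exists and can serve as the conditional P(A | B), even though the
# limiting distribution gives B probability zero.
for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    assert conditional_at(eps) == Fraction(1, 3)
```

The choice of exact rational arithmetic is just to make the "constant along the sequence" point unambiguous; nothing hinges on it.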

All of this agrees nicely with Jessica’s approach.

I take this to strongly suggest that reasonable approaches to conditioning on probability-zero events in EDT will share the limit-like aspect of Jessica’s approach, even if it isn’t obvious that they do. (Popper’s axioms are “limit-like”, but this was probably not obvious to Popper.) The major contribution of COEDT beyond this is to provide a particular way of constructing such limits.

(Having the idea “counterfactuals should look like conditionals in hyperreal probability distributions” is not enough to solve decision theory problems alone, since it is far from obvious how we should construct hyperreal probability distributions over logic to get reasonable logical counterfactuals.)

Hyperreal Bayes Nets & CDT=EDT

(The following argument is the only justification of the title of the post which will appear in Part 1. I’ll have a different argument for the claim in the title in Part 2.)

The CDT=EDT argument can now be adapted to hyperreal structures. My original argument required:

1. Probabilities & Causal Structure are Compatible: The decision problem is given as a Bayes net, including an action node (for the actual action taken by the agent) and a decision node (for the mixed strategy the agent decides on). The CDT agent interprets this as a causal net, whereas the EDT agent ignores the causal information and treats it as a probability distribution.

2. Exploration: All action probabilities are bounded away from zero in the decision; that is, the decision node is restricted to mixed strategies in which each action gets some minimal probability.

3. Mixed-Strategy Ratifiability: The agents know the state of the decision node. (This can be relaxed to approximate self-knowledge under some additional assumptions.)

4. Mixed-Strategy Implementability: The action node doesn’t have any parents other than the decision node.
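Under these four assumptions the equivalence is easy to check numerically. Below is a minimal sketch (hypothetical numbers, not the formal theorem) of a tiny net where the action’s only parent is the decision node; conditioning on the action (EDT) and intervening on it (CDT) then compute the same expected utility:

```python
# A known mixed strategy bounded away from zero (epsilon-exploration),
# satisfying assumptions 2 and 3.  The action's only parent is the
# decision node (assumption 4), so there are no back-door paths.
mixed_strategy = {"left": 0.9, "right": 0.1}
outcome_given_action = {
    "left":  {"win": 0.8, "lose": 0.2},
    "right": {"win": 0.3, "lose": 0.7},
}
utility = {"win": 1.0, "lose": 0.0}

def edt_value(action):
    """E[U | A=a]: condition the joint distribution P(A, O) on the action."""
    joint = {o: mixed_strategy[action] * p
             for o, p in outcome_given_action[action].items()}
    total = sum(joint.values())
    return sum(p / total * utility[o] for o, p in joint.items())

def cdt_value(action):
    """E[U | do(A=a)]: intervening just sets the action, since the action
    has no parents besides the decision node -- nothing to confound."""
    return sum(p * utility[o]
               for o, p in outcome_given_action[action].items())

for a in mixed_strategy:
    assert abs(edt_value(a) - cdt_value(a)) < 1e-12
```

The interesting cases are of course the ones where some assumption fails; the sketch only illustrates why, when all four hold, the two computations cannot come apart.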

I justified assumption #2 as an extension of the desire to give EDT a fair trial: EDT is only clearly defined in cases with epsilon-exploration, so I argued that CDT and EDT should be compared with epsilon-exploration. However, if you prefer CDT because EDT isn’t well-defined when conditioning on probability-zero actions, this isn’t much of an argument.

We can now address this by requiring conditionals on probability-zero events to be limits of sequences of conditionals in which the event has greater-than-zero probability. Or (I think equivalently), we can think of the probability distribution as being the real part of a hyperreal probability distribution.

Having done this, we can apply the same CDT=EDT result to Bayes nets with hyperreal conditional probability tables. This shows that CDT still equals EDT without restricting to mixed strategies, so long as conditionals on zero-probability actions are defined via limits.
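One way to picture the hyperreal version: represent each probability as a + b·ε, with a standard real part a and an infinitesimal coefficient b. A sketch (my own toy encoding, not Jessica’s construction) of why conditioning on a purely infinitesimal event yields an ordinary real conditional:

```python
from fractions import Fraction

def condition_on_infinitesimal(joint_coeff, event_coeff):
    """Conditional probability when both P(A and B) = c1*eps and
    P(B) = c2*eps are purely infinitesimal: the eps factors cancel,
    leaving the standard real c1/c2 as the conditional."""
    return Fraction(joint_coeff, event_coeff)

# e.g. P(A and B) = 3*eps and P(B) = 10*eps give P(A | B) = 3/10,
# a perfectly ordinary real number: the "real part" of the ratio.
assert condition_on_infinitesimal(3, 10) == Fraction(3, 10)
```

This is the same cancellation as in the limit picture, just packaged algebraically instead of as a sequence.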

This still leaves the other questionable assumptions behind the CDT=EDT theorem.

#1 (compatible probability & causality): I framed this assumption as the main condition for a fair fight between CDT and EDT: if the causal structure is not compatible with the probability distribution, then you are basically handing different problems to CDT and EDT and then complaining that one gets worse results than the other. However, the case is not so clear as I made it out to be. In cases where CDT/EDT are in specific decision problems which they understand well, the causal structure and probabilistic structure must be compatible. However, boundedly rational agents will have inconsistent beliefs, and it may be that beliefs about causal structure are sometimes inconsistent with other beliefs. An advocate of CDT or EDT might say that the differentiating cases are exactly such inconsistent examples.

Although I agree that it’s important to consider how agents deal with inconsistent beliefs (that’s logical uncertainty!), I don’t currently think it makes sense to judge them on inconsistent decision problems. So, I’ll set aside such problems.

Notice, however, that one might contest whether there’s necessarily a reasonable causal structure at all, and deny #1 that way.

#3 (ratifiability): The ratifiability assumption is a kind of equilibrium concept; the agent’s mixed strategy has to be in equilibrium with knowledge of that very mixed strategy. I argued that it is as much a part of understanding the situation the agent is in as anything else, and that it is usually approximately achievable (IE, doesn’t cause terrible self-reference problems or imply logical omniscience). However, I didn’t prove that a ratifiable equilibrium always exists! Non-existence would trivialize the result, making it into an argument from false premises to a false conclusion.

Jessica’s COEDT results address this concern, showing that this level of self-knowledge is indeed feasible.

#4 (implementability): I think of this as the shakiest assumption; it is easy to set up decision problems which violate it. However, I tend to think such setups get the causal structure wrong. Other parents of the action should instead be thought of as children of the action. Furthermore, if an agent is learning about the structure of a situation by repeated exposure to that situation, implementability seems necessary for the agent to come to understand the situation it is in: parents of the action will look like children if you try to perform experiments to see what happens when you do different things.

I won’t provide any direct arguments for the implementability constraint in the rest of this post, but I’ll be discussing other connections between learning and counterfactual reasoning.

Are We Really Eliminating Exploration?

Ways of Taking Counterfactuals are Somewhat Interchangeable

When thinking about decision theory, we tend to focus on putting the agent in a particular well-defined problem. However, realistically, an agent has a large amount of uncertainty about the structure of the situation it is in. So, a big part of getting things right is learning what situation you’re in.

Any reasonable way of defining counterfactuals for actions, be it CDT or COEDT or something else, is going to be able to describe essentially any combination of consequences for the different actions. So, for an agent who doesn’t know what situation it is in, any system of counterfactuals is possible no matter how counterfactuals are defined. In some sense, this means that getting counterfactuals right will be mainly up to the learning. Choosing between different kinds of counterfactual reasoning is a bit like choosing different priors—you would hope it gets washed out by learning.

Exploration is Always Necessary for Learning Guarantees

COEDT eliminates the need for exploration in 5-and-10, which intuitively means cases where it should be really, really obvious what to do. It isn’t clear to what extent COEDT helps with other issues. I’m skeptical that COEDT alone will allow us to get the right counterfactuals for game-theoretic reasoning. But, it is really clear that COEDT doesn’t change the fundamental trade-off between learning guarantees (via exploration) and Bayesian optimality (without exploration).

This is illustrated by the following problem:

Scary Door Problem. According to your prior, there is some chance that doors of a certain red color conceal monsters who will destroy the universe if disturbed. Your prior holds that this is not very strongly correlated with any facts you could observe without opening such a door. So, there is no way to know whether such doors conceal universe-destroying monsters without trying them. If you knew such doors were free of universe-destroying monsters, there are various reasons why you might sometimes want to open them.

The scary door problem illustrates the basic trade-off between asymptotic optimality and subjective optimality. Epsilon-exploration would guarantee that you occasionally open scary doors. If such doors conceal monsters, you destroy the universe. However, if you refuse to open scary doors, then it may be that you never learn to perform optimally in the world you’re in.
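The trade-off can be quantified crudely. A sketch (the exploration rate and horizon are made-up numbers): with epsilon-exploration, the chance of opening at least one scary door approaches certainty over a long enough run, while zero exploration never opens one but also never learns:

```python
def p_ever_opens(eps, n_rounds):
    """Probability of opening a scary door at least once in n_rounds,
    if each round independently explores with probability eps."""
    return 1 - (1 - eps) ** n_rounds

# Even tiny exploration makes eventual catastrophe nearly certain over
# a long horizon; zero exploration avoids it, but forgoes whatever could
# only be learned by opening a door.
assert p_ever_opens(0.01, 100_000) > 0.999
assert p_ever_opens(0.0, 100_000) == 0.0
```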

What COEDT does is show that the scary door and 5-and-10 really are different sorts of problem. If there weren’t approaches like COEDT which eliminate the need for exploration in 5-and-10, we would be forced to conclude that they’re the same: no matter how easy the problem looks, you have to explore in order to learn the right counterfactuals.

So, COEDT shows that not all counterfactual reasoning has to reduce to learning. There are problems you can get right by reasoning alone. You don’t always have to explore; you can refuse to open scary doors, while still reliably picking up $10.

I mentioned that choosing between different notions of counterfactual is kind of like choosing between different priors—you might hope it gets washed out by learning. The scary door problem illustrates why we might not want the learning to be powerful enough to wash out the prior. This means getting the prior right is quite important.

You Still Explore in Logical Time

If you follow the logical time analogy, it seems like you can’t ever really construct logical counterfactuals without exploration in some sense: if you reason about a counterfactual, the counterfactual scenario exists somewhere in your logical past, since it is a real mathematical object. Hence, you must take the alternate action sometimes in order to reason about it at all.

So, how does a COEDT agent manage not to explore?

COEDT can be thought of as “learning” from an infinite sequence of agents who explore less and less. None of those agents are COEDT agents, but they get closer and closer. If each of these agents exists at a finite logical time, COEDT exists at an infinite logical time, greater than that of any of the agents COEDT learns from. So, COEDT doesn’t need to explore because COEDT doesn’t try to learn from agents maximally similar to itself; it is OK with a systematic difference between itself and the reference class it logically learns from.
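A sketch of that picture in 5-and-10 terms (my own toy rendering, not the COEDT construction itself): agent n explores with probability 1/n, so its conditional on the worse action is always defined; the conditionals are constant along the sequence, so the limit agent can inherit them while itself exploring with probability zero:

```python
from fractions import Fraction

def agent_n_conditionals(n):
    """Agent n takes $5 with exploration probability 1/n, so both
    conditionals E[U | take $5] and E[U | take $10] are defined."""
    exploration = Fraction(1, n)  # nonzero for every finite n
    assert exploration > 0
    return {"take $5": Fraction(5), "take $10": Fraction(10)}

# The conditionals don't depend on n, so the limit agent can use
# E[U | take $5] = 5 and E[U | take $10] = 10, and reliably take $10,
# without ever exploring itself.
limit_conditionals = agent_n_conditionals(10**9)
best = max(limit_conditionals, key=limit_conditionals.get)
assert best == "take $10"
```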

This systematic difference may allow us to drive a wedge between the agent and its reference class to demonstrate problematic behavior. I won’t try to construct such a case today.

In the COEDT post, Jessica says:

I consider COEDT to be major progress in decision theory. Before COEDT, there were (as far as I know) 3 different ways to solve 5 and 10, all based on counterfactuals:
• Causal counterfactuals (as in CDT), where counterfactuals are worlds where physical magic happens to force the agent’s action to be something specific.
• Model-theoretic counterfactuals (as in modal UDT), where counterfactuals are models in which false statements are true, e.g. where PA is inconsistent.
• Probabilistic conditionals (as in reinforcement learning and logical-inductor-based decision theories such as LIEDT/LICDT and asymptotic decision theory), where counterfactuals are possible worlds assigned a small but nonzero probability by the agent, in which the agent takes a different action through “exploration”; note that ADT-style optimism is a type of exploration.
COEDT is a new way to solve 5 and 10. My best intuitive understanding is that, whereas ordinary EDT (using ordinary reflective oracles) seeks any equilibrium between beliefs and policy, COEDT specifically seeks a not-extremely-unstable equilibrium (though not necessarily one that is stable in the sense of dynamical systems), where the equilibrium is “justified” by the fact that there are arbitrarily close almost-equilibria. This is similar to trembling-hand perfect equilibrium. To the extent that COEDT has counterfactuals, they are these worlds where the oracle distribution is not actually reflective but is very close to the actual oracle distribution, and in which the agent takes a suboptimal action with very small probability.

Based on my picture, I think COEDT belongs in the modal UDT class. Both proposals can be seen as a special sort of exploration where we explore if we are in a nonstandard model. Modal UDT explores if PA is inconsistent. COEDT explores if a randomly sampled positive real in the unit interval happens to be less than some nonstandard epsilon. :)

(Note that describing them in this way is a little misleading, since it makes them sound uncomputable. Modal UDT in particular is quite computable, if the decision problem has the right form and if we are happy to assume that PA is consistent.)

I’ll be curious to see how well this analogy holds up. Will COEDT have fundamentally new behavior in some sense?

More thoughts to follow in Part 2.