So… what’s the deal with counterfactuals?

Over the past couple of years, I’ve been writing about the CDT=EDT perspective. I’ve now organized those posts into a sequence for easy reading.

I call CDT=EDT a “perspective” because it is a way of consistently answering questions about what counterfactuals are and how they work. At times, I’ve argued strongly that it is the correct way. That’s basically because:

  • it has been the only coherent framework I put any stock in (more for lack of other proposals for dealing with logical counterfactuals than for an abundance of bad ones);

  • there are strong arguments for it, if you’re willing to make certain assumptions;

  • it would be awfully nice to settle this whole question of counterfactual reasoning and move on. CDT=EDT is in a sense the most boring possible answer, i.e., that all approaches we’ve thought of are essentially equivalent and there’s no hope for anything better.

However, recently I’ve realized that there’s a perspective which unifies even more approaches, while being less boring (more optimistic about counterfactual reasoning helping us to do well in decision-theoretic problems). It’s been right in front of me the whole time, but I was blind to it due to the way I factored the problem of formulating decision theory. It suggests a research direction for making progress in our understanding of counterfactuals; I’ll try to indicate some open curiosities of mine by the end.

Three > Two

The claim I’ll be elaborating on in this post is, essentially, that the framework in Jessica Taylor’s post about memoryless cartesian environments is better than the CDT=EDT way of thinking. You’ll have to read the post to get the full picture if you haven’t, but to briefly summarize: if we formalize decision problems in a framework which Jessica Taylor calls “memoryless cartesian environments” (which we can call “memoryless POMDPs” if we want to be closer to academic CS/ML terminology), reasoning about anthropic uncertainty in a certain way (via the self-indication assumption, SIA for short) makes it possible for CDT to behave like UDT.
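To make “CDT+SIA behaves like UDT” concrete, here is a minimal sketch using the absent-minded driver, a standard toy problem that this post does not itself work through. The payoffs (0 for exiting at the first intersection, 4 at the second, 1 for driving past both) are the usual textbook ones and are assumptions here. A memoryless policy is just a probability p of continuing, used at both intersections. The sketch compares the slope of the updateless (UDT) value with the marginal value CDT+SIA assigns to continuing at the current intersection only.

```python
from fractions import Fraction

# Absent-minded driver payoffs (standard textbook values, assumed here).
EXIT_AT_X = 0        # exit at the first intersection
EXIT_AT_Y = 4        # exit at the second intersection
CONTINUE_PAST_Y = 1  # drive past both exits


def udt_value(p):
    """Ex-ante (updateless) expected payoff of continuing with probability p."""
    p = Fraction(p)
    return (1 - p) * EXIT_AT_X + p * (1 - p) * EXIT_AT_Y + p * p * CONTINUE_PAST_Y


def udt_slope(p):
    """d/dp of udt_value; with these payoffs udt_value(p) = 4p - 3p^2."""
    return 4 - 6 * Fraction(p)


def cdt_sia_slope(p):
    """Marginal causal value of continuing at the current intersection only.

    SIA weights each intersection by its expected number of visits
    (X: 1, Y: p). CDT varies only this choice; any other visit still uses p.
    """
    p = Fraction(p)
    weight_x = 1 / (1 + p)
    weight_y = p / (1 + p)
    # Continue vs exit at X: continuing reaches Y, where the policy p applies.
    gain_at_x = (p * CONTINUE_PAST_Y + (1 - p) * EXIT_AT_Y) - EXIT_AT_X
    # Continue vs exit at Y.
    gain_at_y = CONTINUE_PAST_Y - EXIT_AT_Y
    return weight_x * gain_at_x + weight_y * gain_at_y


p_star = Fraction(2, 3)
print(udt_slope(p_star), cdt_sia_slope(p_star), udt_value(p_star))  # 0 0 4/3
```

The CDT+SIA slope works out to the UDT slope divided by 1 + p, the expected number of intersections visited, so the two agree in sign everywhere and both vanish at the optimal policy p = 2/3.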

The result there is sometimes abbreviated as UDT=CDT+SIA, although UDT ⊆ CDT+SIA is more accurate, because the optimal UDT policies are a subset of the policies which CDT+SIA can follow. This is because UDT has self-coordination power which CDT+SIA lacks. (We could say UDT=CDT+SIA+coordination, but unfortunately “coordination” lacks a snappy three-letter acronym. Or, to be even more pedantic, we could say that UDT1.0 = CDT+SIA, and UDT1.1 = CDT+SIA+coordination; the difference between 1.0 and 1.1 is, after all, the presence of global policy coordination.) [EDIT: This isn’t correct. See Wei Dai’s comment.]

Caspar Oesterheld commented on that post with an analogous EDT+SSA result. SSA (the self-sampling assumption) is one of the main contenders alongside SIA for correct anthropic reasoning. Caspar’s comment shows that we can think of the correct anthropics as a function of your preference between CDT and EDT. So, we could say that CDT+SIA = EDT+SSA = UDT1.0; or, CDT=EDT=UDT for short. [EDIT: As per Wei Dai’s comment, the equation “CDT+SIA = EDT+SSA = UDT1.0” is really not correct due to differing coordination strengths; as he put it, UDT1.0 > EDT+SSA > CDT+SIA.]

My CDT=EDT view came from being pedantic about how decision problems are represented, and noticing that when you’re pedantic, it becomes awfully hard to drive a wedge between CDT and EDT; you’ve got to do things which are strange enough that it becomes questionable whether it’s a fair comparison between CDT and EDT. However, I didn’t notice the extent to which my “being very careful about the representation” was really insisting that Bayes nets are the proper representation.

(Aside: Bayes nets which are representing decision problems are usually called influence diagrams rather than Bayes nets. I think this convention is silly; why do we need a special term for that?)

It is rather curious that LIDT also illustrated CDT=EDT-style behavior. It is part of what made me feel like CDT=EDT was a convergent result of many different approaches, and kept me from noticing its reliance on certain Bayes-net formulations of decision problems. Now, I instead find it to be curious and remarkable that logical induction seems to think as if the world were made of Bayes nets.

If CDT=EDT comes from insisting that decision problems are represented as Bayes nets, CDT=EDT=UDT is the view which comes from insisting that decision problems be represented as memoryless cartesian environments. At the moment, this just seems like a better way to be pedantic about representation. It unifies three decision theories instead of two.

Updatelessness Doesn’t Factor Out

In fact, I thought about Jessica’s framework frequently, but I didn’t think of it as an objection to my CDT=EDT way of thinking. I was blind to this objection because I thought (logical-)counterfactual reasoning and (logically-)updateless reasoning could be dealt with as separate problems. The claim was not that CDT=EDT-style decision-making did well, but rather, that any decision problem where it performed poorly could be analyzed as a case where updateless reasoning is needed in order to do well. I let my counterfactual reasoning be simple, blaming all the hard problems on the difficulty of logical updatelessness.

Once I thought to question this view, it seemed very likely wrong. The Dutch Book argument for CDT=EDT seems closer to the true justification for CDT=EDT reasoning than the Bayes-net argument, but the Dutch Book argument is a dynamic consistency argument. I know that CDT and EDT both violate dynamic consistency, in general. So, why pick on one special type of dynamic consistency violation which CDT can illustrate but EDT cannot? In other words, the grounds on which I can argue CDT=EDT seem to point more directly to UDT instead.

What about all those arguments for CDT=EDT?

Non-Zero Probability Assumptions

I’ve noted before that each argument I make for CDT=EDT seems to rely on an assumption that actions have non-zero probability. I leaned heavily on an assumption of epsilon exploration, although one could also argue that all actions must have non-zero probability on different grounds (such as the implausibility of knowing so much about what you are going to do that you can completely rule out any action, before you’ve made the decision). Focusing on cases where we have to assign probability zero to some action was a big part of finally breaking myself of the CDT=EDT view and moving to the CDT=EDT=UDT view.

(I was almost broken of the view about a year ago by thinking about the XOR blackmail problem, which has features in common with the case I’ll consider now; but it didn’t stick, perhaps because the example doesn’t actually force actions to have probability zero and so doesn’t point so directly to where the arguments break down.)

Consider the transparent Newcomb problem with a perfect predictor:

Transparent Newcomb. Omega runs a perfect simulation of you, in which you face two boxes, a large box and a small box. Both boxes are made of transparent glass. The small box contains $100, while the large one contains $1,000. In the simulation, Omega gives you the option of either taking both boxes or only taking the large box. If Omega predicts that you will take only one box, then Omega puts you in this situation for real. Otherwise, Omega gives the real you the same decision, but with the large box empty. You find yourself in front of two full boxes. Do you take one, or two?

Apparently, since Omega is a perfect predictor, we are forced to assign probability zero to 2-boxing even if we follow a policy of epsilon-exploring. In fact, if you implement epsilon-exploration by refusing to take any action which you’re very confident you’ll take (you have a hard-coded response: if P(“I do action X”) > 1-epsilon, do anything but X), which is how I often like to think about it, then you are forced to 2-box in transparent Newcomb. I was expecting CDT=EDT type reasoning to 2-box (at which point I’d say “but we can fix that by being updateless”), but this is a really weird reason to 2-box.
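The chicken rule just described is short enough to state as code. This is a minimal sketch with hypothetical names (`chicken_rule`, `self_beliefs`); it only shows why a perfect predictor, which pins P(“I 1-box”) at 1 whenever the agent sees a full box, forces the rule to 2-box no matter which action the agent would otherwise prefer.

```python
def chicken_rule(self_beliefs, preferred, epsilon):
    """Epsilon-exploration via the chicken rule (hypothetical helper).

    self_beliefs maps each action to the agent's credence that it will take
    that action. If some action has credence above 1 - epsilon, do anything
    but that action; otherwise take the preferred action.
    """
    for action, credence in self_beliefs.items():
        if credence > 1 - epsilon:
            alternatives = [a for a in self_beliefs if a != action]
            return alternatives[0]  # with two actions, the only alternative
    return preferred


# Seeing a full box in front of a perfect predictor pins P("I 1-box") at 1.
beliefs_on_full_box = {"1-box": 1.0, "2-box": 0.0}
for preferred in ("1-box", "2-box"):
    print(preferred, "->", chicken_rule(beliefs_on_full_box, preferred, epsilon=0.01))
# 1-box -> 2-box
# 2-box -> 2-box
```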

Still, that’s not in itself an argument against CDT=EDT. Maybe the rule that we can’t take actions we’re overconfident in is at fault. The argument against CDT=EDT style counterfactuals in this problem is that the agent should expect that if it 2-boxes, then it won’t ever be in the situation to begin with; at least, not in the real world. As discussed somewhat in the happy dance problem, this breaks important properties that you might want out of conditioning on conditionals. (There are some interesting consequences of this, but they’ll have to wait for a different post.) More importantly for the CDT=EDT question, this can’t follow from evidential conditioning, or learning about consequences of actions through epsilon-exploration, or any other principles in the CDT=EDT cluster. So, there would at least have to be other principles in play.

A very natural way of dealing with the problem is to represent the agent’s uncertainty about whether it is in a simulation. If you think you might be in Omega’s simulation, observing a full box doesn’t imply certainty about your own action anymore, or even about whether the box is really full. This is exactly how you deal with the problem in memoryless cartesian environments. But, if we are willing to do this here, we might as well think about things in the memoryless cartesian framework all over the place. This contradicts the CDT=EDT way of thinking about things in lots of problems where updateless reasoning gives different answers than updateful reasoning, such as counterfactual mugging, rather than only in cases where some action has probability zero.
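Here is a rough sketch of that simulation-uncertainty move in transparent Newcomb, in the same style as a memoryless cartesian analysis. Several details are assumptions rather than part of the problem statement: the simulation runs the same possibly-randomized policy (1-box on a full box with probability q), the real large box is filled exactly when the simulated run 1-boxes, and the agent 2-boxes whenever it sees an empty large box. Under SIA, the full-box observation happens once in the simulation and, with probability q, once more in reality.

```python
from fractions import Fraction

SMALL, LARGE = 100, 1000  # dollars, from the problem statement


def real_payoff(box_full, one_box):
    """Dollars the real agent gets (1-boxing on an empty box pays 0)."""
    if box_full:
        return LARGE if one_box else LARGE + SMALL
    return 0 if one_box else SMALL


def value_if_filled(q):
    """Real payoff once the large box is full, acting on policy q."""
    return q * real_payoff(True, True) + (1 - q) * real_payoff(True, False)


def udt_value(q):
    """Ex-ante value of 1-boxing on a full box with probability q.

    Assumptions: the simulation runs the same randomized policy, the real
    box is filled iff the simulated run 1-boxes, and the agent 2-boxes
    whenever it sees an empty large box.
    """
    q = Fraction(q)
    return q * value_if_filled(q) + (1 - q) * real_payoff(False, False)


def cdt_sia_slope(q):
    """Marginal causal value of 1-boxing at this full-box observation.

    SIA: the observation occurs once in the simulation and, with
    probability q, once more in reality.
    """
    q = Fraction(q)
    weight_sim, weight_real = 1 / (1 + q), q / (1 + q)
    # In the simulation, my choice decides whether the real box is filled.
    gain_in_sim = value_if_filled(q) - real_payoff(False, False)
    # In reality the box is already full: 1-boxing just forgoes the small box.
    gain_in_real = real_payoff(True, True) - real_payoff(True, False)
    return weight_sim * gain_in_sim + weight_real * gain_in_real


print(cdt_sia_slope(1), udt_value(1), udt_value(0))  # 400 1000 100
```

For every q in [0, 1], the CDT+SIA slope is positive and equals the UDT slope (1000 - 200q) divided by 1 + q, so the updateful agent that doubts its observation ends up 1-boxing, matching the updateless policy.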

(I should actually say “problems where updateless reasoning gives different answers than non-anthropic updateful reasoning”, since the whole point here is that updateful reasoning can be consistent with updateless reasoning so long as we take anthropics into account in the right way.)

I also note that trying to represent this problem in Bayes nets, while possible, is very awkward and dissatisfying compared to the representation in memoryless cartesian environments. You could say I shouldn’t have gotten myself into a position where this felt like significant evidence, but, reliant on Bayes-net thinking as I was, it did.

Ok, so, looking at examples which force actions to have probability zero made me revise my view even for cases where actions all have non-zero probability. So again, it makes sense to ask: but what about the arguments in favor of CDT=EDT?

Bayes Net Structure Assumptions

The argument in the Bayes net setting makes some assumptions about the structure of the Bayes net, illustrated earlier. Where do those go wrong?

In the Bayes net setting, observations are represented as parents of the epistemic state (which is a parent of the action). To represent the decision conditional on an observation, we condition on the observation being true. This stops us from putting some probability on our observations being false due to us being in a simulation, as we do in the memoryless cartesian setup.

In other words: the CDT=EDT setup makes it impossible to update on something and still have rational doubt in it, which is what we need to do in order to have an updateful DT act like UDT.

There’s likely some way to fix this while keeping the Bayes-net formalism. However, memoryless cartesian environments model it naturally.

Question: how can we model memoryless cartesian environments in Bayes nets? Can we do this in a way such that the CDT=EDT theorem applies (making the CDT=EDT way of thinking compatible with the CDT=EDT=UDT way of thinking)?

CDT Dutch Book

What about the Dutch-book argument for CDT=EDT? I’m not quite sure how this one plays out. I need to think more about the setting in which the Dutch-book can be carried out, especially as it relates to anthropic problems and anthropic Dutch-books.

Learning Theory

I said that I think the Dutch-book argument gets closer to the real reason CDT=EDT seems compelling than the Bayes-net picture does. Well, although the Dutch Book argument against CDT gives a crisp justification of a CDT=EDT view, I feel the learning-theoretic intuitions which led me to formulate the Dutch Book are closer to the real story. It doesn’t make sense to ask an agent to have good counterfactuals in any single situation, because the agent may be ignorant about how to reason about the situation. However, any errors in counterfactual reasoning which result in observed consequences predictably differing from counterfactual expectations should eventually be corrected.

I’m still in the dark about how this argument connects to the CDT=EDT=UDT picture, just as with the Dutch-book argument. I’ll discuss this more in the next section.

Static vs Dynamic

A big update in my thinking recently has been to cluster frameworks into “static” and “dynamic”, and ask how to translate back and forth between static and dynamic versions of particular ideas. Classical decision theory has a strong tendency to think in terms of statically given decision problems. You could say that the epistemic problem of figuring out what situation you’re in is assumed to factor out: decision theory deals only with what to do once you’re in a particular situation. On the other hand, learning theory deals with more “dynamic” notions of rationality: rationality-as-improvement-over-time, rather than an absolute notion of perfect performance. (For our purposes, “time” includes logical time; even in a single-shot game, you can learn from relevantly similar games which play out in thought-experiment form.)

This is a messy distinction. Here are a few choice examples:

Static version: Dutch-book and money-pump arguments.

Dynamic version: Regret bounds.

Dutch-book arguments rely on the idea that you shouldn’t ever be able to extract money from a rational gambler without a chance of losing it instead. Regret bounds in learning theory offer a more relaxed principle, that you can’t ever extract too much money (for some notion of “too much” given by the particular regret bound). The more relaxed condition is more broadly applicable; Dutch-book arguments only give us the probabilistic analog of logical consistency properties, whereas regret bounds give us inductive learning.
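As an illustration of the contrast (swapped in here, not taken from this post): the first function shows the static kind of failure, where incoherent prices on A and not-A let a bookie win every single round. The second runs Hedge, a standard multiplicative-weights algorithm, which carries the textbook regret bound sqrt(T ln N / 2) on its expected loss against any loss sequence in [0, 1] when its learning rate is sqrt(8 ln N / T). The function names and the random loss sequence are hypothetical.

```python
import math
import random


def dutch_book_profit(price_a, price_not_a):
    """Bookie's guaranteed profit per round when the gambler pays these
    prices for a $1 bet on A and a $1 bet on not-A (exactly one pays out)."""
    return price_a + price_not_a - 1


def hedge_regret(losses, eta):
    """Hedge (multiplicative weights) over N experts, losses in [0, 1].

    Returns the learner's expected cumulative loss minus the cumulative
    loss of the best single expert in hindsight.
    """
    n = len(losses[0])
    probs = [1.0 / n] * n
    learner_loss = 0.0
    for round_losses in losses:
        learner_loss += sum(p * loss for p, loss in zip(probs, round_losses))
        # Downweight each expert exponentially in its loss, then renormalize.
        weights = [p * math.exp(-eta * loss) for p, loss in zip(probs, round_losses)]
        total = sum(weights)
        probs = [w / total for w in weights]
    best_expert = min(sum(r[i] for r in losses) for i in range(n))
    return learner_loss - best_expert


rng = random.Random(0)
T, N = 5000, 4
losses = [[rng.random() for _ in range(N)] for _ in range(T)]
eta = math.sqrt(8 * math.log(N) / T)
bound = math.sqrt(T * math.log(N) / 2)
print(dutch_book_profit(0.6, 0.6) > 0)     # True: a sure win every round
print(hedge_regret(losses, eta) <= bound)  # True: regret stays within the bound
```

The Dutch book forbids any sure loss at all; the regret bound only caps how much worse the learner does than the best expert, which grows like sqrt(T) and so shrinks per round.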

Static: Probability theory.

Dynamic: Logical induction.

In particular, the logical induction criterion gives a notion of regret which implies a large number of nice properties. Typically, the difference between logical induction and classical probability theory is framed as one of logical omniscience vs logical uncertainty. The static-vs-dynamic frame instead sees the critical difference as one of rationality in a static situation (where it makes sense to think about perfect reasoning) vs learning-theoretic rationality (where it doesn’t make sense to ask for perfection, and instead, one thinks in terms of regret bounds).

Static: Bayes-net decision theory (either CDT or EDT as set up in the CDT=EDT argument).

Dynamic: LIDT.

As I mentioned before, the way LIDT seems to naturally reason as if the world were made of Bayes nets now seems like a curious coincidence rather than a convergent consequence of correct counterfactual conditioning. I would like a better explanation of why this happens. Here is my thinking so far:

  • Logical induction lacks a way to question its perception. As with the Bayes-net setup used in the CDT=EDT argument, to observe something is to think that thing is true. There is not a natural way for logical induction to reason anthropically, especially for information which comes in through the traders thinking longer. If one of the traders calculates digits of π and bets accordingly, this information is simply known by the logical inductor; how can it entertain the possibility that it’s in a simulation and the trader’s calculation is being modified by Omega?

  • Logical induction knows its own epistemic state to within high accuracy, as is assumed in the Bayes-net CDT=EDT theorem.

  • LIDT makes the action a function of the epistemic state alone, as required.

There’s a lot of formal work one could do to try to make the connection more rigorous (and look for places where the connection breaks down!).

Static: UDT.

Dynamic: ???

The problem of logical updatelessness has been a thorn in my side for some time now. UDT is a good reply to a lot of decision-theoretic problems when they’re framed in a probability-theoretic setting, but moving to a logically uncertain setting, it’s unclear how to apply UDT. UDT requires a fixed prior, whereas logical induction gives us a picture in which logical uncertainty is fundamentally about how to revise beliefs as you think longer.

The main reason the static-vs-dynamic idea has been a big update for me is that I realized that a lot of my thinking has been aimed at turning logical uncertainty into a “static” object, to be able to apply UDT. I haven’t even posted about most of those ideas, because they haven’t led anywhere interesting. Tsvi’s post on thin logical priors is definitely an example, though. I now think this type of approach is likely doomed to failure, because the dynamic perspective is simply superior to the static one.

The interesting question is: how do we translate UDT to a dynamic perspective? How do we learn updateless behavior?

For all its flaws, taking the dynamic perspective on decision theory feels like something asymptotic decision theory got right. I have more to say about what ADT does right and wrong, but perhaps it is too much of an aside for this post.

A general strategy we might take to approach that question is: how do we translate individual things which UDT does right into learning-theoretic desiderata? (This may be more tractable than trying to translate the UDT optimality notion into a learning-theoretic desideratum whole-hog.)

Static: Memoryless Cartesian decision theories (CDT+SIA or EDT+SSA).

Dynamic: ???

The CDT=EDT=UDT perspective on counterfactuals is that we can approach the question of learning logically updateless behavior by thinking about the learning-theoretic version of anthropic reasoning. How do we learn which observations to take seriously? How do we learn about what to expect supposing we are being fooled by a simulation? Some optimistic speculation on that is the subject of the next section.

We Have the Data

Part of why I was previously very pessimistic about doing any better than the CDT=EDT-style counterfactuals was that we don’t have any data about counterfactuals, almost by definition. How are we supposed to learn what to counterfactually expect? We only observe the real world.

Consider LIDT playing transparent Newcomb with a perfect predictor. Its belief that it will 1-box in cases where it sees that the large box is full must converge to 100%, because it only ever sees a full box in cases where it does indeed 1-box. Furthermore, the expected utility of 2-boxing can be anything, since it will never see cases where it sees a full box and 2-boxes. This means I can make LIDT 1-box by designing my LI to think 2-boxing upon seeing a full box will be catastrophically bad: I simply include a trader with high initial wealth who bets it will be bad. Similarly, I can make LIDT 2-box whenever it sees the full box by including a trader who bets 2-boxing will be great. Then, the LIDT will never see a full box except on rounds where it is going to epsilon-explore into 1-boxing.

(The above analysis depends on details of how epsilon exploration is implemented. If it is implemented via the probabilistic chicken-rule, mentioned earlier, making the agent explore whenever it is very confident about which action it takes, then the situation gets pretty weird. Assume that LIDT is epsilon-exploring pseudorandomly instead.)
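Here is a toy simulation of the argument above. It is emphatically not a logical inductor: a single fixed number, `two_box_prior`, stands in for the wealthy planted trader’s view of 2-boxing on a full box, and exploration is pseudorandom, as assumed above, so the perfect predictor can anticipate it. The assumption that the real agent 2-boxes on an empty large box is also mine to label here, not part of the problem statement.

```python
import random

# Real payoffs in dollars; 1-boxing an empty large box pays nothing.
PAYOFF = {("full", "1-box"): 1000, ("full", "2-box"): 1100,
          ("empty", "1-box"): 0, ("empty", "2-box"): 100}


def run(two_box_prior, rounds=10_000, epsilon=0.05, seed=0):
    """Toy stand-in for LIDT in transparent Newcomb; NOT a logical inductor.

    two_box_prior plays the wealthy planted trader: it fixes the agent's
    estimate for "2-box after seeing a full box". Nothing ever corrects it,
    because the real agent never sees a full box and then 2-boxes.
    """
    rng = random.Random(seed)  # pseudorandom exploration, so Omega can predict it
    one_box_estimate = 0.0     # running average of real payoffs after 1-boxing
    full_seen = one_box_when_full = 0
    total = 0
    for _ in range(rounds):
        explore = rng.random() < epsilon
        greedy = "1-box" if one_box_estimate >= two_box_prior else "2-box"
        flipped = "2-box" if greedy == "1-box" else "1-box"
        action_if_full = flipped if explore else greedy
        # Perfect predictor: the large box is full iff the agent would 1-box on seeing it full.
        seen = "full" if action_if_full == "1-box" else "empty"
        action = action_if_full if seen == "full" else "2-box"  # assumed: 2-box on empty
        payoff = PAYOFF[(seen, action)]
        if seen == "full":
            full_seen += 1
            one_box_when_full += action == "1-box"
            one_box_estimate += (payoff - one_box_estimate) / full_seen
        total += payoff
    return full_seen, one_box_when_full, total / rounds


for prior in (-1e6, 1e6):  # planted trader: 2-boxing on a full box is terrible / great
    full_seen, one_box_when_full, avg = run(prior)
    print(prior, full_seen, one_box_when_full / full_seen, round(avg))
# The middle ratio prints 1.0 for both priors.
```

With either prior, the agent 1-boxes on every round where it actually sees a full box, yet the planted prior alone decides whether it sees full boxes on most rounds or only on exploration rounds.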

LIDT’s confidence that it 1-boxes whenever it sees a full box is jarring, because I’ve just shown that I can make it either 1-box or 2-box depending on the underlying LI. Intuitively, an LIDT agent who 2-boxes upon seeing the full box should not be near-100% confident that it 1-boxes.

The problem is that the cases where LIDT sees a full box and 2-boxes are all counterfactual, since Omega is a perfect predictor and doesn’t show us a full box unless we in fact 1-box. LIDT doesn’t learn from counterfactual cases; the version of the agent in Omega’s head is shut down when Omega is done with it, and never reports its observations back to the main unit.

(The LI does correctly learn the mathematical fact that its algorithm 2-boxes when given observations of a full box as input, but this does not help it to have the intuitively correct expectations when Omega feeds it false sense-data.)

In the terminology of The Happy Dance Problem, LIDT isn’t learning the right observation-counterfactuals: the predictions about what action it takes given different possible observations. However, we have the data: the agent could simulate itself under alternative epistemic conditions, and train its observation-counterfactuals on what action it in fact takes in those conditions.

Similarly, the action-counterfactuals are wrong: LIDT can believe anything about what happens when it 2-boxes upon seeing a full box. Again, we have the data: LI can observe that on rounds when it is mathematically true that the LIDT agent would have 2-boxed upon seeing a full box, it doesn’t get the chance. This knowledge simply isn’t being “plugged in” to the decision procedure in the right way. Generally speaking, an agent can observe the real consequences of counterfactual actions, because (1) the counterfactual action is a mathematical fact of what the agent does under a counterfactual observation, and (2) the important effects of this counterfactual action occur in the real world, which we can observe directly.

This observation makes me much more optimistic about learning interesting counterfactuals. Previously, it seemed like by definition there would be no data from which to learn the correct counterfactuals, other than the (EDTish) requirement that they should match the actual world for actions actually taken. Now, it seems like I have not one, but two sources of data: the observation-counterfactuals can be simulated outright, and the action-counterfactuals can be trained on what actually happens when counterfactual actions are taken.
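To illustrate the two data sources, and only that, here is a toy sketch. It is not a counterfactual-learning algorithm. A hypothetical fixed policy `act` 2-boxes except when exploring. Each round, the agent obtains an observation-counterfactual by calling itself on the full-box observation, then credits the real-world payoff to that possibly counterfactual action. The payoff table and the exploration scheme are assumptions for illustration.

```python
import random

# Real payoffs in dollars; 1-boxing an empty large box pays nothing.
PAYOFF = {("full", "1-box"): 1000, ("full", "2-box"): 1100,
          ("empty", "1-box"): 0, ("empty", "2-box"): 100}


def act(observation, explore):
    """Hypothetical fixed policy: 2-box, except 1-box on a full box when exploring."""
    return "1-box" if observation == "full" and explore else "2-box"


def action_counterfactual_data(rounds=2000, epsilon=0.1, seed=1):
    """Average real payoff, grouped by what the agent would do on seeing a full box."""
    rng = random.Random(seed)
    sums = {"1-box": 0, "2-box": 0}
    counts = {"1-box": 0, "2-box": 0}
    for _ in range(rounds):
        explore = rng.random() < epsilon
        # Observation-counterfactual, by self-simulation on the full-box observation.
        action_if_full = act("full", explore)
        # Real world: Omega fills the large box only if that action is 1-boxing.
        seen = "full" if action_if_full == "1-box" else "empty"
        payoff = PAYOFF[(seen, act(seen, explore))]
        # Credit the real payoff to the (possibly counterfactual) action on a full box.
        sums[action_if_full] += payoff
        counts[action_if_full] += 1
    return {a: sums[a] / counts[a] for a in sums if counts[a]}


print(action_counterfactual_data())  # {'1-box': 1000.0, '2-box': 100.0}
```

The estimate for 2-boxing on a full box comes out at $100, the real consequence of being an agent who would do that, rather than whatever a planted trader happened to assert.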

I haven’t been able to plug these pieces together to get a working counterfactual-learning algorithm yet. It might be that I’m still missing a component. But … it really feels like there should be something here.