Deconfusing Logical Counterfactuals

This post aims to provide a strong philosophical foundation for logical counterfactuals, while sketching out an informal scheme that will hopefully be formalised further in the future. I believe that providing such a philosophical foundation is important for the same reasons that Sam listed in Motivating a Semantics of Logical Counterfactuals.

Introductory Material

  • MIRI’s Functional Decision Theory (FDT) paper defines subjunctive dependence as the situation where two decision processes involve the same calculation. This can result from causation: for example, I calculate an equation, then tell you the answer, and we both implement a decision based on that answer. It can also occur non-causally, such as when a prediction is linked to a decision, as in Newcomb’s problem. The problem of logical counterfactuals can be characterised as figuring out which processes subjunctively depend on other processes, so that we can apply FDT.

  • In the Co-operation Game, I argued that logical counterfactuals are more about your knowledge of the state of the world than about the world itself. Suppose there are two people who can each choose A or B. Suppose that a predictor knows that both people will choose A conditional on being told either of the following two facts: a) the other person will choose A; b) the other person will choose the same as you. Then whether your decision is modelled as subjunctively depending on the other person depends on which of the two facts you are told. Going further than the original post, one person might be told a) and the other b), so that the first sees themselves as not subjunctively linked to the second, while the second sees themselves as subjunctively linked to the first.

  • “Your knowledge of the state of the world” can be explicated in terms of internally consistent counterfactuals, which we’ll label Raw Counterfactuals. When multiple raw counterfactuals are consistent with your state of knowledge, you can pick the one with the highest utility.

  • However, there will also be cases where only a single counterfactual is consistent with your state of knowledge, which results in a rather trivial problem. Consider, for example, Transparent Newcomb’s Problem, where a perfect predictor places the million in a transparent box if and only if it predicts that you will one-box if it does. If you see the million, you know that you will one-box, so it doesn’t strictly make sense to ask what you should do in this situation. Instead, we’ll have to ask something slightly different. So I’ve slightly modified my position since writing the Co-operation Game: in some situations logical counterfactuals will be defined relative to an imagined, rather than an actual, epistemic state. We will construct these states by erasing some information, as described later in the post.

  • Other degenerate cases include when you already know what decision you’ll make or when you have the ability to figure it out, for example when you have perfect knowledge of the environment and the agent (unless you run into issues with unprovability). Note that degeneracy is more common than you might think, since knowing, for example, that an agent is a utility maximiser tells you its exact behaviour in any situation without tied options. Again, in these cases, the answer to the question “What should the agent do?” is “The only action consistent with the problem statement”. However, as we’ll see, it is sometimes possible to make these questions less trivial if you’re willing to accept a slight tweak to the problem statement.

  • We’ll consider two kinds of problems, acknowledging that these aren’t the only types. In external problems, we imagine decisions from the perspective of a theoretical, unbounded, non-embedded observer who exists outside of the problem statement. Clearly we can’t fully adopt the perspective of such an external agent, but describing the high-level details will usually suffice. Critically, in the external perspective, the observer can have goals, such as choosing the agent with the maximum utility, without those being the goals of the agent within the problem.

  • In (fully) reflective problems, we imagine decisions from the perspective of an agent considering its own decisions or potential decisions with full knowledge of its own source code. These problems will complicate the counterfactuals, since the agent’s goals limit the kind of agent that it could be. For example, an agent that wants to maximise utility should only search over the possibility space where it is a utility maximiser.

  • Making this distinction more explicit: an external problem would ask “What decision maximises utility?”, as opposed to a reflective problem, which asks “What decision maximises utility for a utility maximiser?”. This distinction will mainly be important here in terms of whether it makes a problem trivial or not.

  • The external/reflective distinction is very similar to the difference between embedded and non-embedded problems, but external problems can include embedded agents, just viewed from the perspective of a non-embedded agent. So we can do a surprising amount of our theorising from within the external perspective.
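
To make the idea of picking among raw counterfactuals concrete, here is a minimal Python sketch of the two-person example above. The payoff numbers, the `FACTS` table and all function names are illustrative assumptions rather than anything from the Co-operation Game post; the payoffs are simply chosen so that choosing A is optimal whichever fact is told, as the example requires. Each fact filters the possible (my choice, other's choice) worlds, and we then pick the consistent world with the highest utility.

```python
from itertools import product

CHOICES = ("A", "B")

# Hypothetical payoffs for "me" (assumed, not from the original post):
# coordinating on A is best, coordinating on B second best, mismatches worst.
# Chosen so that A is optimal whichever fact I am told.
PAYOFF = {("A", "A"): 2, ("B", "B"): 1, ("A", "B"): 0, ("B", "A"): 0}

# Each fact I might be told is modelled as a constraint on (my_choice, other_choice).
FACTS = {
    "a": lambda me, other: other == "A",  # the other person will choose A
    "b": lambda me, other: other == me,   # the other person will choose the same as you
}


def raw_counterfactuals(fact):
    """Every (my_choice, other_choice) world consistent with what I have been told."""
    return [(me, other) for me, other in product(CHOICES, CHOICES) if fact(me, other)]


def best_world(worlds):
    """Pick the consistent world with the highest utility for me."""
    return max(worlds, key=lambda world: PAYOFF[world])


def modelled_as_linked(worlds):
    """Crude proxy: the other's choice is modelled as depending on mine
    if it varies across the worlds I consider possible."""
    return len({other for _, other in worlds}) > 1


for name, fact in FACTS.items():
    worlds = raw_counterfactuals(fact)
    print(name, worlds, best_world(worlds), modelled_as_linked(worlds))
```

Both facts lead to choosing A, but only under fact b) does the other person's choice vary with mine across the worlds considered, which is the sense in which the modelled subjunctive dependence comes from what you are told.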

Raw Counterfactuals

  • Raw counterfactuals are produced as follows: starting with the territory, we use some process to produce a causal model. We can then imagine constructing different models by switching out or altering parts of the model. These represent a coherent concept separate from any discussion of decisions. Insofar as we care about what could have been, we ultimately need to relate our claims to raw counterfactuals, as inconsistent models could not have been.

  • Causal Decision Theory uses its own notion of counterfactuals, which we’ll term Decision Counterfactuals. These are created by performing world surgery on models of causal processes. Unlike raw counterfactuals, decision counterfactuals are inconsistent. Suppose that you actually defect in the Prisoner’s Dilemma, but we are considering the counterfactual where you cooperate. Up until the point of the decision you are the kind of person who defects, but when we arrive at the decision, you magically cooperate.

  • Decision counterfactuals are useful because they approximate raw counterfactuals. Performing world surgery all the way back in time would require a lot of work. In theoretical decision problems it is usually easy to imagine a raw counterfactual that would match the problem description and provide the same answer as the decision counterfactual. In practical decision problems, we don’t have the data to do this.

  • Unfortunately, this approximation breaks down when performing world surgery to make a counterfactual decision consistent with the past requires us to change an element of the environment that is important for the specific problem. For example, in Newcomb’s problem, changing your current decision requires changing your past self, which in turn involves changing the predictor, which is considered part of the environment. In this case, it makes sense to fall back to raw counterfactuals and build a new decision theory on top.

  • Functional decision theory, as normally characterised, is closer to this ideal, as world surgery isn’t just performed on your decision, but also on all decisions that subjunctively depend on you. This removes many inconsistencies; however, we’ve still introduced an inconsistency by assuming that f(x)=b when f(x) really equals a. The raw counterfactual approach provides a stronger foundation because it avoids this issue. However, since proof-based FDT is very effective at handling reflective problems, it would be worthwhile rebuilding it upon this new foundation.

  • Let’s consider Newcomb’s problem from the external perspective. It is the external observer, rather than the agent within the problem, who is trying to maximise utility, so there is no restriction on whether the agent can one-box or two-box whilst being consistent with the problem statement. We can then immediately observe that if we use raw counterfactuals, the reward depends only on the agent’s ultimate decision, and agents that one-box score better than those that don’t. Simple cases like this, which allow multiple consistent counterfactuals, don’t require erasure.

  • On the other hand, there are problems which only allow a single raw counterfactual and hence require us to tweak the problem to make it well-defined. Consider, for example, Transparent Newcomb’s, where if you see money in the box, you know that you will receive exactly $1 million. Some people say this fails to account for the agent in the simulator, but it’s entirely possible that Omega may be able to figure out what action you will take based on high-level reasoning, as opposed to having to run a complete simulation of you. We’ll describe later a way of tweaking the problem statement into something that is both consistent and non-trivial.
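
The contrast between decision counterfactuals and raw counterfactuals in Newcomb’s problem can be sketched in a few lines of Python. This is a toy model under stated assumptions: a perfect predictor, the usual $1,000,000 and $1000 boxes, and a predictor that reads the same disposition that later produces the action. The function names are hypothetical.

```python
BIG, SMALL = 1_000_000, 1_000
ACTIONS = ("one-box", "two-box")


def payoff(prediction, action):
    """Opaque box holds BIG iff one-boxing was predicted; two-boxers also take SMALL."""
    opaque = BIG if prediction == "one-box" else 0
    return opaque + (SMALL if action == "two-box" else 0)


def raw_counterfactual(disposition):
    """Imagine a whole different agent. The (assumed perfect) predictor reads the
    same disposition that later produces the action, so the two always agree."""
    return payoff(prediction=disposition, action=disposition)


def decision_counterfactual(actual_disposition, forced_action):
    """CDT-style world surgery: the past, including the prediction made about the
    actual agent, is held fixed and only the action node is overwritten."""
    return payoff(prediction=actual_disposition, action=forced_action)


# Raw counterfactuals: the reward depends only on the agent's ultimate decision.
raw = {a: raw_counterfactual(a) for a in ACTIONS}
# Decision counterfactuals for an agent who actually two-boxes.
surgery = {a: decision_counterfactual("two-box", a) for a in ACTIONS}

print(raw)      # {'one-box': 1000000, 'two-box': 1000}
print(surgery)  # {'one-box': 0, 'two-box': 1000}
```

The surgery world in which an actual two-boxer one-boxes leaves the opaque box empty, i.e. the perfect predictor is wrong, which is exactly the kind of inconsistency that raw counterfactuals avoid.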

General Approach to Logical Counterfactuals

  • We will now attempt to produce a more solid theory of external problems using FDT. This will allow us to interpret decision problems where only one decision is consistent with the problem statement in a non-trivial way.

  • FDT frames logical counterfactuals as “What would the world be like if f(x)=b instead of a?”, which doesn’t strictly make sense, as noted in the discussion on raw counterfactuals. Two points: a) I think it should be clear that this question only makes sense in terms of perturbations of the map and not as a direct claim about the territory (see map and territory). b) We’ll address this problem by proposing a different approach for foundations, in terms of which these proof-based approaches should ultimately be justified.

  • There are two possible paths towards a more consistent theory of logical counterfactuals for these situations. In both cases we interpret the question of what it would mean to change the output of a function as an informal description of a similar question that is actually well-defined. The first approach is to see what consequences can be logically deduced from f(x)=b while implementing a strategy to prevent us from deducing incorrect statements from the inconsistencies. This is often done by playing chicken with the universe. We will term this a paraconsistent approach: although it doesn’t explicitly make use of paraconsistent logic, it is paraconsistent in spirit.

  • An alternative approach would be to interpret this sentence as making claims about raw counterfactuals. In FDT terms, the raw counterfactual approach finds an f′ such that f′(x)=b and that also has certain as-yet-unstated similarities to f, and substitutes this into all subjunctively linked processes. The paraconsistent approach is easier to carry out informally, but I suspect that the raw counterfactual approach would be more amenable to formalisation and provides more philosophical insight into what is actually going on. Insofar as the paraconsistent approach may be more convenient from an implementation perspective, we can justify it by tying it to raw counterfactuals.
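
As a rough illustration of the raw counterfactual reading of “what if f(x)=b?”, the sketch below uses a twin Prisoner’s Dilemma in which both players run the same decision function `f`. The payoffs, the choice of a constant function as f′ and the helper names (`constant`, `world`, `point_surgery`) are assumptions made for illustration; the post deliberately leaves the required similarities between f and f′ unstated.

```python
# Hypothetical twin Prisoner's Dilemma payoffs for "me": (my_move, twin_move) -> utility.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def f(x):
    """The decision function both players actually run; in this toy it defects."""
    return "D"


def constant(b):
    """Simplest stand-in for an f' with f'(x) = b (an assumption: the post
    leaves the required similarities between f and f' unstated)."""
    def f_prime(x):
        return b
    return f_prime


def world(my_function, twin_function, x="twin PD"):
    """Each process computes its move by running the function it is given on input x."""
    return (my_function(x), twin_function(x))


def raw_counterfactual(b):
    """Substitute f' into every process that subjunctively depends on f."""
    f_prime = constant(b)
    return PAYOFF[world(f_prime, f_prime)]


def point_surgery(b):
    """For contrast: overwrite only my output; the twin still runs the original f."""
    return PAYOFF[world(constant(b), f)]


raw_results = {b: raw_counterfactual(b) for b in ("C", "D")}
surgery_results = {b: point_surgery(b) for b in ("C", "D")}
print(raw_results)      # {'C': 3, 'D': 1}
print(surgery_results)  # {'C': 0, 'D': 1}
```

Substituting f′ into every linked process yields consistent worlds in which cooperating scores 3 and defecting scores 1, whereas overwriting only one output reproduces the surgery-style comparison, which favours defecting.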


  • Now that we’ve outlined the broad approach, we should dig further into the question of what exactly it means to make a decision. As I explained in a previous post, there’s a sense in which you don’t so much ‘make’ a decision as implement one. If you make something, it implies that it didn’t exist before and now it does. In the case of decisions, this nudges you towards believing that the decision you were going to implement wasn’t set, then you made a decision, and then it was. However, when “you” and the environment are defined down to the atom, you can only implement one decision. It was always the case, from the start of time, that you were going to implement that decision.

  • We note that if you have perfect information about the agent and the environment, you need to forget, or at least pretend to forget, some information about the agent so that we can produce counterfactual versions of the agent who decide slightly differently. See Shminux’s post Logical Counterfactuals are Low Res for a similar argument, framed slightly differently. The key difference is that I’m not suggesting just adding noise to the model, but forgetting specific information that doesn’t affect the outcome.

  • In Transparent Newcomb’s, it would be natural to erase the knowledge that the box is full. This would then result in two counterfactuals: a) the one where the agent sees an empty box and two-boxes; b) the one where the agent sees a full box and one-boxes. It would be natural to relax the scope of who we care about from the agent who sees a million in the box to the agent at the end of the problem, regardless of what they see. If we do so, we then have a proper decision problem, and we can see that one-boxing is better.

  • Actually, there’s a slight hitch here. In order to define the outcome an agent receives, we need to define what the predictor will predict when an agent sees the box containing the million. But it is impossible to place a two-boxer in this situation. We can resolve this by defining the predictor as simulating the agent responding to an input representing an inconsistent situation, as I’ve described in Counterfactuals for Perfect Predictors.

  • In order to imagine a consistent world, when we imagine a different “you”, we must also imagine the environment interacting with that different “you” so that, for example, the predictor makes a different prediction. Causal decision theorists construct these counterfactuals incorrectly and hence believe that they can change their decision without changing the prediction. They fail to realise that they can’t actually “change” their decision, as there is a single decision that they will inevitably implement. I suggest replacing “change a decision” with “shift counterfactuals” when it is important to be able to think clearly about these topics. This also clarifies why the prediction can change without backwards causation (my previous post on Newcomb’s problem contains further material on why this isn’t an issue).
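
The erasure in Transparent Newcomb’s can be sketched by replacing “the agent who sees a full box” with every policy mapping observations to actions. This is a minimal sketch assuming the standard payoffs and a predictor that, as suggested above, answers “what would this agent do on seeing a full box?” by querying the policy on that input, even when that input is inconsistent for the agent in question. All names are illustrative.

```python
from itertools import product

BIG, SMALL = 1_000_000, 1_000
OBSERVATIONS = ("full", "empty")
ACTIONS = ("one-box", "two-box")


def outcome(policy):
    """policy maps what the agent sees in the transparent box to an action.

    The predictor fills the box iff the agent would one-box on seeing it full.
    It answers that by querying policy["full"], even for agents who never
    actually see a full box (the inconsistent-input case).
    """
    filled = policy["full"] == "one-box"
    seen = "full" if filled else "empty"
    action = policy[seen]
    return (BIG if filled else 0) + (SMALL if action == "two-box" else 0)


# After erasing which decision the agent makes (and hence what it sees),
# the candidates are all four observation -> action policies.
policies = [dict(zip(OBSERVATIONS, acts)) for acts in product(ACTIONS, repeat=2)]
for p in policies:
    print(p, outcome(p))
best = max(policies, key=outcome)
```

The two policies that one-box on seeing a full box receive $1,000,000; the policy that always two-boxes sees an empty box and receives only $1000.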


  • Here’s how the erasure may proceed for the example of Transparent Newcomb’s Problem. Suppose we erase all information about what decision the agent is going to make. This also requires erasing the fact that you see a million in the transparent box. Then we look at all counterfactually possible agents and notice that the reward depends only on whether you are an agent who ultimately one-boxes or an agent who two-boxes. Those who one-box see the million and then receive it; those who two-box see no money in the transparent box and receive only $1000. The counterfactual involving one-boxing performs better than the one involving two-boxing, so we endorse one-boxing.

  • Things become more complicated if we want to erase less information about the agent. For example, we might want an agent to know that it is a utility maximiser, as this might be relevant to evaluating the outcome the agent will receive from future decisions. Suppose that if you one-box in Transparent Newcomb’s you’ll then be offered a choice of $1000 or $2000, but if you two-box you’ll be offered $0 or $10 million. We can’t naively erase all information about your decision process in order to compute a counterfactual of whether you should one-box or two-box. Otherwise, we end up with a situation where, for example, there are agents who one-box and get different rewards. Here the easiest solution is to “collapse” all of the decisions and instead ask about a policy that covers all three decisions that may be faced. Then we can calculate the expectations without producing any inconsistencies in the counterfactuals, as it then becomes safe to erase the knowledge that the agent is a utility maximiser.

  • The 5 and 10 problem doesn’t occur with the forgetting approach. Compare: if you keep the belief that you are a utility maximiser, then the only choice you can implement is 10, so we don’t have a decision problem. Instead, we can define all the possible strategies as follows: choose 5 with probability p and 10 with probability 1-p, and then forget everything about yourself except that you are one of these strategies. There’s no downside, as there is no need for an agent to know whether or not it is a utility maximiser. So we can solve the 5 and 10 problem without epsilon exploration.

  • When will we be able to use this forgetting technique? One initial thought might be the same scope that FDT is designed to be optimal on: problems where the reward depends only on your outputs or predicted outputs. Because only the output or predicted output matters and not the algorithm, these can be considered fair, unlike a problem where an alphabetical decision theorist (one that picks the first decision in alphabetical order) is rewarded and every other type of agent is punished.

  • However, some problems where this condition doesn’t hold also seem fair. For example, suppose there are long programs and short programs (in terms of running time). Further suppose that programs can output either A or B. The reward is then determined purely by these two factors. Now suppose that there exists a program that can calculate the utilities in each of these four worlds and then, based upon this, either terminate immediately or run for a long time, and then output its choice of A or B. Assume that if it terminates immediately after the decision it qualifies as a short program, while if it terminates after a long time it is a long program. Then, as a first approximation, we can say that this is also a fair problem, since it is possible to win in all cases.

  • It’s actually slightly more complex than this. An AI usually doesn’t have to win on only one problem. Adding code to handle more situations will extend the running time and may prevent such an AI from always being able to choose the dominant option. An AI might also want to do things like double-check calculations or consider whether it is actually running on consistent outcomes, so winning the problem might put limitations on the AI in other ways.

  • Nonetheless, so long as we can write such a program that picks the best option, we can call the problem “fair” in a limited sense. It’s possible to extend our definition of “fair” further. For example, suppose that it’s impossible to analyse all the options and still return in a short amount of time. This isn’t a problem if the maximum utility is in a long option.

  • With regard to running time, we can also end up with a non-binary notion of “fair”, according to how much extra processing a program can squeeze in before having to decide on the short option. This limits the ability of the AI to check and recheck its work and to examine its philosophical assumptions before having to make the decision.
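
Two of the calculations above can be checked with a short sketch. The first part collapses the three decisions of the modified Transparent Newcomb’s problem into a single policy, using the simplification that one-boxers see and receive the million while two-boxers see an empty box and receive $1000; the second evaluates the mixed strategies from the 5 and 10 discussion on a small grid of values of p. The grid and all names are illustrative assumptions.

```python
from itertools import product

BIG, SMALL = 1_000_000, 1_000
AFTER_ONE_BOX = (1_000, 2_000)   # offer made to agents who one-box
AFTER_TWO_BOX = (0, 10_000_000)  # offer made to agents who two-box


def total(policy):
    """policy = (box_choice, pick_if_one_boxed, pick_if_two_boxed).

    Assumes the simplification from the text: one-boxers see and receive the
    million, two-boxers see an empty box and receive SMALL only. Only the
    follow-up branch that is actually reached contributes to the reward.
    """
    box, pick_if_one_boxed, pick_if_two_boxed = policy
    if box == "one-box":
        return BIG + pick_if_one_boxed
    return SMALL + pick_if_two_boxed


# A policy covers all three decisions, so each candidate agent gets exactly one reward.
policies = list(product(("one-box", "two-box"), AFTER_ONE_BOX, AFTER_TWO_BOX))
best_policy = max(policies, key=total)
print(best_policy, total(best_policy))  # ('two-box', 1000, 10000000) 10001000


def expected_value(p):
    """5 and 10: take 5 with probability p and 10 with probability 1 - p."""
    return 5 * p + 10 * (1 - p)


grid = [i / 4 for i in range(5)]  # p = 0.0, 0.25, 0.5, 0.75, 1.0
best_p = max(grid, key=expected_value)
print(best_p, expected_value(best_p))  # 0.0 10.0
```

With these particular numbers the best policy two-boxes and then takes the $10 million, and several policies tie because the choice in the unreached branch never affects the reward. In the 5 and 10 case, the expected value 5p + 10(1-p) is maximised at p = 0 without any epsilon exploration.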

Final Thoughts

  • Logical counterfactuals are often framed in such a way that it seems that we should be building a model of subjunctive dependence directly from the atoms in the universe. Instead, we produce these from a causal model that identifies the current agent, together with a model of forgetting. This makes our job much easier, as it allows us to disentangle this problem from many of the fundamental issues in ontology and philosophy of science.

  • Comparing the method in this post to playing chicken with the universe: using raw counterfactuals clarifies that such proof-based methods are simply tricks that allow us to act as though we’ve forgotten without actually forgetting.

  • In general, I would suggest that logical counterfactuals are about knowing which information to erase such that you can produce perfectly consistent counterfactuals. Further, I would suggest that if you can’t find information to erase that produces perfectly consistent counterfactuals, then you don’t have a decision theory problem. Future work could explore exactly when this is possible and general techniques for making this work, as the current exploration has been mainly informal.


  • I changed my terminology from Point Counterfactuals to Decision Counterfactuals and from Timeless Counterfactuals to Raw Counterfactuals in this post, as this better highlights where these models come from.

This post was written with the support of the EA Hotel.