# Deconfusing Logical Counterfactuals

This post aims to provide a strong philosophical foundation for logical counterfactuals, while sketching out an informal scheme that will hopefully be formalised further in the future. I believe that providing such a philosophical foundation is important for the same reasons that Sam listed in Motivating a Semantics of Logical Counterfactuals.

## Introductory Material

• MIRI’s Functional Decision Theory (FDT) paper defines subjunctive dependence to refer to the situation where two decision processes involve the same calculation. This can result from causation, such as if I calculate an equation, then tell you the answer, and we both implement a decision based on this result. It can also occur non-causally, such as a prediction being linked to a decision as per Newcomb’s problem. The problem of logical counterfactuals can be characterised as figuring out which processes subjunctively depend on other processes, so that we can apply FDT.

• In the Co-operation Game, I argued that logical counterfactuals are more about your knowledge of the state of the world than the world itself. Suppose there are two people who can choose A or B. Suppose that a predictor knows that both people will choose A conditional on being told one of the following two facts: a) the other person will choose A; b) the other person will choose the same as you. Then whether your decision is modelled as subjunctively depending on the other person depends on which of the two facts you are told. Going further than the original post, one person might be told a) and the other b), so that the first sees themselves as not subjunctively linked to the second, while the second sees themselves as subjunctively linked to the first.
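To make this concrete, here is a minimal sketch of the point (the function and the fact strings are my own illustration, not from the original post): the underlying world is the same either way, but which fact you are told determines whether your model treats the other player as subjunctively linked to you.

```python
# In the Co-operation Game, both players in fact choose A. What varies
# is the fact the predictor tells you, and hence whether your model of
# the other player's choice depends on your own choice.

def modelled_other_choice(my_choice: str, fact: str) -> str:
    """What you model the other player as choosing, given the fact told."""
    if fact == "other chooses A":
        return "A"           # fixed independently of your own choice
    elif fact == "other chooses same as you":
        return my_choice     # subjunctively linked to your choice
    raise ValueError("unknown fact")

# Told fact a), varying your choice leaves the model of the other fixed;
# told fact b), varying your choice varies the model of the other.
```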

• “Your knowledge of the state of the world” can be explicated as being about internally consistent counterfactuals, which we’ll label Raw Counterfactuals. When there are multiple raw counterfactuals consistent with your state of knowledge, you can pick the one with the highest utility.

• However, there will also be cases where only a single counterfactual is consistent with your state of knowledge, which results in a rather trivial problem. Consider for example Transparent Newcomb’s Problem, where a perfect predictor places the million in a transparent box if and only if it predicts that you will one-box if it does. If you see the million, you know that you must have one-boxed, so it doesn’t strictly make sense to ask what you should do in this situation. Instead, we’ll have to ask something slightly different. So, I’ve slightly modified my position since writing the Co-operation Game: in some situations logical counterfactuals will be defined relative to an imagined, rather than actual, epistemic state. We will construct these states by erasing some information, as described later in the post.

• Other degenerate cases include when you already know what decision you’ll make or when you have the ability to figure it out; for example, when you have perfect knowledge of the environment and the agent, unless you run into issues with unprovability. Note that degeneracy is more common than you might think, since knowing, for example, that an agent is a utility maximiser tells you its exact behaviour in any situation without tied options. Again, in these cases, the answer to the question, “What should the agent do?” is, “The only action consistent with the problem statement”. However, as we’ll see, it is sometimes possible to make these questions less trivial if you’re willing to accept a slight tweak to the problem statement.

• We’ll consider two kinds of problems, acknowledging that these aren’t the only types. In external problems, we imagine decisions from the perspective of a theoretical, unbounded, non-embedded observer who exists outside of the problem statement. Clearly we can’t fully adopt the perspective of such an external agent, but describing the high-level details will usually suffice. Critically, in the external perspective, the observer can have goals, such as choosing the agent with the maximum utility, without those being the goals of the agent within the problem.

• In (fully) reflective problems, we imagine decisions from the perspective of an agent considering its own decisions or potential decisions with full knowledge of its own source code. These problems will complicate the counterfactuals since the agent’s goals limit the kind of agent that it could be. For example, an agent that wants to maximise utility should only search over the possibility space where it is a utility maximiser.

• Making this distinction more explicit: an external problem would ask “What decision maximises utility?”, as opposed to a reflective problem, which asks “What decision maximises utility for a utility maximiser?”. This distinction will mainly be important here in terms of whether it makes a problem trivial or not.

• The external/reflective distinction is very similar to the difference between embedded and non-embedded problems, but external problems can include embedded agents, just viewed from the perspective of a non-embedded agent. So we can do a surprising amount of our theorising from within the external perspective.

## Raw Counterfactuals

• Raw counterfactuals are produced as follows: starting with the territory, we use some process to produce a causal model. We can then imagine constructing different models by switching out or altering parts of the model. These represent a coherent concept separate from any discussion of decisions. In so far as we care about what could have been, we need to ultimately relate our claims to raw counterfactuals, since inconsistent models could not have been.

• Causal Decision Theory uses its own notion of counterfactuals, which we’ll term Decision Counterfactuals. These are created by performing world surgery on models of causal processes. Unlike raw counterfactuals, decision counterfactuals are inconsistent. Suppose that you actually defect in the Prisoner’s Dilemma, but we are considering the counterfactual where you cooperate. Up until the point of the decision you are the kind of person who defects, but when we arrive at the decision, you magically cooperate.

• Decision counterfactuals are useful because they approximate raw counterfactuals. Performing world surgery all the way back in time would require a lot of work. In theoretical decision problems it is usually easy to imagine a raw counterfactual that would match the problem description and provide the same answer as the decision counterfactual. In practical decision problems, we don’t have the data to do this.

• Unfortunately, this approximation breaks down when performing world surgery to make a counterfactual decision consistent with the past requires us to change an element of the environment that is important for the specific problem. For example, in Newcomb’s problem, changing your current decision requires changing your past self, which involves changing a predictor that is considered part of the environment. In this case, it makes sense to fall back to raw counterfactuals and build a new decision theory on top.

• Functional decision theory, as normally characterised, is closer to this ideal, as world surgery isn’t just performed on your decision, but also on all decisions that subjunctively depend on you. This removes a bunch of inconsistencies; however, we’ve still introduced an inconsistency by assuming that f(x)=b when f(x) really equals a. The raw counterfactual approach provides a stronger foundation because it avoids this issue. However, since proof-based FDT is very effective at handling reflective problems, it would be worthwhile rebuilding it upon this new foundation.

• Let’s consider Newcomb’s problem from the external perspective. The external observer, rather than the agent within the problem, is the one trying to maximise utility, so there is no restriction on whether the agent can one-box or two-box whilst being consistent with the problem statement. We can then immediately observe that if we use raw counterfactuals, the reward only depends on the agent’s ultimate decision, and agents that one-box score better than those that don’t. Simple cases like this, which allow multiple consistent counterfactuals, don’t require erasure.
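As a sanity check, the external-perspective argument can be run as a tiny enumeration. This is a sketch with the standard Newcomb payoffs and a perfect predictor; the function names are illustrative, not from the post:

```python
# External perspective on Newcomb's problem: enumerate the consistent
# counterfactuals (agents identified by their ultimate decision) and
# compare utilities. A perfect predictor fills the opaque box iff the
# agent one-boxes; the transparent box always holds $1,000.

def newcomb_payoff(decision: str) -> int:
    opaque = 1_000_000 if decision == "one-box" else 0
    transparent = 1_000
    return opaque if decision == "one-box" else opaque + transparent

payoffs = {d: newcomb_payoff(d) for d in ("one-box", "two-box")}
best = max(payoffs, key=payoffs.get)  # the external observer endorses this
```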

• On the other hand, there are problems which only allow a single raw counterfactual and hence require us to tweak the problem to make it well-defined. Consider, for example, Transparent Newcomb’s, where if you see money in the box, you know that you will receive exactly \$1 million. Some people say this fails to account for the agent in the simulator, but it’s entirely possible that Omega may be able to figure out what action you will take based on high-level reasoning, as opposed to having to run a complete simulation of you. We’ll describe later a way of tweaking the problem statement into something that is both consistent and non-trivial.

## General Approach to Logical Counterfactuals

• We will now attempt to produce a more solid theory of external problems using FDT. This will allow us to interpret decision problems where only one decision is consistent with the problem statement in a non-trivial way.

• FDT frames logical counterfactuals as “What would the world be like if f(x)=b instead of a?”, which doesn’t strictly make sense, as noted in the discussion on raw counterfactuals. Two points: a) I think it should be clear that this question only makes sense in terms of thinking of perturbations of the map and not as a direct claim about the territory (see map and territory). b) We’ll address this problem by proposing a different approach for foundations, which these proof-based approaches should ultimately be justified in terms of.

• There are two possible paths towards a more consistent theory of logical counterfactuals for these situations. In both cases we interpret the question of what it would mean to change the output of a function as an informal description of a similar question that is actually well-defined. The first approach is to see what consequences can be logically deduced from f(x)=b while implementing a strategy to prevent us from deducing incorrect statements from the inconsistencies. This is often done by playing chicken with the universe. We will term this a paraconsistent approach; even though it doesn’t explicitly make use of paraconsistent logic, it is paraconsistent in spirit.

• An alternative approach would be to interpret this sentence as making claims about raw counterfactuals. In FDT terms, the raw counterfactual approach finds an f’ such that f’(x)=b, with certain as-yet-unstated similarities to f, and substitutes this into all subjunctively linked processes. The paraconsistent approach is easier to do informally, but I suspect that the raw counterfactual approach would be more amenable to formalisation and provides more philosophical insight into what is actually going on. In so far as the paraconsistent approach may be more convenient from an implementation perspective, we can justify it by tying it to raw counterfactuals.

## Decisions

• Now that we’ve outlined the broad approach, we should dig more into the question of what exactly it means to make a decision. As I explained in a previous post, there’s a sense in which you don’t so much ‘make’ a decision as implement one. If you make something, it implies that it didn’t exist before and now it does. In the case of decisions, it nudges you towards believing that the decision you were going to implement wasn’t set, then you made a decision, and then it was. However, when “you” and the environment are defined down to the atom, you can only implement one decision. It was always the case from the start of time that you were going to implement that decision.

• We note that if you have perfect information about the agent and the environment, you need to forget, or at least pretend to forget, some information about the agent so that we can produce counterfactual versions of the agent who decide slightly differently. See Shminux’s post Logical Counterfactuals are Low Res for a similar argument, framed slightly differently. The key difference is that I’m not suggesting just adding noise to the model, but forgetting specific information that doesn’t affect the outcome.

• In Transparent Newcomb’s, it would be natural to erase the knowledge that the box is full. This would then result in two counterfactuals: a) one where the agent sees an empty box and two-boxes; b) one where the agent sees a full box and one-boxes. It would also be natural to relax the scope of who we care about from the agent who sees a million in the box to the agent at the end of the problem, regardless of what they see. If we do so, we then have a proper decision problem and we can see that one-boxing is better.

• Actually, there’s a slight hitch here. In order to define the outcome an agent receives, we need to define what the predictor will predict when an agent sees the box containing the million. But it is impossible to place a two-boxer in this situation. We can resolve this by defining the predictor as simulating the agent responding to an input representing an inconsistent situation, as I’ve described in Counterfactuals for Perfect Predictors.

• In order to imagine a consistent world, when we imagine a different “you”, we must also imagine the environment interacting with that different “you” so that, for example, the predictor makes a different prediction. Causal decision theorists construct these counterfactuals incorrectly and hence believe that they can change their decision without changing the prediction. They fail to realise that they can’t actually “change” their decision, as there is a single decision that they will inevitably implement. I suggest replacing “change a decision” with “shift counterfactuals” when it is important to be able to think clearly about these topics. It also clarifies why the prediction can change without backwards causation (my previous post on Newcomb’s problem contains further material on why this isn’t an issue).

## Erasure

• Here’s how the erasure may proceed for the example of Transparent Newcomb’s Problem. Suppose we erase all information about what decision the agent is going to make. This also requires erasing the fact that you see a million in the transparent box. Then we look at all counterfactually possible agents and notice that the reward depends only on whether you are an agent who ultimately one-boxes or an agent who two-boxes. Those who one-box see the million and then receive it; those who two-box see no money in the transparent box and receive \$1000 only. The counterfactual involving one-boxing performs better than the one involving two-boxing, so we endorse one-boxing.
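The erasure step above can be sketched as a small enumeration (a toy model assuming a perfect predictor; the function names are mine):

```python
# Transparent Newcomb's after erasure: we no longer condition on seeing
# the full box, so both counterfactual agents are on the table. The
# ultimate decision determines both what the agent sees and the reward.

def transparent_newcomb(decision: str):
    """Return (what this agent sees, what it receives)."""
    if decision == "one-box":
        return ("sees full box", 1_000_000)
    return ("sees empty box", 1_000)

counterfactuals = {d: transparent_newcomb(d)
                   for d in ("one-box", "two-box")}
endorsed = max(counterfactuals, key=lambda d: counterfactuals[d][1])
```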

• Things become more complicated if we want to erase less information about the agent. For example, we might want an agent to know that it is a utility maximiser, as this might be relevant to evaluating the outcome the agent will receive from future decisions. Suppose that if you one-box in Transparent Newcomb’s you’ll then be offered a choice of \$1000 or \$2000, but if you two-box you’ll be offered \$0 or \$10 million. We can’t naively erase all information about your decision process in order to compute a counterfactual of whether you should one-box or two-box; otherwise, we end up with a situation where, for example, there are agents who one-box and get different rewards. Here the easiest solution is to “collapse” all of the decisions and instead ask about a policy that covers all three decisions that may be faced. Then we can calculate the expectations without producing any inconsistencies in the counterfactuals, as it then becomes safe to erase the knowledge that the agent is a utility maximiser.
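A rough sketch of this “collapse to a policy” move, using the payoffs from the example above (the encoding of policies as pairs is my own illustration):

```python
# Evaluate whole policies rather than the box decision in isolation: a
# policy fixes the box choice and which follow-up offer is taken.
# Per the example: one-boxers are later offered $1,000 or $2,000;
# two-boxers (who see an empty box and get $1,000) are offered $0 or $10M.

def policy_payoff(box_choice: str, take_larger_offer: bool) -> int:
    if box_choice == "one-box":
        base, offers = 1_000_000, (1_000, 2_000)
    else:
        base, offers = 1_000, (0, 10_000_000)
    return base + offers[take_larger_offer]  # bool indexes the offer pair

policies = {(b, t): policy_payoff(b, t)
            for b in ("one-box", "two-box") for t in (False, True)}
best_policy = max(policies, key=policies.get)
```

Which policy the enumeration favours follows purely from the stated payoffs; the point is that every policy gets a single, consistent reward.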

• The 5 and 10 problem doesn’t occur with the forgetting approach. If you keep the belief that you are a utility maximiser, then the only choice you can implement is 10, so we don’t have a decision problem. Instead, we can define the possible strategies as follows: choose 5 with probability p and 10 with probability 1-p. Then forget everything about yourself except that you are one of these strategies. There’s no downside, as there is no need for an agent to know whether or not it is a utility maximiser. So we can solve the 5 and 10 problem without epsilon exploration.
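Under this framing, the surviving self-knowledge is just membership in the strategy family, which makes the comparison trivial to compute (a sketch; the grid of strategies is illustrative):

```python
# 5-and-10 with forgetting: you remember only that you are some mixed
# strategy (take $5 with probability p, $10 with probability 1-p), then
# compare expected utilities across the whole family.

def expected_utility(p: float) -> float:
    return p * 5 + (1 - p) * 10

grid = [i / 100 for i in range(101)]       # p = 0.00, 0.01, ..., 1.00
best_p = max(grid, key=expected_utility)   # p = 0: always take the 10
```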

• When will we be able to use this forgetting technique? One initial thought might be the same scope that FDT is designed to be optimal on: problems where the reward only depends on your outputs or predicted outputs. Because only the output or predicted output matters, and not the algorithm, these can be considered fair, unlike a problem where an alphabetical decision theorist (one that picks the first decision when ordered alphabetically) is rewarded and every other type of agent is punished.

• However, some problems where this condition doesn’t hold also seem fair. Suppose there are long programs and short programs (in terms of running time), and suppose programs can output either A or B. The reward is then determined purely based on these two factors. Now suppose that there exists a program that can calculate the utilities in each of these four worlds and then, based upon this, either terminate immediately or run for a long time, and then output its choice of A or B. Assume that if it terminates immediately after the decision it’ll qualify as a short program, while if it terminates after a long time it is a long program. Then, as a first approximation, we can say that this is also a fair problem, since it is possible to win in all cases.
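The four-world setup can be made concrete with an arbitrary utility table (the table values below are my own illustration, not from the post):

```python
# Rewards depend only on (running-time class, output). A program that
# first computes all four utilities can then decide both its output and
# whether to halt early (short) or stall (long), so it can steer itself
# into whichever of the four cells is best.

utilities = {("short", "A"): 3, ("short", "B"): 7,
             ("long", "A"): 5, ("long", "B"): 1}

def flexible_program(table):
    """Return the (length class, output) cell the program steers into."""
    return max(table, key=table.get)
```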

• It’s actually slightly more complex than this. An AI usually doesn’t have to win on only one problem. Adding code to handle more situations will extend the running time and may prevent such an AI from always being able to choose the dominant option. An AI might also want to do things like double-check calculations or consider whether it is actually running on consistent outcomes, so winning the problem might put limitations on the AI in other ways.

• But nonetheless, so long as we can write such a program that picks the best option, we can call the problem “fair” in a limited sense. It’s possible to extend our definition of “fair” further. For example, suppose that it’s impossible to analyse all the options and still return in a short amount of time. This isn’t a problem if the maximum utility is in a long option.

• In regards to running time, we can also end up with a non-binary notion of “fair”, according to how much extra processing a program can squeeze in before having to decide on the short option. This limits the ability of the AI to check and recheck its work and examine its philosophical assumptions before having to make the decision.

## Final Thoughts

• Logical counterfactuals are often framed in such a way that it seems we should be building a model of subjunctive dependence directly from the atoms in the universe. Instead, we produce these from a causal model that identifies the current agent, plus a model of forgetting. This makes our job much easier, as it allows us to disentangle this problem from many of the fundamental issues in ontology and philosophy of science.

• Comparing the method in this post to playing chicken with the universe: using raw counterfactuals clarifies that such proof-based methods are simply tricks that allow us to act as though we’ve forgotten without actually forgetting.

• In general, I would suggest that logical counterfactuals are about knowing which information to erase such that you can produce perfectly consistent counterfactuals. Further, I would suggest that if you can’t find information to erase that produces perfectly consistent counterfactuals, then you don’t have a decision theory problem. Future work could explore exactly when this is possible and general techniques for making this work, as the current exploration has been mainly informal.

Notes:

• I changed my terminology from Point Counterfactuals to Decision Counterfactuals and from Timeless Counterfactuals to Raw Counterfactuals in this post, as this better highlights where these models come from.

This post was written with the support of the EA Hotel.

## Comments

> Other degenerate cases include when you already know what decision you’ll make or when you have the ability to figure it out. For example, when you have perfect knowledge of the environment and the agent, unless you run into issues with unprovability. Note that degeneracy is more common than you might think since knowing, for example, that it is a utility maximiser, tells you its exact behaviour in situations without options that are tied. Again, in these cases, the answer to the question, “What should the agent do?” is, “The only action consistent with the problem statement”.

Why should this be the case? What do you think of the motto “decisions are for making bad outcomes inconsistent”?

• I kind of agree with it, but in a way that makes it trivially true. Once you have erased information to provide multiple possible raw counterfactuals, you have the choice to frame the decision problem as either choosing the best outcome or avoiding sub-optimal outcomes. But of course, this doesn’t really make a difference.

It seems rather strange to talk about making an outcome inconsistent which was already inconsistent. Why is this considered an option that was available for you to choose, instead of one that was never available to choose? Consider a situation where the world and agent have both been precisely defined. Determinism means there is only one possible option, but decision problems have multiple possible options. It is not clear which decisions that are inconsistent with what actually happened count as “could have been chosen” and which count as “were never possible”.

Actually, this relates to my post on Counterfactuals for Perfect Predictors. Talking about making your current situation inconsistent doesn’t make sense literally, only analogically. After all, if you’re in a situation it has to be consistent. The way that I get round this in my post is by replacing talk of decisions given a situation with talk of decisions given an input representing a situation. While you can’t make your current situation inconsistent, it is sometimes possible for a program to be written such that it cannot be put in the situation represented by an input, as its output would be inconsistent with that. And that lets us define what we wanted to define, without having to fudge philosophically.

> I kind of agree with it, but in a way that makes it trivially true. Once you have erased information to provide multiple possible raw counterfactuals, you have the choice to frame the decision problem as either choosing the best outcome or avoiding sub-optimal outcomes. But of course, this doesn’t really make a difference.

I think our disagreement is around the status of decision problems before you’ve erased information, not after. In your post, you say that before erasing information, a problem where what you do is determined is trivial, in that you only have the one option. That’s the position I’m disagreeing with. To the extent that erasing information is a useful idea, it is useful precisely for dealing with such problems; otherwise you would not need to erase the information. The way you’re describing it, it sounds like erasing information isn’t something agents themselves are ever supposed to have to do. Instead, it is a useful tool for a decision theorist, to transform trivial/meaningless decision problems into nontrivial/meaningful ones. This seems wrong to me.

> It seems rather strange to talk about making an outcome inconsistent which was already inconsistent. Why is this considered an option that was available for you to choose, instead of one that was never available to choose? Consider a situation where the world and agent have both been precisely defined. Determinism means there is only one possible option, but decision problems have multiple possible options. It is not clear which decisions that are inconsistent with what actually happened count as “could have been chosen” and which count as “were never possible”.

I’m somewhat confused about what you’re saying in this paragraph and what assumptions you might be making. I think it might help to focus on examples. Two examples which I think motivate the idea:

• Smoking lesion. It can often be quite a stretch to put an agent into a smoking lesion problem, because the problem assumes certain population statistics which may be impossible to achieve if the population is assumed to make decisions in a particular way. My impression is that some philosophers hold a decision theory like CDT or EDT responsible for what advice it offers in a particular situation, even if it would be impossible to put agents in that situation if they were the sort of agents who followed the advice of the decision theory in question. In other words, even if it is impossible to put EDT agents in a situation where they are representative of a population as described in the smoking lesion problem, EDT is held responsible for offering bad advice to agents in such a situation. I take the motto “decisions are for making bad outcomes inconsistent” as speaking against this view, instead giving EDT credit for making it impossible for an agent to end up in such a situation.

• (In my post on smoking lesion, I came up with a way to get EDT agents into a smoking-lesion situation; however, it required certain assumptions about their internal architecture. We could take the argument as speaking against such an architecture, rather than EDT. This interpretation seems quite natural to me, because the setup required to get EDT into a smoking lesion situation is fairly unnatural, and one could simply refuse to build agents with such an unnatural architecture.)

• Transparent Newcomb. In the usual setup, the agent is described as already facing a large sum of money. We are also told that this situation is only possible if the agent one-boxes: a two-boxing agent won’t get this opportunity (or will get it with much smaller probability). Academic decision theorists tend to, again, judge the decision theory on the quality of advice offered under the assumption that the agent ends up in the situation, disregarding the effect of the decision on whether the agent could be in the situation in the first place. On this view, decision theories such as UDT which one-box are giving bad advice, because if you are already in the situation, you can get more money by two-boxing. In this case, the motto “decisions are for making bad outcomes inconsistent” is supposed to indicate that agents should one-box, so that they can end up in the better situation. A two-boxing decision theory like CDT is judged poorly for making it impossible to get a very good payout.

Importantly, transparent Newcomb (with a perfect predictor) is a case where the agent has enough information to know its own action: it must one-box, since it could not be in this situation if it two-boxed. Yet we can talk about decision theories such as CDT which two-box in such cases. So it is not meaningless to talk about what happens if you take an action which is inconsistent with what you know! What you do in such situations has consequences.

I don’t know that you disagree with any of this, since in your original essay you say:

> For example, when you have perfect knowledge of the environment and the agent, unless you run into issues with unprovability. Note that degeneracy is more common than you might think since knowing, for example, that it is a utility maximiser, tells you its exact behaviour in situations without options that are tied.

However, you go on to say:

> Again, in these cases, the answer to the question, “What should the agent do?” is, “The only action consistent with the problem statement”.

which is what I was disagreeing with. We can set up a sort of reverse transparent Newcomb, where you should take the action which makes the situation impossible: Omega cooks you a dinner selected out of those which it predicts you will eat. Knowing this, you should refuse to eat meals which you don’t like, even though when presented with such a meal you know you must eat it (since Omega only presents you with a meal you will eat).

(Aside: the problem isn’t fully specified until we also say what Omega does if there is nothing you will eat. We could say that Omega serves you nothing in that case.)
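The dinner example (including the aside’s stipulation that Omega serves nothing if you would eat nothing) can be sketched directly; the menu and agent names are illustrative:

```python
# Reverse transparent Newcomb: Omega serves a meal it predicts you will
# eat, or nothing if no such meal exists. An agent that refuses meals it
# dislikes makes the bad outcome (being served them) inconsistent.

MENU = ("pudding", "soup", "salad")

def omega_serves(will_eat):
    for meal in MENU:
        if will_eat(meal):
            return meal
    return None  # per the aside: if you'd eat nothing, nothing is served

eats_anything = lambda meal: True            # may be served a disliked meal
refuses_pudding = lambda meal: meal != "pudding"  # never sees pudding
```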

> Talking about making your current situation inconsistent doesn’t make sense literally, only analogically. After all, if you’re in a situation it has to be consistent. The way that I get round this in my post is by replacing talk of decisions given a situation with talk of decisions given an input representing a situation. While you can’t make your current situation inconsistent, it is sometimes possible for a program to be written such that it cannot be put in the situation represented by an input, as its output would be inconsistent with that. And that lets us define what we wanted to define, without the nudge to fudge philosophically.

This seems basically consistent with what I’m saying (indeed, almost the same as what I’m saying), except I take strong objection to some of your language. I don’t think you “analogically” make situations inconsistent; I think you actually do. Replacing “situation” with “input representing a situation” seems sort of in the right direction, but the notion of “input” is problematic, because it can be your own internal reasoning which predicts your action accurately.

Of the chicken rule, for example, it is literally (not analogically) correct to say that the algorithm takes a different action if it ever proves that it takes a certain action. It is also true that it never ends up in this situation. We could also say that you never take an action if you have an internal state representing certainty that you take that action. However, it is furthermore true that you never get into such an internal state.
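The chicken rule described here can be caricatured in code. This is a sketch under strong simplifying assumptions: real proof-based agents search for proofs in a formal theory, whereas here the prover is just a hypothetical predicate passed in.

```python
# Sketch of the chicken rule (illustrative; `prove` stubs out proof search).

def make_chicken_agent(prove):
    """Build an agent that takes a *different* action whenever it
    proves that it takes a certain action."""
    def agent(actions):
        for a in actions:
            if prove(f"agent takes {a}"):
                # Diagonalise: falsify any proof of our own action.
                return next(x for x in actions if x != a)
        return actions[0]  # no self-prediction proven; default choice
    return agent

# A sound prover can never prove "agent takes a" about this agent,
# since the agent would immediately falsify it; so it proves nothing.
agent = make_chicken_agent(lambda claim: False)
print(agent(["left", "right"]))  # -> left (default; no proof found)

# An unsound prover that claims "agent takes left" gets falsified:
stubborn = make_chicken_agent(lambda claim: claim == "agent takes left")
print(stubborn(["left", "right"]))  # -> right
```

So the algorithm literally takes a different action if it ever proves it takes a certain action, and (with a sound prover) it literally never ends up in that situation.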

Similarly, in the ex­am­ple where Omega cooks you some­thing which you will eat, I would think it liter­ally cor­rect to say that you would not eat pud­ding (sup­pos­ing that’s a prop­erty of your de­ci­sion al­gorithm).

• (This com­ment was writ­ten be­fore read­ing EDT=CDT. I think some of my views might up­date based on that when I have more time to think about it)

In your post, you say that be­fore eras­ing in­for­ma­tion, a prob­lem where what you do is de­ter­mined is triv­ial, in that you only have the one op­tion. That’s the po­si­tion I’m dis­agree­ing with.

It will be convenient for me to make a slightly different claim than the one I made above. Instead of claiming that the problem is trivial in completely determined situations, I will claim that it is trivial given the most straightforward interpretation of a problem* (the possible actions for an agent are exactly those which are consistent with the problem statement, and the chosen action is selected from this set of possible actions). In so far as both of us want to talk about decision problems where multiple possible options are considered, we need to provide a different interpretation of what decision problems are. Your approach is to allow the selection of inconsistent actions, while I suggest erasing information to provide a consistent situation.
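The "most straightforward interpretation" can be stated concretely. The sketch below is my own illustration, with the problem statement modelled as a hypothetical consistency predicate over actions.

```python
# Under the straightforward interpretation, the possible actions are
# exactly those consistent with the problem statement.

def possible_actions(all_actions, consistent):
    return [a for a in all_actions if consistent(a)]

actions = ["take $5", "take $10"]

# Underdetermined problem: both actions are consistent, so there is a
# genuine choice to be made.
print(possible_actions(actions, lambda a: True))
# -> ['take $5', 'take $10']

# Fully determined problem: the statement already entails the agent's
# action, so only one action is consistent and the "decision" is
# trivial in exactly the sense claimed above.
print(possible_actions(actions, lambda a: a == "take $5"))
# -> ['take $5']
```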

My response is to argue, as per my previous comment, that there don't seem to be any criteria for determining which inconsistent actions are considered and which ones aren't. I suppose you could respond that I haven't provided criteria for determining what information should be erased, but my approach has the benefit that if you do provide such criteria, logical counterfactuals are solved for free, while it's much less clear how to approach this problem under the allowing-inconsistency approach (although there has been some progress with things like playing chicken with the universe).

*ex­clud­ing un­prov­abil­ity issues

The way you’re de­scribing it, it sounds like eras­ing in­for­ma­tion isn’t some­thing agents them­selves are sup­posed to ever have to do

You're at the stage of trying to figure out how agents should make decisions. I'm at the stage of trying to understand what making a good decision even means. Once there is a clearer understanding of what a decision is, we can then write an algorithm to make good decisions, or we may discover that the concept dissolves, in which case we will have to specify the problem more precisely. Right now, I'd be perfectly happy just to have clear criteria by which an external evaluator could say whether an agent made a good decision or not, as that would constitute substantial progress.

I’m some­what con­fused about what you’re say­ing in this para­graph and what as­sump­tions you might be making

My point was that there aren't any criteria for determining which inconsistent actions are considered and which ones aren't if you are just thrown a complete description of a universe and an agent. Transparent Newcomb already comes with the options and counterfactuals attached. My interest is in how to construct them from scratch.

My im­pres­sion is that some philoso­phers hold a de­ci­sion the­ory like CDT and EDT re­spon­si­ble for what ad­vice it offers in a par­tic­u­lar situ­a­tion, even if it would be im­pos­si­ble to put agents in that situ­a­tion

I think it is im­por­tant to use very pre­cise lan­guage here. The agent isn’t be­ing rated on what it would do in such a situ­a­tion, it is be­ing rated on whether or not it can be put into that situ­a­tion at all.

I suspect that sometimes when an agent can't be put into a situation it is because the problem has been badly formulated (or falls outside the scope of problems where its decision theory is defined), while in other cases this is a reason for or against utilising a specific decision theory algorithm. Holding an agent responsible for all situations it can't be in seems like the wrong move; it feels like there is some more fundamental confusion that needs to be cleared up.

I take the motto “de­ci­sions are for mak­ing bad out­comes in­con­sis­tent”

I’m not a fan of rea­son­ing via motto when dis­cussing these kinds of philo­soph­i­cal prob­lems which turn on very pre­cise rea­son­ing.

So it is not mean­ingless to talk about what hap­pens if you take an ac­tion which is in­con­sis­tent with what you know!… I don’t know that you dis­agree with any of this… We can set up a sort of re­verse trans­par­ent New­comb, where you should take the ac­tion which makes the situ­a­tion impossible

There’s some­thing of a ten­sion be­tween what I’ve said in this post about only be­ing able to take de­ci­sions that are con­sis­tent and what I said in Coun­ter­fac­tu­als for Perfect Pre­dic­tors, where I noted a way of do­ing some­thing analo­gous to act­ing to make your situ­a­tion in­con­sis­tent. This can be cleared up by not­ing that eras­ing in­for­ma­tion in many de­ci­sion the­ory prob­lems pro­vides a prob­lem state­ment where in­put-out­put maps can define all the rele­vant in­for­ma­tion about an agent. So I’m propos­ing that this tech­nique be used in com­bi­na­tion with era­sure, rather than sep­a­rately.

• In so far as both of us want to talk about de­ci­sion prob­lems where mul­ti­ple pos­si­ble op­tions are con­sid­ered, we need to provide a differ­ent in­ter­pre­ta­tion of what de­ci­sion prob­lems are. Your ap­proach is to al­low the se­lec­tion of in­con­sis­tent ac­tions, while I sug­gest eras­ing in­for­ma­tion to provide a con­sis­tent situ­a­tion.

I can agree that there’s an in­ter­pre­ta­tional is­sue, but some­thing is bug­ging me here which I’m not sure how to ar­tic­u­late. A claim which I would make and which might be some­how re­lated to what’s bug­ging me is: the in­ter­pre­ta­tion is­sue of a de­ci­sion prob­lem should be mostly gone when we for­mally spec­ify it. (There’s still a big in­ter­pre­ta­tion is­sue re­lat­ing to how the for­mal­iza­tion “re­lates to real cases” or “re­lates to AI de­sign in prac­tice” etc—ie, how it is used—but this seems less re­lated to our dis­agree­ment/​mis­com­mu­ni­ca­tion.)

If the interpretation question is gone once a problem is framed in a formal way, then (speaking loosely here and trying to connect with what's bugging me about your framing) it seems like either the formalism somehow forces us to do the forgetting (which strikes me as odd) or we are left with problems which really do involve impossible actions without any interpretation issue. I favor the latter.

My re­sponse is to ar­gue as per my pre­vi­ous com­ment that there doesn’t seem to be any crite­ria for de­ter­min­ing which in­con­sis­tent ac­tions are con­sid­ered and which ones aren’t.

The decision algorithm considers each output from a given set. For example, with proof-based decision theories such as MUDT, it is potentially convenient to consider the case where the output is true or false (so that the decision procedure can be thought of as a sentence). In that case, the decision procedure considers those two possibilities. There is no “extract the set of possible actions from the decision problem statement” step, so you don't run into the problem of “why not output 2? It's inconsistent with the problem statement, but you're not letting that stop you in other cases”.

It’s a prop­erty of the for­mal­ism, but it doesn’t seem like a par­tic­u­larly con­cern­ing one—if one imag­ines try­ing to carry things over to, say, pro­gram­ming a robot, there’s a clear set of pos­si­ble ac­tions even if you know the code may come to re­li­ably pre­dict its own ac­tions. The prob­lem of known ac­tions seems to be about iden­ti­fy­ing the con­se­quences of ac­tions which you know you wouldn’t take, rather than about iden­ti­fy­ing the ac­tion set.

I sup­pose you could re­spond that I haven’t pro­vided crite­ria for de­ter­min­ing what in­for­ma­tion should be erased, but my ap­proach has the benefit that if you do provide such crite­ria, log­i­cal coun­ter­fac­tu­als are solved for free, while it’s much more un­clear how to ap­proach this prob­lem in the al­low­ing in­con­sis­tency ap­proach (al­though there has been some progress with things like play­ing chicken with the uni­verse).

I feel like I’m over-stat­ing my po­si­tion a bit in the fol­low­ing, but: this doesn’t seem any differ­ent from say­ing that if we provide a log­i­cal coun­ter­fac­tual, we solve de­ci­sion the­ory for free. IE, the no­tion of for­get­ting has so many free pa­ram­e­ters that it doesn’t seem like much of a re­duc­tion of the prob­lem. You say that a for­get­ting crite­rion would solve the prob­lem of log­i­cal coun­ter­fac­tu­als, but ac­tu­ally it is very un­clear how much or how lit­tle it would ac­com­plish.

You're at the stage of trying to figure out how agents should make decisions. I'm at the stage of trying to understand what making a good decision even means. Once there is a clearer understanding of what a decision is, we can then write an algorithm to make good decisions, or we may discover that the concept dissolves, in which case we will have to specify the problem more precisely. Right now, I'd be perfectly happy just to have clear criteria by which an external evaluator could say whether an agent made a good decision or not, as that would constitute substantial progress.

I dis­agree with the ‘stage’ fram­ing (I wouldn’t claim to un­der­stand what mak­ing a good de­ci­sion even means; I’d say that’s a huge part of the con­fu­sion I’m try­ing to stare at—for similar rea­sons, I dis­agree with your foun­da­tions foun­da­tions post in so far as it de­scribes what I’m in­ter­ested in as not be­ing agent foun­da­tions foun­da­tions), but oth­er­wise this makes sense.

This does seem like a big difference in perspective, and I agree that if I take that perspective, it is better to simply reject problems where the action taken by the agent is already determined (or call them trivial, etc.). To me, the fact that the agent itself needs to judge is quite central to the confusion about decisions.

My point was that there isn’t any crite­ria for de­ter­min­ing which in­con­sis­tent ac­tions are con­sid­ered and which ones aren’t if you are just thrown a com­plete de­scrip­tion of a uni­verse and an agent.

As mentioned earlier, this doesn't seem problematic to me. First, if you're handed a description of a universe with an agent already in it, then you don't have to worry about defining what the agent considers: the agent already considers what it considers (just like it already does what it does). You can look at a trace of the executed decision procedure and read off which actions it considers. (Granted, you may not know how to interpret the code, but I think that's not the problem either of us is talking about.)

But there’s an­other differ­ence here in how we’re think­ing about de­ci­sion the­ory, con­nected with the ear­lier-clar­ified differ­ence. Your ver­sion of the 5&10 prob­lem is that a de­ci­sion the­o­rist is handed a com­plete speci­fi­ca­tion of the uni­verse, in­clud­ing the agent. The agent takes some ac­tion, since it is fully defined, and the prob­lem is that the de­ci­sion the­o­rist doesn’t know how to judge the agent’s de­ci­sion.

(This might not be how you would define the 5&10 prob­lem, but my goal here is to get at how you are think­ing about the no­tion of de­ci­sion prob­lem in gen­eral, not 5&10 in par­tic­u­lar—so bear with me.)

My version of the 5&10 problem is that you give a decision theorist the partially defined universe with the \$5 bill on the table and the \$10 bill on the table, stipulating that whatever source code the decision theorist chooses for the agent, the agent itself should know the source code and be capable of reasoning about it appropriately. (This is somewhat vague but can be given formalizations such as that of the setting of proof-based DT.) In other words, the decision theorist works with a decision problem which is a “world with a hole in it” (a hole waiting for an agent). The challenge lies in the fact that whatever agent is placed into the problem by the decision theorist, the agent is facing a fully-specified universe with no question marks remaining.

So, for the de­ci­sion the­o­rist, the challenge pre­sented by the 5&10 prob­lem is to define an agent which se­lects the 10. (Of course, it had bet­ter se­lect the 10 via gen­er­al­iz­able rea­son­ing, not via spe­cial-case code which fails to do the right thing on other de­ci­sion prob­lems.) For a given agent in­serted into the prob­lem, there might be an is­sue or no is­sue at all.

We can write oth­er­wise plau­si­ble-look­ing agents which take the \$5, and for which it seems like the prob­lem is spu­ri­ous proofs; hence part of the challenge for the de­ci­sion the­o­rist seems to be the avoidance of spu­ri­ous proofs. But, not all agents face this prob­lem when in­serted into the world of 5&10. For ex­am­ple, agents which fol­low the chicken rule don’t have this prob­lem. This means that from the agent’s per­spec­tive, the 5&10 prob­lem does not nec­es­sar­ily look like a prob­lem of how to think about in­con­sis­tent ac­tions.
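The spurious-proof failure mode can be caricatured as follows. This is only a sketch (a real treatment goes through Löb's theorem in a formal theory); here, proofs are modelled as a hypothetical list of already-"proven" conditionals of the form (action, utility).

```python
# Toy 5&10: the agent collects proven conditionals "if I take a, I get u"
# and takes the action with the best proven utility.

def proof_based_agent(proven_conditionals):
    best = {}
    for action, utility in proven_conditionals:
        best.setdefault(action, utility)  # keep first proof per action
    return max(best, key=lambda a: best[a])

# With sound, informative conditionals, the agent takes the $10:
print(proof_based_agent([("take $10", 10), ("take $5", 5)]))  # -> take $10

# Spurious proof: if the theory proves "the agent takes the $5", then
# "take $10 -> utility 0" is vacuously provable, and the argmax locks
# in the $5, making the original "proof" self-fulfilling:
print(proof_based_agent([("take $10", 0), ("take $5", 5)]))   # -> take $5
```

An agent following the chicken rule takes a different action whenever its own action is proven, so the vacuous conditional above can never be soundly derived in the first place, which is the sense in which such agents don't face this problem.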

Trans­par­ent New­comb’s already comes with the op­tions and coun­ter­fac­tu­als at­tached. My in­ter­est is in how to con­struct them from scratch.

In the framing above, where we distinguish between the view of the decision theorist and the view of the agent, I would say that:

• Often, as is (more or less) the case with trans­par­ent new­comb, a de­ci­sion prob­lem as-pre­sented-to-the-de­ci­sion-the­o­rist does come with op­tions and coun­ter­fac­tu­als at­tached. Then, the in­ter­est­ing prob­lem is usu­ally to de­sign an agent which (work­ing from gen­er­al­iz­able prin­ci­ples) re­cov­ers these cor­rectly from within its em­bed­ded per­spec­tive.

• Some­times, we might write down a de­ci­sion prob­lem as source code, or in some other for­mal­ism. Then, it may not be ob­vi­ous what the coun­ter­fac­tu­als are /​ should be, even from the de­ci­sion the­o­rist’s per­spec­tive. We take some­thing closer to the agent’s per­spec­tive, hav­ing to figure out for our­selves how to rea­son coun­ter­fac­tu­ally about the prob­lem.

• Some­times, a prob­lem is given with a full de­scrip­tion of its coun­ter­fac­tu­als, but the coun­ter­fac­tu­als as stated are clearly wrong: putting on our in­ter­pret-what-the-coun­ter­fac­tu­als-are hats, we come up with an an­swer which differs from the one given in the prob­lem state­ment. This means we need to be skep­ti­cal of the first case I men­tioned, where we think we know what the coun­ter­fac­tu­als are sup­posed to be and we’re just try­ing to get our agents to re­cover them cor­rectly.

Point be­ing, in all three cases I’m think­ing about the prob­lem of how to con­struct the coun­ter­fac­tu­als from scratch—even the first case where I en­dorse the coun­ter­fac­tu­als as given by the prob­lem. This is only pos­si­ble be­cause of the dis­tinc­tion I’m mak­ing be­tween a prob­lem as given to a de­ci­sion the­o­rist and the prob­lem as faced by an agent.

• The in­ter­pre­ta­tion is­sue of a de­ci­sion prob­lem should be mostly gone when we for­mally spec­ify it

In order to formally specify a problem, you will have already explicitly or implicitly expressed an interpretation of what decision theory problems are. But this doesn't make the question “Is this interpretation valid?” disappear. If we take my approach, we will need to provide a philosophical justification for the forgetting; if we take yours, we'll need to provide a philosophical justification that we care about the results of these kinds of paraconsistent situations. Either way, there will be further work beyond the formalisation.

The de­ci­sion al­gorithm con­sid­ers each out­put from a given set… It’s a prop­erty of the for­mal­ism, but it doesn’t seem like a par­tic­u­larly con­cern­ing one

This ties into the point I’ll dis­cuss later about how I think be­ing able to ask an ex­ter­nal ob­server to eval­u­ate whether an ac­tual real agent took the op­ti­mal de­ci­sion is the core prob­lem in ty­ing real world de­ci­sion the­ory prob­lems to the more ab­stract the­o­ret­i­cal de­ci­sion the­ory prob­lems. Fur­ther down you write:

The agent already con­sid­ers what it con­sid­ers (just like it already does what it does)

But I'm trying to find a way of evaluating an agent from the external perspective. Here, it is valid to criticise an agent for not selecting an action that it didn't consider. Further, it isn't always clear which actions are “considered”, as not all agents loop over all actions; they may use shortcuts to avoid explicitly evaluating a certain action.

I feel like I’m over-stat­ing my po­si­tion a bit in the fol­low­ing, but: this doesn’t seem any differ­ent from say­ing that if we provide a log­i­cal coun­ter­fac­tual, we solve de­ci­sion the­ory for free

“Forgetting” has a large number of free parameters, but so does “deontology” or “virtue ethics”. I've provided some examples and key details about how this would proceed, but I don't think you can expect too much more at this very preliminary stage. When I said that a forgetting criterion would solve the problem of logical counterfactuals for free, this was a slight exaggeration. We would still have to justify why we care about raw counterfactuals, but, since they are actually consistent, this seems a much easier task than arguing that we should care about what happens in the kind of inconsistent situations generated by paraconsistent approaches.

I dis­agree with your foun­da­tions foun­da­tions post in so far as it de­scribes what I’m in­ter­ested in as not be­ing agent foun­da­tions foundations

I actually included the Smoking Lesion Steelman (https://www.alignmentforum.org/s/fgHSwxFitysGKHH56/p/5bd75cc58225bf0670375452) as Foundations Foundations research. And CDT=EDT is pretty far along in this direction as well (https://www.alignmentforum.org/s/fgHSwxFitysGKHH56/p/x2wn2MWYSafDtm8Lf), although in my conception of what Foundations Foundations research should look like, more attention would have been paid to the possibility of the EDT graph being inconsistent, while the CDT graph was consistent.

Your ver­sion of the 5&10 prob­lem… The agent takes some ac­tion, since it is fully defined, and the prob­lem is that the de­ci­sion the­o­rist doesn’t know how to judge the agent’s de­ci­sion.

That’s ex­actly how I’d put it. Ex­cept I would say I’m in­ter­ested in the prob­lem from the ex­ter­nal per­spec­tive and the re­flec­tive per­spec­tive. I just see the ex­ter­nal per­spec­tive as eas­ier to un­der­stand first.

From the agent’s per­spec­tive, the 5&10 prob­lem does not nec­es­sar­ily look like a prob­lem of how to think about in­con­sis­tent actions

Sure. But the agent is thinking about inconsistent actions beneath the surface, which is why we have to worry about spurious counterfactuals. And this is important for having a way of determining if it is doing what it should be doing. (This becomes more important in the edge cases like Troll Bridge: https://agentfoundations.org/item?id=1711)

My in­ter­est is in how to con­struct them from scratch

Con­sider the fol­low­ing types of situ­a­tions:

1) A com­plete de­scrip­tion of a world, with an agent identified

2) A the­o­ret­i­cal de­ci­sion the­ory prob­lem viewed by an ex­ter­nal observer

3) A the­o­ret­i­cal de­ci­sion the­ory prob­lem viewed reflectively

I’m try­ing to get from 1->2, while you are try­ing to get from 2->3. What­ever for­mal­i­sa­tions we use need to ul­ti­mately re­late to the real world in some way, which is why I be­lieve that we need to un­der­stand the con­nec­tion from 1->2. We could also try con­nect­ing 1->3 di­rectly, al­though that seems much more challeng­ing. If we ig­nore the link from 1->2 and fo­cus solely on a link from 2->3, then we will end up im­plic­itly as­sum­ing a link from 1->2 which could in­volve as­sump­tions that we don’t ac­tu­ally want.

• Sounds like the dis­agree­ment has mostly landed in the area of ques­tions of what to in­ves­ti­gate first, which is pretty firmly “you do you” ter­ri­tory—what­ever most im­proves your own pic­ture of what’s go­ing on, that is very likely what you should be think­ing about.

On the other hand, I’m still left feel­ing like your ap­proach is not go­ing to be em­bed­ded enough. You say that in­ves­ti­gat­ing 2->3 first risks im­plic­itly as­sum­ing too much about 1->2. My sketchy re­sponse is that what we want in the end is not a pic­ture which is nec­es­sar­ily even con­sis­tent with hav­ing any 1->2 view. Every­thing is em­bed­ded, and im­plic­itly re­flec­tive, even the de­ci­sion the­o­rist think­ing about what de­ci­sion the­ory an agent should have. So, a firm 1->2 view can hurt rather than help, due to overly non-em­bed­ded as­sump­tions which have to be dis­carded later.

Us­ing some of the ideas from the em­bed­ded agency se­quence: a de­ci­sion the­o­rist may, in the course of eval­u­at­ing a de­ci­sion the­ory, con­sider a lot of #1-type situ­a­tions. How­ever, since the de­ci­sion the­o­rist is em­bed­ded as well, the de­ci­sion the­o­rist does not want to as­sume re­al­iz­abil­ity even with re­spect to their own on­tol­ogy. So, ul­ti­mately, the de­ci­sion the­o­rist wants a de­ci­sion the­ory to have “good be­hav­ior” on prob­lems where no #1-type view is available (mean­ing some sort of op­ti­mal­ity for non-re­al­iz­able cases).

• I re­ally ap­pre­ci­ate “Here’s a col­lec­tion of a lot of the work that has been done on this over the years, and im­por­tant sum­maries” type posts. Thanks for writ­ing this!

• I should note: this is my own idiosyncratic take on Logical Counterfactuals, with many of the links referring to my own posts, and I don't know if I've convinced anyone else of the merits of this approach yet.

• How does the for­get­ting ap­proach differ from an up­date­less ap­proach (if it is sup­posed to)?

• Why do you think there is a good way to de­ter­mine which in­for­ma­tion should be for­got­ten in a given prob­lem, aside from hand anal­y­sis? (Hand anal­y­sis uti­lizes the de­ci­sion the­o­rist’s per­spec­tive, which is an ex­ter­nal per­spec­tive the agent lacks.)

• UDT* pro­vides a de­ci­sion the­ory given a de­ci­sion tree and a method of de­ter­min­ing sub­junc­tive links be­tween choices. I’m in­ves­ti­gat­ing how to de­ter­mine these sub­junc­tive links, which re­quires un­der­stand­ing what kind of thing a coun­ter­fac­tual is and what kind of thing a de­ci­sion is. The idea is that any solu­tion should nat­u­rally in­te­grate with UDT.

Firstly, even if this technique were limited to hand analysis, I'd be quite pleased if it turned out to be a unifying theory behind our current intuitions about how logical counterfactuals should work. If it were able to cover all or even just most of the cases, we'd at least know what assumptions we were implicitly making, and it would provide a target for criticism. Different subtypes of forgetting might be identified; it wouldn't surprise me if it turns out that the concept of a decision actually needs to be dissolved.

Se­condly, even if there doesn’t turn out to be a good way to figure out what in­for­ma­tion should be for­got­ten, I ex­pect that figur­ing out differ­ent ap­proaches would prove in­sight­ful, as would dis­cov­er­ing why there isn’t a good way to de­ter­mine what to for­get, if this is in­deed the case.

But, to be hon­est, I’ve not spent much time think­ing about how to de­ter­mine what in­for­ma­tion should be for­got­ten. I’m still cur­rently in the stage of try­ing to figure out whether this might be a use­ful re­search di­rec­tion.

*Perhaps there are other updateless approaches; I don't know of any except TDT, which is generally considered inferior

• When “you” is defined down to the atom, you can only im­ple­ment one de­ci­sion.

Once again: phys­i­cal de­ter­minism is not a fact.

• I'm confused; I'm claiming determinism, not indeterminism

• That was a typo, al­though ac­tu­ally nei­ther is a fact.