Two Alternatives to Logical Counterfactuals


The following is a critique of the idea of logical counterfactuals. The idea of logical counterfactuals has appeared in previous agent foundations research (especially at MIRI): here, here. "Impossible possible worlds" have been considered elsewhere in the literature; see the SEP article for a summary.

I will start by motivating the problem, which also gives an account of what a logical counterfactual is meant to be.

Suppose you learn about physics and find that you are a robot. You learn that your source code is "A". You also believe that you have free will; in particular, you may decide to take either action X or action Y. In fact, you take action X. Later, you simulate "A" and find, unsurprisingly, that when you give it the observations you saw up to deciding to take action X or Y, it outputs action X. However, you, at the time, had the sense that you could have taken action Y instead. You want to be consistent with your past self, so you want to, at this later time, believe that you could have taken action Y at the time. If you could have taken Y, then you do take Y in some possible world (which still satisfies the same laws of physics). In this possible world, it is the case that "A" returns Y upon being given those same observations. But the output of "A" when given those observations is a fixed computation, so you now need to reason about a possible world that is logically incoherent, given your knowledge that "A" in fact returns X. This possible world is, then, a logical counterfactual: a "possible world" that is logically incoherent.

To summarize: a logical counterfactual is a notion of "what would have happened" had you taken a different action after seeing your source code. In that "what would have happened", the source code must output a different action than the one you actually took; hence, this "what would have happened" world is logically incoherent.

It is easy to see that this idea of logical counterfactuals is unsatisfactory. For one, no good account of them has yet been given. For two, there is a sense in which no account could be given; reasoning about logically incoherent worlds can only be so extensive before running into logical contradiction.

To extensively refute the idea, it is necessary to provide an alternative account of the motivating problem(s) which dispenses with the idea. Even if logical counterfactuals are unsatisfactory, the motivating problem(s) remain.

I now present two alternative accounts: counterfactual nonrealism, and policy-dependent source code.

Counterfactual nonrealism

According to counterfactual nonrealism, there is no fact of the matter about what "would have happened" had a different action been taken. There is, simply, the sequence of actions you take, and the sequence of observations you get. At the time of taking an action, you are uncertain about what that action is; hence, from your perspective, there are multiple possibilities.

Given this uncertainty, you may consider material conditionals: if I take action X, will consequence Q necessarily follow? An action may be selected on the basis of these conditionals, such as by determining which action results in the highest guaranteed expected utility if that action is taken.
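As an illustration, here is a toy sketch of this selection rule, simplified to worst-case (guaranteed) utility over the worlds the agent considers possible. The world model and all names here are hypothetical, and real proof-based approaches work with provability rather than explicit world enumeration.

```python
# Toy sketch: choosing an action by material conditionals over the worlds
# the agent cannot yet rule out. Each world is an (action, utility) pair.
possible_worlds = [("X", 10), ("X", 8), ("Y", 3), ("Y", 5)]

def guaranteed_utility(action, worlds):
    """Worst-case utility over the possible worlds where `action` is taken.

    If NO possible world has the agent taking `action`, then every material
    conditional "if I take `action`, then Q" is vacuously true, so any
    utility bound is "guaranteed" (here, optimistically, +infinity). This
    vacuity is where spurious counterfactuals creep in.
    """
    utilities = [u for a, u in worlds if a == action]
    return min(utilities) if utilities else float("inf")

def choose(actions, worlds):
    # Select the action with the highest guaranteed utility.
    return max(actions, key=lambda a: guaranteed_utility(a, worlds))

print(choose(["X", "Y"], possible_worlds))  # X: guaranteed 8 vs Y's 3
```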

This is basically the approach taken in my post on subjective implication decision theory. It is also the approach taken by proof-based UDT.

The material conditionals are ephemeral: at a later time, the agent will know that they could only have taken a certain action (assuming they knew their source code before taking the action), due to having had longer to think by then; hence, all the original material conditionals whose antecedent is an untaken action will be vacuously true. The apparent nondeterminism is, then, only due to the epistemic limitation of the agent at the time of making the decision, a limitation not faced by a later version of the agent (or an outside agent) with more computation power.

This leads to a sort of relativism: what is undetermined from one perspective may be determined from another. This makes global accounting difficult: it's hard for one agent to evaluate whether another agent's action is any good, because the two agents have different epistemic states, resulting in different judgments on material conditionals.

A problem that comes up is that of "spurious counterfactuals" (analyzed in the linked paper on proof-based UDT). An agent may become sure of its own action before that action is taken. Upon becoming sure of that action, the agent will know the material implication that, if they take a different action, something terrible will happen (this material implication is vacuously true). Hence the agent may take the action they were sure they would take, making the original certainty self-fulfilling. (There are technical details, having to do with Löb's theorem, about how the agent becomes certain.)
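Here is a toy sketch of how a vacuous conditional makes certainty self-fulfilling. The setup is hypothetical and deliberately crude: the real analysis goes through provability and Löb's theorem, not explicit world enumeration, and here I read the vacuous conditional pessimistically ("something terrible happens") to match the story above.

```python
# Toy sketch of a spurious counterfactual. The agent has become certain
# it takes "X": no remaining possible world has it taking "Y".
beliefs = [("X", 1)]  # (action, utility) pairs; all Y-worlds ruled out

def material_utility(action, worlds):
    """Utility under material conditionals, read pessimistically.

    With no Y-worlds left, "if I take Y, something terrible happens" is
    vacuously true (as is any other conditional about Y). On this reading,
    Y looks maximally bad, so the prior certainty decides the action.
    """
    utilities = [u for a, u in worlds if a == action]
    return min(utilities) if utilities else float("-inf")

choice = max(["X", "Y"], key=lambda a: material_utility(a, beliefs))
print(choice)  # "X": the original certainty is self-fulfilling
```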

The most natural decision theory resulting from this framework is timeless decision theory (rather than updateless decision theory). This is because the agent updates on what they know about the world so far and considers the material implications of themselves taking a certain action; these implications include logical implications if the agent knows their source code. Note that timeless decision theory is dynamically inconsistent in the counterfactual mugging problem.

Policy-dependent source code

A second approach is to assert that one's source code depends on one's entire policy, rather than only on one's actions up to seeing one's source code.

Formally, a policy is a function mapping an observation history to an action. It is distinct from source code, in that the source code specifies the implementation of the policy in some programming language, rather than itself being a policy function.

Logically, it is impossible for the same source code to generate two different policies. There is a fact of the matter about what action the source code outputs given an observation history (assuming the program halts). Hence there is no way for two different policies to be compatible with the same source code.

Let’s re­turn to the robot thought ex­per­i­ment and re-an­a­lyze it in light of this. After the robot has seen that their source code is “A” and taken ac­tion X, the robot con­sid­ers what would have hap­pened if they had taken ac­tion Y in­stead. How­ever, if they had taken ac­tion Y in­stead, then their policy would, triv­ially, have to be differ­ent from their ac­tual policy, which takes ac­tion X. Hence, their source code would be differ­ent. Hence, they would not have seen that their source code is “A”.

Instead, if the agent were to take action Y upon seeing that their source code is "A", their source code must be something else, perhaps "B". Hence, which action the agent would have taken depends directly on their policy's behavior upon seeing that the source code is "B", and indirectly on the entire policy (as source code depends on policy).
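This dependence can be sketched in a few lines. Everything here is a hypothetical toy: in particular the "compiler" mapping policies to source code, which simply assigns code "A" to the actual policy and "B" to anything else.

```python
# Toy sketch: source code is a function of the whole policy, so
# counterfacting on the policy changes which code the agent observes.

# A policy maps an observation history (here, just the code the agent
# sees) to an action.
actual_policy = {"A": "X", "B": "X"}
counterfactual_policy = {"A": "Y", "B": "Y"}  # takes Y upon seeing "A"

def source_code_of(policy):
    """Hypothetical compiler: the actual policy compiles to code "A";
    any other policy compiles to code "B"."""
    return "A" if policy == actual_policy else "B"

def run(policy):
    code = source_code_of(policy)  # what the agent observes
    return code, policy[code]      # (observed code, action taken)

# The counterfactual agent, whose policy takes Y upon seeing "A", has
# code "B" and so never actually faces the history "A"; the action it
# takes is determined by its behavior upon seeing "B".
print(run(actual_policy))          # ("A", "X")
print(run(counterfactual_policy))  # ("B", "Y")
```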

We see, then, that the original thought experiment encodes a reasoning error. The later agent wants to ask what would have happened if they had taken a different action after knowing their source code; however, the agent neglects that such a policy change would have resulted in seeing different source code! Hence, there is no need to posit a logically incoherent possible world.

The reasoning error came about from using a conventional, linear notion of interactive causality. Intuitively, what you see up to time t depends only on your actions before time t. However, policy-dependent source code breaks this condition: what source code you see yourself having depends on your entire policy, not just the actions you took up to seeing your source code. Hence, reasoning under policy-dependent source code requires abandoning linear interactive causality.

The most natural decision theory resulting from this approach is updateless decision theory, rather than timeless decision theory, as it is the entire policy that the counterfactual is on.


Until very recently, my philosophical approach had been counterfactual nonrealism. However, I am now more compelled by policy-dependent source code, after having analyzed it. I believe this approach fixes the main problem of counterfactual nonrealism, namely that relativism makes global accounting difficult. It also fixes the inherent dynamic inconsistency problems that TDT has relative to UDT (which are related to the relativism).

I believe the re-analysis I have provided of the thought experiment motivating logical counterfactuals is sufficient to refute the original interpretation, and thus to de-motivate logical counterfactuals.

The main problem with policy-dependent source code is that, since it violates linear interactive causality, analysis is correspondingly more difficult. Hence, there is further work to be done in considering simplified environment classes where possible simplifying assumptions (including linear interactive causality) can be made. It is critical, though, that the linear interactive causality assumption not be used in analyzing cases of an agent learning their source code, as this results in logical incoherence.