# Two Alternatives to Logical Counterfactuals

The following is a critique of the idea of logical counterfactuals. The idea of logical counterfactuals has appeared in previous agent foundations research (especially at MIRI): here, here. “Impossible possible worlds” have been considered elsewhere in the literature; see the SEP article for a summary.

I will start by motivating the problem, which also gives an account for what a logical counterfactual is meant to be.

Suppose you learn about physics and find that you are a robot. You learn that your source code is “A”. You also believe that you have free will; in particular, you may decide to take either action X or action Y. In fact, you take action X. Later, you simulate “A” and find, unsurprisingly, that when you give it the observations you saw up to deciding to take action X or Y, it outputs action X. However, you, at the time, had the sense that you could have taken action Y instead. You want to be consistent with your past self, so you want to, at this later time, believe that you could have taken action Y at the time. If you could have taken Y, then you do take Y in some possible world (which still satisfies the same laws of physics). In this possible world, it is the case that “A” returns Y upon being given those same observations. But, the output of “A” when given those observations is a fixed computation, so you now need to reason about a possible world that is logically incoherent, given your knowledge that “A” in fact returns X. This possible world is, then, a logical counterfactual: a “possible world” that is logically incoherent.

To summarize: a logical counterfactual is a notion of “what would have happened” had you taken a different action after seeing your source code, and in that “what would have happened”, the source code must output a different action than what you actually took; hence, this “what would have happened” world is logically incoherent.

It is easy to see that this idea of logical counterfactuals is unsatisfactory. For one, no good account of them has yet been given. For two, there is a sense in which no account could be given; reasoning about logically incoherent worlds can only be so extensive before running into logical contradiction.

To extensively refute the idea, it is necessary to provide an alternative account of the motivating problem(s) which dispenses with the idea. Even if logical counterfactuals are unsatisfactory, the motivating problem(s) remain.

I now present two alternative accounts: counterfactual nonrealism, and policy-dependent source code.

## Counterfactual nonrealism

According to counterfactual nonrealism, there is no fact of the matter about what “would have happened” had a different action been taken. There is, simply, the sequence of actions you take, and the sequence of observations you get. At the time of taking an action, you are uncertain about what that action is; hence, from your perspective, there are multiple possibilities.

Given this uncertainty, you may consider material conditionals: if I take action X, will consequence Q necessarily follow? An action may be selected on the basis of these conditionals, such as by determining which action results in the highest guaranteed expected utility if that action is taken.
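To make this selection rule concrete, here is a minimal sketch in Python. It is my own illustration rather than anything from the post: the names `World`, `guaranteed_utility`, and `choose_action` are assumptions, the agent’s uncertainty is rendered as a finite list of candidate worlds, and “guaranteed” is rendered as a worst case over those worlds; a real proof-based agent would search for proofs rather than enumerate worlds.

```python
# A minimal sketch of action selection via material conditionals, assuming a toy
# setting where the agent's uncertainty is a finite set of candidate worlds.
# All names here are illustrative, not from the post.

from dataclasses import dataclass
from typing import List

@dataclass
class World:
    action: str     # the action taken in this candidate world
    utility: float  # the utility received in this candidate world

def guaranteed_utility(worlds: List[World], action: str) -> float:
    """Worst-case utility over candidate worlds in which `action` is taken.

    If no candidate world takes `action`, the material conditional
    "I take `action` -> utility >= u" is vacuously true for every u,
    so the guarantee is +infinity (the spurious-counterfactual hazard)."""
    utilities = [w.utility for w in worlds if w.action == action]
    return min(utilities) if utilities else float("inf")

def choose_action(worlds: List[World], actions: List[str]) -> str:
    """Pick the action with the highest guaranteed utility."""
    return max(actions, key=lambda a: guaranteed_utility(worlds, a))

# Example: the agent is unsure whether it takes X or Y, and what follows.
beliefs = [World("X", 5.0), World("X", 7.0), World("Y", 10.0)]
print(choose_action(beliefs, ["X", "Y"]))  # -> "Y" (guaranteed 10 vs guaranteed 5)
```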

This is basically the approach taken in my post on subjective implication decision theory. It is also the approach taken by proof-based UDT.

The material conditionals are ephemeral, in that at a later time, the agent will know that they could only have taken a certain action (assuming they knew their source code before taking the action), due to having had longer to think by then; hence, all the original material conditionals will be vacuously true. The apparent nondeterminism is, then, only due to the epistemic limitation of the agent at the time of making the decision, a limitation not faced by a later version of the agent (or an outside agent) with more computation power.

This leads to a sort of relativism: what is undetermined from one perspective may be determined from another. This makes global accounting difficult: it’s hard for one agent to evaluate whether another agent’s action is any good, because the two agents have different epistemic states, resulting in different judgments on material conditionals.

A problem that comes up is that of “spurious counterfactuals” (analyzed in the linked paper on proof-based UDT). An agent may become sure of its own action before that action is taken. Upon being sure of that action, the agent will know the material implication that, if they take a different action, something terrible will happen (this material implication is vacuously true). Hence the agent may take the action they were sure they would take, making the original certainty self-fulfilling. (There are technical details, involving Löb’s theorem, about how the agent becomes certain.)

The most natural decision theory resulting from this framework is timeless decision theory (rather than updateless decision theory). This is because the agent updates on what they know about the world so far, and considers the material implications of themselves taking a certain action; these implications include logical implications if the agent knows their source code. Note that timeless decision theory is dynamically inconsistent in the counterfactual mugging problem.

## Policy-dependent source code

A second approach is to assert that one’s source code depends on one’s entire policy, rather than only one’s actions up to seeing one’s source code.

Formally, a policy is a function mapping an observation history to an action. It is distinct from source code, in that the source code specifies the implementation of the policy in some programming language, rather than itself being a policy function.

Logically, it is impossible for the same source code to generate two different policies. There is a fact of the matter about what action the source code outputs given an observation history (assuming the program halts). Hence there is no way for two different policies to be compatible with the same source code.

Let’s return to the robot thought experiment and re-analyze it in light of this. After the robot has seen that their source code is “A” and taken action X, the robot considers what would have happened if they had taken action Y instead. However, if they had taken action Y instead, then their policy would, trivially, have to be different from their actual policy, which takes action X. Hence, their source code would be different. Hence, they would not have seen that their source code is “A”.

Instead, if the agent were to take action Y upon seeing that their source code is “A”, their source code must be something else, perhaps “B”. Hence, which action the agent would have taken depends directly on their policy’s behavior upon seeing that the source code is “B”, and indirectly on the entire policy (as source code depends on policy).
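As a toy illustration of the policy/source-code distinction, and of how a counterfactual policy comes with different observed source code, here is a minimal Python sketch. It is my own rendering under assumed names (`implement`, the toy observation history), not a construction from the post.

```python
# A minimal sketch of policy-dependent source code in a toy world. `implement`
# stands in for "the process that produced the agent" and is an illustrative
# assumption, not something from the post.

from typing import Callable, Tuple

Obs = str
Action = str
Policy = Callable[[Tuple[Obs, ...]], Action]   # observation history -> action

def implement(policy: Policy) -> str:
    """Toy stand-in for how a policy gets compiled into source code: here the
    "source code" is just a label determined by the policy's behavior."""
    history = ("my source code is A",)
    return "A" if policy(history) == "X" else "B"

# The actual policy: take X after seeing source code "A".
actual_policy: Policy = lambda history: "X"

# A counterfactual policy that takes Y in the same situation.
counterfactual_policy: Policy = lambda history: "Y"

print(implement(actual_policy))          # "A" -- consistent with what the robot saw
print(implement(counterfactual_policy))  # "B" -- under this policy the robot would
                                         # have seen different source code, so no
                                         # logically incoherent world is needed
```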

We see, then, that the original thought experiment encodes a reasoning error. The later agent wants to ask what would have happened if they had taken a different action after knowing their source code; however, the agent neglects that such a policy change would have resulted in seeing different source code! Hence, there is no need to posit a logically incoherent possible world.

The reasoning error came about due to using a conventional, linear notion of interactive causality. Intuitively, what you see up to time t depends only on your actions before time t. However, policy-dependent source code breaks this condition. What source code you see that you have depends on your entire policy, not just what actions you took up to seeing your source code. Hence, reasoning under policy-dependent source code requires abandoning linear interactive causality.

The most natural decision theory resulting from this approach is updateless decision theory, rather than timeless decision theory, as it is the entire policy that the counterfactual is on.

## Conclusion

Until very recently, my philosophical approach had been counterfactual nonrealism. However, I am now more compelled by policy-dependent source code, after having analyzed it. I believe this approach fixes the main problem of counterfactual nonrealism, namely relativism making global accounting difficult. It also fixes the inherent dynamic inconsistency problems that TDT has relative to UDT (which are related to the relativism).

I believe the re-analysis I have provided of the thought experiment motivating logical counterfactuals is sufficient to refute the original interpretation, and thus to de-motivate logical counterfactuals.

The main problem with policy-dependent source code is that, since it violates linear interactive causality, analysis is correspondingly more difficult. Hence, there is further work to be done in considering simplified environment classes where possible simplifying assumptions (including linear interactive causality) can be made. It is critical, though, that the linear interactive causality assumption not be used in analyzing cases of an agent learning their source code, as this results in logical incoherence.

• I too have recently updated (somewhat) away from counterfactual non-realism. I have a lot of stuff I need to work out and write about it.

I seem to have a lot of disagreements with your post.

Given this uncertainty, you may consider material conditionals: if I take action X, will consequence Q necessarily follow? An action may be selected on the basis of these conditionals, such as by determining which action results in the highest guaranteed expected utility if that action is taken.

I don’t think material conditionals are the best way to cash out counterfactual non-realism.

• The basic reason I think it’s bad is the happy dance problem. This makes it seem clear that the sentence to condition on should not be ‘A = act’.

• If the action can be viewed as a function of observations, conditioning on ‘A(obs) = act’ makes sense. But this is sort of like already having counterfactuals, or at least, being a realist about counterfactuals concerning what the agent would do if it observed different things. So this response can be seen as abandoning counterfactual non-realism.

• A different approach is to consider conditional beliefs rather than material implications. I think this is more true to counterfactual non-realism. In the simplest form, this means you just condition on actions (rather than trying to condition on something like ‘A = act’ or ‘A(obs) = act’). However, in order to reason updatelessly, you need something like conditioning on conditionals, which complicates matters.

• Another reason to think it’s bad is Troll Bridge.

• Again, if the agent thinks there are basic counterfactual facts (required to respect what actually happens, but little else—ie entirely determined by subjective beliefs), then the agent can escape Troll Bridge by disagreeing with the relevant inference. But this, of course, rejects the kind of counterfactual non-realism you intend.

• To be more in line with counterfactual non-realism, we would like to use conditional probabilities instead. However, conditional probability behaves too much like material implication to block the Troll Bridge argument. That said, I believe there is an account of conditional probability which avoids this by rejecting the ratio analysis of conditional probability—ie Bayes’ definition—and instead regards conditional probability as a basic entity. (Along the lines of what Alan Hájek goes on and on about.) Thus an EDT-like procedure can be immune to both 5-and-10 and Troll Bridge. (I claim.)

As for policy-dependent source code, I find myself quite unsympathetic to this view.

• If the agent is updateful, this is just saying that in counterfactuals where the agent does something else, it might have different source code. Which seems fine, but does it really solve anything? Why is this much better than counterfactuals which keep the source code fixed but imagine the execution trace being different? This seems to only push the rough spots further back—there can still be contradictions, e.g. between the source code and the process by which programmers wrote the source code. Do you imagine it is possible to entirely remove such rough spots from the counterfactuals?

• So it seems you intend the agent to be updateless instead. But then we have all the usual issues with logical updatelessness. If the agent is logically updateless, there is absolutely no reason to think that its beliefs about the connections between source code and actual policy behavior are any good. Making those connections requires actual reasoning, not simply a good enough prior—which means being logically updateful. So it’s unclear what to do.

• Perhaps logically-updateful policy-dependent-source-code is the most reasonable version of the idea. But then we are faced with the usual questions about spurious counterfactuals, chicken rule, exploration, and Troll Bridge. So we still have to make choices about those things.

• In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.

Conditioning on ‘A(obs) = act’ is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between “If Oswald didn’t kill Kennedy, then someone else did” and “If Oswald didn’t kill Kennedy, then someone else would have”.

Indeed, troll bridge will present a problem for “playing chicken” approaches, which are probably necessary in counterfactual nonrealism.

For policy-dependent source code, I intend for the agent to be logically updateful, while updateless about observations.

Why is this much better than counterfactuals which keep the source code fixed but imagine the execution trace being different?

Because it doesn’t lead to logical incoherence, so reasoning about counterfactuals doesn’t have to be limited.

This seems to only push the rough spots further back—there can still be contradictions, e.g. between the source code and the process by which programmers wrote the source code.

If you see your source code is B instead of A, you should anticipate learning that the programmers programmed B instead of A, which means something was different in the process. So the counterfactual has implications backwards in physical time.

At some point it will ground out in: different indexical facts, different laws of physics, different initial conditions, different random events...

This theory isn’t worked out yet but it doesn’t yet seem that it will run into logical incoherence, the way logical counterfactuals do.

But then we are faced with the usual questions about spurious counterfactuals, chicken rule, exploration, and Troll Bridge.

Maybe some of these.

Spurious counterfactuals require getting a proof of “I will take action X”. The proof proceeds by showing “source code A outputs action X”. But an agent who accepts policy-dependent source code will believe they have source code other than A if they don’t take action X. So the spurious proof doesn’t prevent the counterfactual from being evaluated.

Chicken rule is hence unnecessary.

Exploration is a matter of whether the world model is any good; the world model may, for example, map a policy to a distribution of expected observations. (That is, the world model already has policy counterfactuals as part of it; theories such as physics provide constraints on the world model rather than fully determining it). Learning a good world model is of course a problem in any approach. (A toy sketch of a world model of this kind appears at the end of this comment.)

Whether troll bridge is a problem depends on how the source code counterfactual is evaluated. Indeed, many ways of running this counterfactual (e.g. inserting special cases into the source code) are “stupid” and could be punished in a troll bridge problem.

I by no means think “policy-dependent source code” is presently a well worked-out theory; the advantage relative to logical counterfactuals is that in the latter case, there is a strong theoretical obstacle to ever having a well worked-out theory, namely logical incoherence of the counterfactuals. Hence, coming up with a theory of policy-dependent source code seems more likely to succeed than coming up with a theory of logical counterfactuals.
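As a toy illustration of the idea above that a world model maps a whole policy to a distribution over outcomes, here is a small sketch. The names and numbers are my own illustrative assumptions, not the commenter’s formalism.

```python
# A toy sketch of a world model that takes a whole policy as input and returns a
# distribution over outcomes, so "policy counterfactuals" are part of the model
# itself. All names and numbers are assumptions for illustration.

from typing import Callable, Dict

Policy = Callable[[str], str]           # observation -> action

def world_model(policy: Policy) -> Dict[float, float]:
    """Return a distribution over utilities (utility -> probability) for a policy.

    Physics-style knowledge only constrains this map; it does not fully determine it."""
    action = policy("my source code is A")
    if action == "X":
        return {10.0: 0.9, 0.0: 0.1}
    else:
        return {5.0: 1.0}

def expected_utility(policy: Policy) -> float:
    return sum(u * p for u, p in world_model(policy).items())

policies = {"always-X": lambda obs: "X", "always-Y": lambda obs: "Y"}
best = max(policies, key=lambda name: expected_utility(policies[name]))
print(best)  # -> "always-X" under this toy model
```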

• If you see your source code is B instead of A, you should anticipate learning that the programmers programmed B instead of A, which means something was different in the process. So the counterfactual has implications backwards in physical time.

At some point it will ground out in: different indexical facts, different laws of physics, different initial conditions, different random events...

I’m not sure how you are thinking about this. It seems to me like this will imply really radical changes to the universe. Suppose the agent is choosing between a left path and a right path. Its actual programming will go left. It has to come up with alternate programming which would make it go right, in order to consider that scenario. The most probable universe in which its programming would make it go right is potentially really different from our own. In particular, it is a universe where it would go right despite everything it has observed, a lifetime of (updateless) learning, which, in the real universe, has taught it that it should go left in situations like this.

EG, perhaps it has faced an iterated 5&10 problem, where left always yields 10. It has to consider alternate selves who, faced with that history, go right.

It just seems implausible that thinking about universes like that will result in systematically good decisions. In the iterated 5&10 example, perhaps universes where its programming fails iterated 5&10 are universes where iterated 5&10 is an exceedingly unlikely situation; so in fact, the reward for going right is quite unlikely to be 5, and very likely to be 100. Then the AI would choose to go right.

Obviously, this is not necessarily how you are thinking about it at all—as you said, you haven’t given an actual decision procedure. But the idea of considering only really consistent counterfactual worlds seems quite problematic.

• I agree this is a problem, but isn’t this a problem for logical counterfactual approaches as well? Isn’t it also weird for a known fixed optimizer source code to produce a different result on this decision where it’s obvious that ‘left’ is the best decision?

If you assume that the agent chose ‘right’, it’s more reasonable to think it’s because it’s not a pure optimizer than that a pure optimizer would have chosen ‘right’, in my view.

If you form the intent to, as a policy, go ‘right’ on the 100th turn, you should anticipate learning that your source code is not the code of a pure optimizer.

• I’m left with the feeling that you don’t see the problem I’m pointing at.

My concern is that the most plausible world where you aren’t a pure optimizer might look very very different, and whether this very very different world looks better or worse than the normal-looking world does not seem very relevant to the current decision.

Consider the “special exception selves” you mention—the Nth exception-self has a hard-coded exception “go right if it’s been at least N turns and you’ve gone right at most 1/N of the time”.

Now let’s suppose that the worlds which give rise to exception-selves are a bit wild. That is to say, the rewards in those worlds have pretty high variance. So a significant fraction of them have quite high reward—let’s just say 10% of them have value much higher than is achievable in the real world.

So we expect that by around N=10, there will be an exception-self living in a world that looks really good.

This suggests to me that the policy-dependent-source agent cannot learn to go left > 90% of the time, because once it crosses that threshold, the exception-self in the really good looking world is ready to trigger its exception—so going right starts to appear really good. The agent goes right until it is under the threshold again.

If that’s true, then it seems to me rather bad: the agent ends up repeatedly going right in a situation where it should be able to learn to go left easily. Its reason for repeatedly going right? There is one enticing world, which looks much like the real world, except that in that world the agent definitely goes right. Because that agent is a lucky agent who gets a lot of utility, the actual agent has decided to copy its behavior exactly—anything else would prove the real agent unlucky, which would be sad.

Of course, this outcome is far from obvious; I’m playing fast and loose with how this sort of agent might reason.
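A toy numeric sketch of that dynamic, under the assumed numbers above (10% of exception-worlds are very good, independently across N):

```python
# A toy numeric check of the informal "by around N=10" estimate. The numbers are
# assumptions carried over from the comment above, not derived from any theory.

p_good = 0.1  # assumed fraction of exception-worlds with very high reward

for n in range(1, 16):
    expected_good = n * p_good                 # expected # of good exception-worlds among selves 1..n
    prob_at_least_one = 1 - (1 - p_good) ** n  # chance at least one such world exists
    print(f"N={n:2d}  expected good worlds: {expected_good:.1f}  "
          f"P(at least one): {prob_at_least_one:.2f}")

# Around N = 10, the expected count reaches 1 and P(at least one) is about 0.65,
# matching the informal estimate; going left more than 1 - 1/N = 90% of the time is
# what would let the N=10 exception-self fire.
```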

• I think it’s worth examining more closely what it means to be “not a pure optimizer”. Formally, a VNM utility function is a rationalization of a coherent policy. Say that you have some idea about what your utility function is, U. Suppose you then decide to follow a policy that does not maximize U. Logically, it follows that U is not really your utility function; either your policy doesn’t coherently maximize any utility function, or it maximizes some other utility function. (Because the utility function is, by definition, a rationalization of the policy.)

Failing to disambiguate these two notions of “the agent’s utility function” is a map-territory error.

Decision theories require, as input, a utility function to maximize, and output a policy. If a decision theory is adopted by an agent who is using it to determine their policy (rather than already knowing their policy), then they are operating on some preliminary idea about what their utility function is. Their “actual” utility function is dependent on their policy; it need not match up with their idea.

So, it is very much possible for an agent who is operating on an idea U of their utility function, to evaluate counterfactuals in which their true behavioral utility function is not U. Indeed, this is implied by the fact that utility functions are rationalizations of policies.

Let’s look at the “turn left/right” example. The agent is operating on a utility function idea U, which is higher the more the agent turns left. When they evaluate the policy of turning “right” on the 10th time, they must conclude that, in this hypothetical, either (a) “right” maximizes U, (b) they are maximizing some utility function other than U, or (c) they aren’t a maximizer at all.

The logical counterfactual framework says the answer is (a): that the fixed computation of U-maximization results in turning right, not left. But, this is actually the weirdest of the three worlds. It is hard to imagine ways that “right” maximizes U, whereas it is easy to imagine that the agent is maximizing a utility function other than U, or is not a maximizer.

Yes, the (b) and (c) worlds may be weird in a problematic way. However, it is hard to imagine these being nearly as weird as (a).

One way they could be weird is that an agent having a complex utility function is likely to have been produced by a different process than an agent with a simple utility function. So the more weird exceptional decisions you make, the greater the evidence is that you were produced by the sort of process that produces complex utility functions.

This is pretty similar to the smoking lesion problem, then. I expect that policy-dependent source code will have a lot in common with EDT, as they both consider “what sort of agent I am” to be a consequence of one’s policy. (However, as you’ve pointed out, there are important complications with the framing of the smoking lesion problem.)

I think further disambiguation on this could benefit from re-analyzing the smoking lesion problem (or a similar problem), but I’m not sure if I have the right set of concepts for this yet.

• OK, all of that made sense to me. I find the direction more plausible than when I first read your post, although it still seems like it’ll fall to the problem I sketched.

I both like and hate that it treats logical uncertainty in a radically different way from empirical uncertainty—like, because we have so far failed to find any way to treat the two uniformly (besides being entirely updateful, that is); and hate, because it still feels so wrong for the two to be very different.

• Conditioning on ‘A(obs) = act’ is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between “If Oswald didn’t kill Kennedy, then someone else did” and “If Oswald didn’t kill Kennedy, then someone else would have”.

I still disagree. We need a counterfactual structure in order to consider the agent as a function A(obs). EG, if the agent is a computer program, the function would contain all the counterfactual information about what the agent would do if it observed different things. Hence, considering the agent’s computer program as such a function leverages an ontological commitment to those counterfactuals.

To illustrate this, consider counterfactual mugging where we already see that the coin is heads—so, there is nothing we can do, we are at the mercy of our counterfactual partner. But suppose we haven’t yet observed whether Omega gives us the money.

A “real counterfactual” is one which can be true or false independently of whether its condition is met. In this case, if we believe in real counterfactuals, we believe that there is a fact of the matter about what we would do in the tails case, even though the coin came up heads. If we don’t believe in real counterfactuals, we instead think only that there is a fact of how Omega is computing “what I would have done if the coin had been tails”—but we do not believe there is any “correct” way for Omega to compute that.

The material-conditional representation ‘obs → act’ and the conditional-probability representation ‘P(act | obs)’ both appear to satisfy this test of non-realism. The first is always true if the observation is false, so it lacks the ability to vary independently of the observation. The second is undefined when the observation is false, which is perhaps even more appealing for the non-realist.

Now consider the representation ‘A(obs) = act’. It can still vary even when we know that the observation is false. So, it fails this test—it is a realist representation!

Putting something into functional form imputes a causal/counterfactual structure.
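To make the test concrete, here is a small sketch of the three representations in the heads-observed case. The names and values are my own illustrative assumptions, not the commenter’s.

```python
# A small sketch of the "realism test" above, for counterfactual mugging after the
# coin has come up heads. The condition "coin is tails" is false; the question is
# which representations can still vary. All names here are illustrative.

coin_is_tails = False   # we have observed heads
pr_tails = 0.0          # probability of the condition, after updating

# 1. Material conditional "tails -> I pay": vacuously true once tails is false.
def material_conditional(i_pay_given_tails: bool) -> bool:
    return (not coin_is_tails) or i_pay_given_tails

print(material_conditional(True), material_conditional(False))  # True True: cannot vary

# 2. Ratio-definition conditional probability P(pay | tails): undefined (0/0) here.
def conditional_probability(pr_pay_and_tails: float) -> float:
    if pr_tails == 0.0:
        raise ZeroDivisionError("undefined when the condition has probability 0")
    return pr_pay_and_tails / pr_tails

# 3. Functional representation A(obs): still has a value at obs="tails" even though
#    tails was not observed, so it can vary independently -- a "realist" representation.
def A(obs: str) -> str:
    return "pay" if obs == "tails" else "keep"

print(A("tails"))  # "pay" -- a fact about the counterfactual branch
```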

• In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.

I agree that this gets around the problem, but to me the happy dance problem is still suggestive—it looks like the material conditional is the wrong representation of the thing we want to condition on.

Also—if the agent has already updated on observations, then updating on the material conditional ‘obs → act’ is just the same as updating on ‘act’. So this difference only matters in the updateless case, where it seems to cause us trouble.

It is easy to see that this idea of logical counterfactuals is unsatisfactory. For one, no good account of them has yet been given. For two, there is a sense in which no account could be given; reasoning about logically incoherent worlds can only be so extensive before running into logical contradiction.

• I’ve been doing some work on this topic, and I am seeing two schools of thought on how to deal with the problem of logical contradictions you mention. To explain these, I’ll use an example counterfactual not involving agents and free will. Consider the counterfactual sentence: ‘if the vase had not been broken, the floor would not have been wet’. Now, how can we compute a truth value for this sentence?

School of thought 1 proceeds as follows: we know various facts about the world, like that the vase is broken and that the floor is wet. We also know general facts about vases, breaking, water, and floors. Now we add the extra fact that the vase is not broken to our knowledge base. Based on this extended body of knowledge, we compute the truth value of the claim ‘the floor is not wet’. Clearly, we are dealing with a knowledge base that contains mutually contradictory facts: the vase is both broken and it is not broken. Under normal mathematical systems of reasoning, this will allow us to prove any claim we like: the truth value of any sentence becomes 1, which is not what we want. Now, school 1 tries to solve this by coming up with new systems of reasoning that are tolerant of such internal contradictions, systems that will make computations that will produce the ‘obviously true’ conclusions only, or that will derive the ‘obviously true’ conclusions before deriving the ‘obviously false’ ones, or that compute probabilistic truth values in such a way that those of the ‘obviously true’ conclusions are higher. In MIRI terminology, I believe this approach goes under the heading ‘decision theory’. I also interpret the two alternative solutions you mention above as following this school of thought. Personally, I find this solution approach not very promising or compelling.

School of thought 2, which includes Pearl’s version of counterfactual reasoning, says that if you want to reason (or if you want a machine to reason) in a counterfactual way, you should not just add facts to the body of knowledge you use. You need to delete or edit other facts in the knowledge base too, before you supply it to the reasoning engine, exactly to avoid inputting a knowledge base that has internal contradictions. For example, if you want to reason about ‘if the vase had not been broken’, one thing you definitely need to do is to first remove the statement ‘the vase is broken’ (or any information leading to the conclusion that the vase is broken) from the knowledge base that goes into your reasoning engine. You have to do this even though the fact that the vase is broken is obviously true for the current world you are in.

So school 2 avoids the problem of having to somehow build a reasoning engine that does the right thing even when a contradictory knowledge base is input. But it trades this for the problem of having to decide exactly what edits will be made to the knowledge base to eliminate the possibility of having such contradictions. In other words, if you want a machine to reason in a counterfactual way, you have to make choices about the specific edits you will make (a small code sketch of this recipe appears at the end of this comment). Often, there are many possible choices, and different choices may lead to different probability distributions in the outcomes computed. This choice problem does not bother me that much; I see it as having design freedom. But if you are a philosopher of language trying to find a single obvious system of meaning for natural language counterfactual sentences, this choice problem might bother you a lot; you might be tempted to find some kind of representation-independent Occam’s razor that can be used to decide between counterfactual edits.

Overall, my feeling is that school 2 gives an account of logical counterfactuals that is good enough for my purposes in AGI safety work.

As a trivial school 1 edge case, one could design a reasoning engine that can deal with contradictory facts in its input knowledge base as follows: the engine first makes some school 2 edits on its input to remove the contradictions, and then proceeds to calculate the requested truth value. So one could argue that the schools are not fundamentally different, though I do feel they are different in outlook, especially in their outlook on how necessary or useful it will be for AGI safety to resolve certain puzzles.
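As a rough sketch of the school-2 recipe described above (my own toy representation of the knowledge base, not the commenter’s system):

```python
# A rough sketch of school 2: before reasoning counterfactually, edit the knowledge
# base so the counterfactual premise does not contradict stored facts. The string
# representation of facts here is a toy assumption for illustration only.

from typing import Set

knowledge_base: Set[str] = {
    "vase is broken",
    "broken vase spilled water -> floor is wet",   # toy background rule
    "floor is wet",
}

def counterfactual_kb(kb: Set[str], premise: str, retract: Set[str]) -> Set[str]:
    """Retract the facts that would contradict `premise` (a design choice!), then add
    the premise, so the reasoning engine never receives an inconsistent input."""
    return (kb - retract) | {premise}

kb_cf = counterfactual_kb(
    knowledge_base,
    premise="vase is not broken",
    retract={"vase is broken", "floor is wet"},    # the chosen edits; others are possible
)
print(kb_cf)
# The engine can now be run on a consistent base and derive 'floor is not wet';
# deciding which facts to retract was exactly the design freedom discussed above.
```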

• What about school 3, the one that solves the problem with compartmentalisation/sandboxing?

• I was hoping somebody would come up with more schools… I think I could interpret the techniques of school 3 as a particular way to implement the ‘make some edits before you input it into the reasoning engine’ prescription of school 2, but maybe school 3 is different from school 2 in how it would describe its solution direction.

There is definitely also a school 4 (or maybe you would say this is the same one as school 3) which considers it to be an obvious truth that when you run simulations or start up a sandbox, you can supply any starting world state that you like, and there is nothing strange or paradoxical about this. Specifically, if you are an agent considering a choice between taking actions A, B, and C as the next action, you can run different simulations to extrapolate the results of each. If a self-aware agent inside the simulation for action B computes that the action an optimal agent would have taken at the point in time where its simulation started was A, this agent cannot conclude that there is a contradiction: such a conclusion would rest on making a category error. (See my answer in this post for a longer discussion of the topic.)

• Suppose you learn about physics and find that you are a robot. You learn that your source code is “A”. You also believe that you have free will; in particular, you may decide to take either action X or action Y.

My motivation for talking about logical counterfactuals has little to do with free will, even if the philosophical analysis of logical counterfactuals does.

The reason I want to talk about logical counterfactuals is as follows: suppose as above that I learn that I am a robot, and that my source code is “A” (which is presumed to be deterministic in this scenario), and that I have a decision to make between action X and action Y. In order to make that decision, I want to know which decision has better expected utility. The problem is that, in fact, I will either choose X or Y. Suppose without loss of generality that I will end up choosing action X. Then worlds in which I choose Y are logically incoherent, so how am I supposed to reason about the expected utility of choosing Y?

• I’m not using “free will” to mean something distinct from “the ability of an agent, from its perspective, to choose one of multiple possible actions”. Maybe this usage is nonstandard but find/replace yields the right meaning.

• I think using the term in that way, without explicitly defining it, makes the discussion more confused.

• Then worlds in which I choose Y are logically incoherent

From an omniscient point of view, or from your point of view? The typical agent has imperfect knowledge of both the inputs to their decision procedure, and the procedure itself. So long as an agent treats what it thinks is happening as only one possibility, then there is no contradiction, because possibly-X is always compatible with possibly-not-X.

• From an omniscient point of view, yes. From my point of view, probably not, but there are still problems that arise relating to this, that can cause logic-based agents to get very confused.

Let A be an agent, considering options X and not-X. Suppose A |- Action=not-X → Utility=0. The naive approach to this would be to say: if A |- Action=X → Utility<0, A will do not-X, and if A |- Action=X → Utility>0, A will do X. Suppose further that A knows its source code, so it knows this is the case.
Consider the statement G = (A |- G) → (Action=X → Utility<0). It can be constructed by using Gödel-numbering and quines. Present A with the following argument:

Suppose for the sake of argument that A |- G. Then A |- (A |- G), since A knows its source code. Also, by definition of G, A |- ((A |- G) → (Action=X → Utility<0)). By modus ponens, A |- (Action=X → Utility<0). Therefore, by our assumption about A, A will do not-X: Action≠X. But, vacuously, this means that (Action=X → Utility<0). Since we have proved this by assuming A |- G, we know that (A |- G) → (Action=X → Utility<0), in other words, we know G.

The argument then goes, similarly to above:
A |- G
A |- (A |- G)
A |- ((A |- G) → (Action=X → Utility<0))
A |- (Action=X → Utility<0)
Action=not-X

We proved this without knowing anything about X. This shows that naive logical implication can easily lead one astray. The standard solution to this problem is the chicken rule, making it so that if A ever proves which action it will take, it will immediately take the opposite action, which avoids the argument presented above, but is defeated by Troll Bridge, even when the agent has good logical uncertainty.

These problems seem to me to show that logical uncertainty about the action one will take, paired with logical implications about what the result will be if you take a particular action, are insufficient to describe a good decision theory.
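For reference, the construction above can be restated in standard notation; nothing new is added here, and the box is read as “A proves”. The final comment notes one standard way (via Löb’s theorem) to justify passing from the hypothetical argument to the unconditional conclusion, which the comment above leaves implicit.

```latex
% The self-referential sentence, obtained via G\"odel numbering / the diagonal lemma:
G \;\leftrightarrow\; \Big(\Box_A G \;\rightarrow\; \big(\mathrm{Action}=X \rightarrow \mathrm{Utility}<0\big)\Big)

% The hypothetical argument, formalized inside A, establishes
\Box_A\big(\Box_A G \rightarrow G\big)
% and one standard way to conclude is: L\"ob's theorem then yields \Box_A G, hence
% \mathrm{Action}\neq X, with no dependence on what X actually is.
```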

• I am not aware of a good reason to believe that a perfect decision theory is even possible, or that counterfactuals of any sort are the main obstacle.

• In this possible world, it is the case that “A” returns Y upon being given those same observations. But, the output of “A” when given those observations is a fixed computation, so you now need to reason about a possible world that is logically incoherent, given your knowledge that “A” in fact returns X. This possible world is, then, a logical counterfactual: a “possible world” that is logically incoherent.

Simpler solution: in that world, your code is instead A’, which is exactly like A, except that it returns Y in this situation. This is the more general solution derived from Pearl’s account of counterfactuals in domains with a finite number of variables (the “twin network construction”).

Last year, my colleagues and I published a paper on Turing-complete counterfactual models (“causal probabilistic programming”), which details how to do this, and even gives executable code to play with, as well as a formal semantics. Have a look at our predator-prey example, a fully worked example of how to do this “counterfactual world is same except blah” construction.

http://www.zenna.org/publications/causal.pdf
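As a minimal sketch of the “same program, except it returns Y in this situation” idea (my own illustration in Python, not the paper’s API):

```python
# A minimal sketch of the intervened program A': wrap the original program with a
# single overridden input-output pair. Names here are illustrative assumptions.

from typing import Callable, Tuple

Obs = Tuple[str, ...]

def A(observations: Obs) -> str:
    """The original (deterministic) program: it happens to output X here."""
    return "X"

def intervene(program: Callable[[Obs], str], at: Obs, new_output: str) -> Callable[[Obs], str]:
    """Return A': identical to `program` everywhere except at the intervened history."""
    def A_prime(observations: Obs) -> str:
        return new_output if observations == at else program(observations)
    return A_prime

history: Obs = ("saw physics", "saw my source code")
A_prime = intervene(A, at=history, new_output="Y")

print(A(history), A_prime(history))   # X Y -- the counterfactual world runs A', so no
                                      # logically incoherent claim about A is needed
```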

• Yes, this is a specific way of doing policy-dependent source code, which minimizes how much the source code has to change to handle the counterfactual.

Haven’t looked deeply into the paper yet but the basic idea seems sound.

• If the agent is ‘caused’, then in order for its source code to be different, something about the process that produced it must be different. (I haven’t seen this addressed.)

• I found parts of your framing quite original and I’m still trying to understand all the consequences.

Firstly, I’m also opposed to characterising the problem in terms of logical counterfactuals. I’ve argued before that Counterfactuals are an Answer Not a Question, although maybe it would have been clearer to say that they are a Tool Not a Question instead. If we’re talking strictly, it doesn’t make sense to ask what maths would be like if 1+1=3, as it doesn’t, but we can construct a para-consistent logic where it makes sense to do something analogous to pretending 1+1=3. And so maybe one form of “logical counterfactual” could be useful for solving these problems, but that doesn’t mean asking what logical counterfactuals are, as though they were ontologically basic, as though they were in the map not the territory, as though they were a single unified concept, makes sense.

Secondly, “free will” is such a loaded word that using it in a non-standard fashion simply obscures and confuses the discussion. Nonetheless, I think you are touching upon an important point here. I have a framing which I believe helps clarify the situation. If there’s only one possible decision, this gives us a Trivial Decision Problem. So to have a non-trivial decision problem, we’d need a model containing at least two decisions. If we actually did have libertarian free will, then our decision problems would always be non-trivial. However, in the absence of this, the only way to avoid triviality would be to augment the factual with at least one counterfactual.

Counterfactual non-realism: Hmm… I see how this could be a useful concept, but the definition given feels a bit vague. For example, recently I’ve been arguing in favour of what counts as a valid counterfactual being at least partially a matter of social convention. Is that counterfactual non-realism?

Further, it seems a bit strange to associate material conditionals with counterfactual non-realism. Material conditionals only provide the outcome when we have a consistent counterfactual. So, either a) we believe in libertarian free will or b) we use something like the erasure approach to remove information such that we have multiple consistent possibilities (see https://www.lesswrong.com/posts/BRuWm4GxcTNPn4XDX/deconfusing-logical-counterfactuals). Proof-based UDT doesn’t quite use material conditionals, it uses a paraconsistent version of them instead. Although, maybe I’m just being too pedantic here. In any case, we can find ways of making paraconsistent logic behave as expected in any scenario, however it would require a separate ground. That is, it isn’t enough that the logic merely seems to work, but we should be able to provide a separate reason for why using a paraconsistent logic in that way is good.

Also, another approach which kind of aligns with counterfactual non-realism is to say that given the state of the universe at any particular time we can determine the past and future and that there are no counterfactuals beyond those we generate by imagining state Y at time T instead of state X. So, to imagine counterfactually taking action Y we replace the agent doing X with another agent doing Y and flow causation both forwards and backwards. (See this post for more detail). It could be argued that these count as counterfactuals, but I’d align it with counterfactual non-realism as it doesn’t have decision counterfactuals as separate ontological elements.

Policy-dependent source code—this is actually a pretty interesting framing. I’ve always defaulted to thinking about counterfactuals in terms of actions, but when we’re talking about things in terms of problems like Counterfactual Mugging, characterising counterfactuals in terms of policy might be more natural. It’s strange why this feels fresh to me—I mean UDT takes this approach—but I never considered the possibility of non-UDT policy counterfactuals. I guess from a philosophical perspective it makes sense to first consider whether policy-dependent source code makes sense and then if it does further ask whether UDT makes sense.

• “free will” is such a loaded word

As a side note—one thing I don’t understand is why more people don’t seem to want to use just the word “will” without the “free” part in front of it.

It seems like a much more straightforward and less fraught term, and something that we obviously have. Do we have a “will”? Obviously yes—we want things, we choose things, etc. Is that will “free”? Well what does that mean?

EDIT: I feel like this is a case of philosophers baking in a confusion into their standard term. It’d be like if instead of space we always talked about “absolute space”. And then post-Einstein people argued about whether “absolute space” existed or not, without ever just using the term “space” just by itself.

• Philosophers talk about free will because it is contentious and therefore worth discussing philosophically, whereas will, qua wants and desires, isn’t.

cf. the silly physicists who insist on talking about dark matter, when anyone can see that ordinary matter exists.

• Philosophers talk about free will because it is contentious and therefore worth discussing philosophically, whereas will, qua wants and desires, isn’t.

Fair point. But then why do so many (including philosophers) make statements like, “we seem to have free will”, or “this experience of apparent free will that we have requires explanation”?

If ‘free will’ in those statements means something different from ‘will’, then it seems like they’re assuming the (wrong) explanation.

cf. the silly physicists who insist on talking about dark matter, when anyone can see that ordinary matter exists.

If physicists often used the term “dark matter” in ways that suggested it’s the same thing as people’s folk concept of matter, then I’d agree that they were silly.

• Fair point. But then why do so many (including philosophers) make statements like, “we seem to have free will”, or “this experience of apparent free will that we have requires explanation”?

Why specific philosophers say specific things is usually explained by the philosophers themselves, since it is hard to gain a reputation in the field by making unsupported assertions. But you seem to be making the point that it is strange that any philosopher argues in favour of free will, since, according to you, it is obviously non-existent. The answer to that is that you are not capable of reproducing all the arguments for or against a claim yourself, so your personal guesswork is not a good guide to how plausible something is.

“this experience of apparent free will that we have requires explanation.”

Doesn’t everything require explanation? Even your man Yudkowsky offers an explanation of the feeling of free will.

If physicists often used the term “dark matter” in ways that suggested it’s the same thing as people’s folk concept of matter, then I’d agree that they were silly.

Physicists do use the word “matter” in a sense that departs from folk usage. For instance, they assert that it is mostly nothingness, and that it is equivalent to energy.

• But you seem to be making the point that it is strange that any philosopher argues in favour of free will, since, according to you, it is obviously non-existent.

I didn’t mean that just the philosophers who believe in (libertarian, contra-causal) free will make statements like “we seem to have free will”, or “this experience of apparent free will that we have requires explanation”. I’ve heard those statements even from those questioning such free will.

They’ll say, “we seem to have free will, but actually it’s an illusion”.

What I do not see is proponents of determinism saying that “free will” is the wrong term, that most of the intuitive properties that our wants and choices seem to have are satisfied by the idea of a “will”, plain and simple, and then starting the argument from there about whether there are additional properties that that will has or seems to have such that it’s reasonable to append the term “free” to the front.

Maybe it’s popularizers that I have to blame, rather than philosophers. I’m not sure. My complaint is that somehow the standard sides of the debate came to be labeled “free will” vs “determinism” rather than “uncaused will” vs “determined will”.

I think the “free will” vs “determinism” framing unfairly makes it seem like whether any wanting or choosing is happening is at stake, such that people had to come up with the special term “compatibilism” for the position that “no no, there’s still wanting and choosing going on”.

If you started the debate with everyone agreeing, “obviously there’s some form of wanting and choosing happening,” and then asking, “but what form does it take and where does it come from? Can it be said to be caused by anything?” then I think the natural terms for the two camps would be something like “uncaused will” and “determined will”.

I think those terms accurately describe the major sides of the popular debate and are less likely to prejudice people’s intuitions in favor of the free/uncaused will side.

So what I don’t understand is: why don’t proponents of determinism push that framing?

• Proponents of determinism tend to say that libertarian free will doesn’t exist, but compatibilist free will might. It is likely that they are expressing the same idea as you, but in different language.

• That’s an interesting point.

• Secondly, “free will” is such a loaded word that using it in a non-standard fashion simply obscures and confuses the discussion.

Wikipedia says “Free will is the ability to choose between different possible courses of action unimpeded.” SEP says “The term “free will” has emerged over the past two millennia as the canonical designator for a significant kind of control over one’s actions.” So my usage seems pretty standard.

For example, recently I’ve been arguing in favour of what counts as a valid counterfactual being at least partially a matter of social convention.

All word definitions are determined in large part by social convention. The question is whether the social convention corresponds to a definition (e.g. with truth conditions) or not. If it does, then the social convention is realist, if not, it’s nonrealist (perhaps emotivist, etc).

Material conditionals only provide the outcome when we have a consistent counterfactual.

Not necessarily. An agent may be uncertain over its own action, and thus have uncertainty about material conditionals involving its action. The “possible worlds” represented by this uncertainty may be logically inconsistent, in ways the agent can’t determine before making the decision.

Proof-based UDT doesn’t quite use material conditionals, it uses a paraconsistent version of them instead.

I don’t understand this? I thought it searched for proofs of the form “if I take this action, then I get at least this much utility”, which is a material conditional.

So, to imagine counterfactually taking action Y we replace the agent doing X with another agent doing Y and flow causation both forwards and backwards.

Policy-dependent source code does this; one’s source code depends on one’s policy.

I guess from a philosophical perspective it makes sense to first consider whether policy-dependent source code makes sense and then if it does further ask whether UDT makes sense.

I think UDT makes sense in “dualistic” decision problems that are already factorized as “this policy leads to these consequences”. Extending it to a nondualist case brings up difficulties, including the free will / determinism issue. Policy-dependent source code is a way of interpreting UDT in a setting with deterministic, knowable physics.

• So my usage (of free will) seems pretty standard.

Not quite. The way you are using it doesn’t necessarily imply real control, it may be imaginary control.

All word definitions are determined in large part by social convention.

True. Maybe I should clarify what I’m suggesting. My current theory is that there are multiple reasonable definitions of counterfactual and it comes down to social norms as to what we accept as a valid counterfactual. However, it is still very much a work in progress, so I wouldn’t be able to provide more than vague details.

The “possible worlds” represented by this uncertainty may be logically inconsistent, in ways the agent can’t determine before making the decision.

I guess my point was that this notion of counterfactual isn’t strictly a material conditional due to the principle of explosion. It’s a “para-consistent material conditional” by which I mean the algorithm is limited in such a way as to prevent this explosion.

Policy-dependent source code does this; one’s source code depends on one’s policy.

Hmm… good point. However, were you flowing this all the way back in time? E.g., if you change someone’s source code, you’d also have to change the person who programmed them.

I think UDT makes sense in “dualistic” decision problems

What do you mean by dualistic?

• The way you are using it doesn’t necessarily imply real control, it may be imaginary control.

I’m discussing a hypothetical agent who believes itself to have control. So its beliefs include “I have free will”. Its belief isn’t “I believe that I have free will”.

It’s a “para-consistent material conditional” by which I mean the algorithm is limited in such a way as to prevent this explosion.

Yes, that makes sense.

However, were you flowing this all the way back in time?

Yes (see thread with Abram Demski).

What do you mean by dualistic?

Already factorized as an agent interacting with an environment.

• Yes (see thread with Abram Demski).

Hmm, yeah this could be a viable theory. Anyway, to summarise the argument I make in Is Backwards Causation Necessarily Absurd?, I point out that since physics is pretty much reversible, instead of A causing B, it seems as though we could also imagine B causing A and time going backwards. In this view, it would be reasonable to say that one-boxing (backwards-)caused the box to be full in Newcomb’s problem. I only sketched the theory because I don’t have enough physics knowledge to evaluate it. But the point is that we can give justification for a non-standard model of causality.

• On my cur­rent un­der­stand­ing of this post, I think I have a crit­i­cism. But I’m not sure if I prop­erly un­der­stand the post, so tell me if I’m wrong in my fol­low­ing sum­mary. I take the post to be say­ing some­thing like the fol­low­ing:

‘Sup­pose, in fact, I take the ac­tion A. In­stead of talk­ing about log­i­cal coun­ter­fac­tu­als, we should talk about policy-de­pen­dent source code. If we do this, then we can see that ini­tial talk about log­i­cal coun­ter­fac­tu­als en­coded an er­ror. The er­ror is not un­der­stand­ing the fol­low­ing claim: when ask­ing what would have hap­pened if I had performed some ac­tion A* A, ob­serv­ing that I do A* is ev­i­dence that I had some differ­ent source code. Thus, in analysing that coun­ter­fac­tual state­ment, we do not need to re­fer to in­co­her­ent ‘im­pos­si­ble wor­lds’.

If my summary is right, I’m not sure how policy-dependent source code solves the global accounting problem. The agent, when asking what would have happened had it done Y, still faces a global accounting problem: it must then assume it has some different source code B, and it seems like choosing an appropriate B will be underdetermined. That is, there is no unique source code B to give you a determinate answer about what would have happened if you had performed A*. I can see why thinking in terms of policy-dependent source code would be attractive if you were a nonrealist about specifically logical counterfactuals, and a realist about different kinds of counterfactuals. But that’s not what I took you to be saying.

• The sum­mary is cor­rect.

In­deed, it is un­der­de­ter­mined what the al­ter­na­tive source code is. Some­times it doesn’t mat­ter (this is the case in most de­ci­sion prob­lems), and some­times there is a fam­ily of pro­grams that can be as­sumed. But this still pre­sents the­o­ret­i­cal prob­lems.

The mo­ti­va­tion is to be a non­re­al­ist about log­i­cal coun­ter­fac­tu­als while be­ing a re­al­ist about some coun­ter­fac­tu­als.

• I see the prob­lem of coun­ter­fac­tu­als as es­sen­tially solved by quasi-Bayesi­anism, which be­haves like UDT in all New­comb-like situ­a­tions. The source code in your pre­sen­ta­tion of the prob­lem is more or less equiv­a­lent to Omega in New­comb-like prob­lems. A TRL agent can also rea­son about ar­bi­trary pro­grams, and learn that a cer­tain pro­gram acts as a pre­dic­tor for its own ac­tions.

This approach has some similarity with material implication and proof-based decision theory, in the sense that out of several hypotheses about counterfactuals that are consistent with observations, the decisive role is played by the most optimistic hypothesis (the one that can be exploited for the most expected utility). However, it has no problem with global accounting, and indeed it solves counterfactual mugging successfully.

• It seems the ap­proaches we’re us­ing are similar, in that they both are start­ing from ob­ser­va­tion/​ac­tion his­tory with posited falsifi­able laws, with the agent’s source code not known a pri­ori, and the agent con­sid­er­ing differ­ent poli­cies.

Learn­ing “my source code is A” is quite similar to learn­ing “Omega pre­dicts my ac­tion is equal to A()”, so these would lead to similar re­sults.

Policy-de­pen­dent source code, then, cor­re­sponds to Omega mak­ing differ­ent pre­dic­tions de­pend­ing on the agent’s in­tended policy, such that when com­par­ing poli­cies, the agent has to imag­ine Omega pre­dict­ing differ­ently (as it would imag­ine learn­ing differ­ent source code un­der policy-de­pen­dent source code).
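
To make the correspondence concrete, here is a minimal Python sketch with standard Newcomb payoffs (the numbers and names are purely illustrative): when comparing policies, the imagined prediction co-varies with the policy being evaluated, just as the imagined source code would under policy-dependent source code; a fixed-prediction evaluation is shown for contrast.

```python
def newcomb_payoff(action: str, prediction: str) -> int:
    """The opaque box is full iff one-boxing was predicted; two-boxing adds the small box."""
    big = 1_000_000 if prediction == "one-box" else 0
    small = 1_000 if action == "two-box" else 0
    return big + small

policies = ["one-box", "two-box"]

# Policy-dependent evaluation: the imagined prediction tracks the policy
# under comparison (analogous to imagining having learned different source code).
policy_dependent = {p: newcomb_payoff(action=p, prediction=p) for p in policies}

# Fixed-prediction evaluation, for contrast: the prediction is held constant,
# so two-boxing dominates.
fixed_prediction = {p: newcomb_payoff(action=p, prediction="one-box") for p in policies}

print(policy_dependent)   # {'one-box': 1000000, 'two-box': 1000}
print(fixed_prediction)   # {'one-box': 1000000, 'two-box': 1001000}
print(max(policy_dependent, key=policy_dependent.get))  # one-box
```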

• Policy-de­pen­dent source code, then, cor­re­sponds to Omega mak­ing differ­ent pre­dic­tions de­pend­ing on the agent’s in­tended policy, such that when com­par­ing poli­cies, the agent has to imag­ine Omega pre­dict­ing differ­ently (as it would imag­ine learn­ing differ­ent source code un­der policy-de­pen­dent source code).

Well, in quasi-Bayesi­anism for each policy you have to con­sider the worst-case en­vi­ron­ment in your be­lief set, which de­pends on the policy. I guess that in this sense it is analo­gous.

• Short:

Hence, they would not have seen that their source code is “A”.

Un­less some­thing in­terfered with what they saw—there need not be pure/​true ob­ser­va­tions.

In­stead, if the agent were to take ac­tion Y upon see­ing that their source code is “A”, their source code must be some­thing else, per­haps “B”.

And something might have an incentive to do so if the agent were to do X if it “saw its source code was A” and were to do Y if it “saw its source code was B”. While A and B may be mutually exclusive, the actual policy “might” be dependent on observations of either.

Long:

[1] If a pro­gram takes long enough to run, it may never be found that it does halt. In a sense, the fact that its out­put is de­ter­mined does not mean it can (or will) be de­duced.

there is no way for two differ­ent poli­cies to be com­pat­i­ble with the same source code.

And set of in­puts.

For­mally, a policy is a func­tion map­ping an ob­ser­va­tion his­tory to an ac­tion. It is dis­tinct from source code, in that the source code speci­fies the im­ple­men­ta­tion of the policy in some pro­gram­ming lan­guage, rather than it­self be­ing a policy func­tion.
Log­i­cally, it is im­pos­si­ble for the same source code to gen­er­ate two differ­ent poli­cies. There is a fact of the mat­ter about what ac­tion the source code out­puts given an ob­ser­va­tion his­tory (as­sum­ing the pro­gram halts). Hence there is no way for two differ­ent poli­cies to be com­pat­i­ble with the same source code.

Over­all take:

Dy­namic ver­sus static:

Con­sider the num­bers 3, 1, 2, 4.

There ex­ists more than one set of ac­tions that ‘trans­forms’ the above into: 1, 2, 3, 4.

(It can also be trans­formed into a sorted list by delet­ing the 3...)

A sorting method, however, does not always take a list and move the first element to the third position, or even necessarily do so in every case where the first element is three.

While de­ter­minis­tic, its be­hav­ior de­pends upon an in­put. Given the in­put, the ac­tions it will take are known (or fol­low from the source code in prin­ci­ple[1]).

This can be generalized further, in the case of a sorting program that takes both a set of objects and a way of ordering them. Perhaps a program can even be written that reasons about some policy and, based on the results, makes an output conditional on what it finds. Thus the “logical counterfactual” does not exist per se, but is a way of thinking used in order to handle the different cases, as it is not clear which one is the case, though only one may be possible.
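
A minimal Python sketch of the dynamic-versus-static point above (the code is purely illustrative): the procedure is deterministic, but which moves it makes depends on the input, and the same code can be handed a different way of ordering.

```python
def insertion_sort(items, key=lambda x: x):
    """Deterministic: given the input (and the ordering), every move it makes is fixed."""
    result = list(items)
    for i in range(1, len(result)):
        j = i
        # Shift the element at position i left until it is in order.
        while j > 0 and key(result[j]) < key(result[j - 1]):
            result[j], result[j - 1] = result[j - 1], result[j]
            j -= 1
    return result

print(insertion_sort([3, 1, 2, 4]))                    # [1, 2, 3, 4]
print(insertion_sort([1, 2, 3, 4]))                    # already ordered: no moves at all
print(insertion_sort([3, 1, 2, 4], key=lambda x: -x))  # [4, 3, 2, 1]: same code, different ordering
```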

More spe­cific:

For­mally, a policy is a func­tion map­ping an ob­ser­va­tion his­tory to an ac­tion. It is dis­tinct from source code, in that the source code speci­fies the im­ple­men­ta­tion of the policy in some pro­gram­ming lan­guage, rather than it­self be­ing a policy func­tion.

Though a policy may include/specify (simpler) policies, and thus by extension, a source code may as well, though the different threads will probably be woven together.
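
As a minimal illustration of the quoted distinction (the names and code are purely hypothetical), a short Python sketch: the policy is a function from observation histories to actions, while the source code is a text artifact that implements such a function.

```python
from typing import Callable, Tuple

Observation = str
Action = str
# A policy: a function mapping an observation history to an action.
Policy = Callable[[Tuple[Observation, ...]], Action]

# Source code: a string in some programming language that implements a policy,
# rather than itself being the policy function.
source_code = """
def act(history):
    return "X" if "my source code is A" in history else "Y"
"""

namespace: dict = {}
exec(source_code, namespace)        # interpret the source code
policy: Policy = namespace["act"]   # the policy function it denotes

# Given the source code and an observation history, the action is determined
# (assuming the program halts), so one source code cannot yield two policies.
print(policy(("my source code is A",)))  # X
print(policy(("something else",)))       # Y
```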

• I’m trying to understand where exactly in your approach you sneak in the free will...

• For coun­ter­fac­tual non­re­al­ism, it’s sim­ply the un­cer­tainty an agent has about their own ac­tion, while be­liev­ing them­selves to con­trol their ac­tion.

For policy-dependent source code, the “different possibilities” correspond to different source code. An agent with fixed source code can only take one possible action (from a logically omniscient perspective), but the counterfactuals change the agent’s source code, getting around this constraint.

• I think

• when mod­el­ing a com­plex/​not en­tirely un­der­stood sys­tem, prob­a­bil­ities may be a more effec­tive frame­work.

• Just as a program whose output were known before it was run probably wouldn’t need to be run, we don’t know what we’ll decide before we decide (though we do after), and we’re not sure how we could have predicted the outcome in advance.

• What is “lin­ear in­ter­ac­tive causal­ity”?

• Ba­si­cally, the as­sump­tion that you’re par­ti­ci­pat­ing in a POMDP. The idea is that there’s some hid­den state that your ac­tions in­ter­act with in a tem­po­rally lin­ear fash­ion (i.e. ac­tion 1 af­fects state 2), such that your late ac­tions can’t af­fect early states/​ob­ser­va­tions.
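
A minimal Python sketch of that assumption (the dynamics are made up for illustration): in a POMDP-style interaction loop, action t feeds only into state t+1 and later observations, never into earlier ones.

```python
import random

def transition(state: int, action: str) -> int:
    # Hidden-state dynamics: the next state depends only on the current state and action.
    return state + (1 if action == "up" else -1)

def observe(state: int) -> int:
    # Noisy observation of the current hidden state.
    return state + random.choice([-1, 0, 1])

state = 0
history = []
for t in range(5):
    obs = observe(state)                  # observation t is already fixed before action t is chosen
    action = "up" if obs < 3 else "down"  # any policy over the observation history would do here
    history.append((t, obs, action))
    state = transition(state, action)     # the only place action t enters: state t+1 onward

print(history)
```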

• OK, so no “backwards causation”? (not sure if that’s a technical term and/or if I’m using it right...)

Is there a word we could use in­stead of “lin­ear”, which to an ML per­son sounds like “as in lin­ear alge­bra”?

• Yes, it’s the assumption that there’s no backwards causation. Linear has lots of meanings; I’m not concerned about this getting confused with linear algebra, but you can suggest a better term if you have one.

• this “what would have hap­pened” world is log­i­cally in­co­her­ent.

There is a log­i­cal con­tra­dic­tion be­tween the idea that your ac­tions are de­ter­mined, and the idea that you could have acted differ­ently un­der the ex­act same cir­cum­stances. There is no such prob­lem if you do not as­sume de­ter­minism, mean­ing that the “prob­lem” of log­i­cal coun­ter­fac­tu­als is nei­ther un­avoid­able nor purely log­i­cal—it is not purely log­i­cal be­cause a meta­phys­i­cal as­sump­tion, an as­sump­tion about the way re­al­ity works is in­volved.

The as­sump­tion of de­ter­minism is im­plicit in talk­ing of your­self as a com­puter pro­gramme, and the as­sump­tion of in­de­ter­minism is im­plicit in talk­ing about your­self as nonethe­less hav­ing free will.

A purely logical counterfactual, a logical counterfactual properly so-called, is a hypothetical state of affairs, where a different input or set of preconditions is supposed, and a different, also hypothetical, output or result obtains. Such a counterfactual is logically consistent; it just isn’t consistent with what actually occurred.

Ac­cord­ing to coun­ter­fac­tual non­re­al­ism, there is no fact of the mat­ter about what “would have hap­pened” had a differ­ent ac­tion been taken.

People calculate logical counterfactuals all the time. You can figure out what output a programme will give in response to an input it has never received by looking at the code. But note that that is a purely epistemological issue. There may be a separate, ontological, not epistemological, issue about real counterfactuals. If you have good reason to believe in determinism, which you don’t, you should disbelieve in real counterfactuals. But that says nothing about logical counterfactuals. So long as some hygiene is exercised about the epistemological/ontological distinction, and the logical/real distinction, then there is no problem.

The ap­par­ent non­de­ter­minism is, then, only due to the epistemic limi­ta­tion of the agent at the time of mak­ing the de­ci­sion, a limi­ta­tion not faced by a later ver­sion of the agent (or an out­side agent) with more com­pu­ta­tion power.

Note that prob­lems agents have in in­tro­spect­ing their own de­ci­sion mak­ing are not prob­lems with coun­ter­fac­tu­als (real or log­i­cal) per se.

This leads to a sort of rel­a­tivism: what is un­de­ter­mined from one per­spec­tive may be de­ter­mined from an­other.

It doesn’t lead to se­ri­ous rel­a­tivism, be­cause the per­spec­tives are asym­met­ri­cal. The agent that knows more is more right.

A prob­lem that comes up is that of “spu­ri­ous coun­ter­fac­tu­als”

A “spu­ri­ous” coun­ter­fac­tual is just a log­i­cal, as op­posed to real, coun­ter­fac­tual. The fact that it could never have oc­curred means it was never a real coun­ter­fac­tual.

• This was interesting to read, but I’m not entirely sure just what it means to me, and I haven’t thought anything through.

As I was reading I started to think along the lines of your policy side: perhaps the question is not about how to twist the code A into outputting Y, but rather why not just consider that the agent runs some other code. (The immediate problem with my thought is that of infinite regress.) But also, when thinking about counterfactuals, that is in a sense what I am exploring. I would express that more as: what if action/path Y were taken, where does that lead? Is that a better result? If so, then the response is about updating some priors related to the input to A, or updating A itself. In this sense the question is not about the logical problems of getting A to output Y when we know it has to output X, but about the internal logic and decision-making performance of A, and whether we need to update to A’.

I am also wondering if including the whole free-will aspect adds value. If you just took that aspect out, what would change in your thinking? Or is the whole question of free-will part of the philosophical question you need to address? If so, your post did prompt a thought on my thinking about free-will, particularly in the context of rational mindsets. I don’t know if someone has already followed this line of thinking (but I would certainly think so), but I don’t think free-will can be rationally explored and explained within the confines of pure logical consistency.

• Without some as­sump­tion similar to “free will” it is hard to do any de­ci­sion the­ory at all, as you can’t com­pare differ­ent ac­tions; there is only one pos­si­ble ac­tion.

The counterfactual nonrealist position is closer to determinism than the policy-dependent source code position is. It assumes that the algorithm controls the decision while the output of the algorithm is unknown.

• Without some as­sump­tion similar to “free will” it is hard to do any de­ci­sion the­ory at all, as you can’t com­pare differ­ent ac­tions; there is only one pos­si­ble ac­tion.

Under determinism, there is only one actually possible action, and that doesn’t stop you comparing hypothetical actions. Logical possibility ≠ real possibility. Since logical possibilities are only logical possibilities, no sincere assumption of real free will is required.

Since you are in­vari­ably in a far from om­ni­scient state about both the world and your own in­ner work­ings, you are pretty much always deal­ing with hy­pothe­ses, not di­rect in­sight into re­al­ity.

• This is ex­actly what is de­scribed in the coun­ter­fac­tual non­re­al­ism sec­tion.

• Un­der de­ter­minism, you should be a non­re­al­ist about real coun­ter­fac­tu­als, but there is still no prob­lem with log­i­cal coun­ter­fac­tu­als. So what is “the prob­lem of log­i­cal coun­ter­fac­tu­als”?

• They’re log­i­cally in­co­her­ent so your rea­son­ing about them is limited. If you gain in com­put­ing power then you need to stop be­ing a re­al­ist about them or else your rea­son­ing ex­plodes.

• They are not logically incoherent in themselves. They are inconsistent with what actually happened. That means that if you try to bundle the hypothetical, the logical counterfactual, in with your model of reality, the resulting mishmash will be inconsistent. But the resulting mishmash isn’t the logical counterfactual per se.

We can think about counterfactuals without our heads exploding. That is the correct starting point. How is that possible? The obvious answer is that consideration of hypothetical scenarios takes place in a sandbox.

• They are log­i­cally in­co­her­ent in them­selves though. Sup­pose the agent’s source code is “A”. Sup­pose that in fact, A re­turns ac­tion X. Con­sider a log­i­cal coun­ter­fac­tual “pos­si­ble world” where A re­turns ac­tion Y. In this log­i­cal coun­ter­fac­tual, it is pos­si­ble to de­duce a con­tra­dic­tion: A re­turns X (by com­pu­ta­tion/​logic) and re­turns Y (by as­sump­tion) and X is not equal to Y. Hence by the prin­ci­ple of ex­plo­sion, ev­ery­thing is true.

It isn’t nec­es­sary to ob­serve that A re­turns X in real life, it can be de­duced from logic.

(Note that this doesn’t ex­clude the log­i­cal ma­te­rial con­di­tion­als de­scribed in the post, only log­i­cal coun­ter­fac­tu­als)
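
For concreteness, the contradiction can be written out as a short formal derivation. A minimal Lean 4 sketch (the names A, X, Y, P are just illustrative terms standing for the program’s output, the two actions, and an arbitrary proposition):

```lean
-- From "A = X" (by computation), "A = Y" (the counterfactual assumption),
-- and "X ≠ Y", any proposition P follows: the principle of explosion.
example (α : Type) (A X Y : α) (P : Prop)
    (h_comp : A = X) (h_assum : A = Y) (h_ne : X ≠ Y) : P :=
  absurd (h_comp.symm.trans h_assum) h_ne
```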

• Source code doesn’t entirely determine the result; inputs are also required.* Thus “logical counterfactuals”, in the sense of reasoning about what a program will return if I input y, can be handled by asking ‘if I had input y instead of x’ or ‘if I input y’, even if I later decide to input x.

While it can be said that such considerations render one’s “output” conditional on logic, they remain entirely conditional on reasoning about a model, which may be incorrect. It seems more useful to refer to such a relation as conditional on one’s models/reasoning, or even processes in the world. A calculator may be misused: a 2 instead of a 3 here, hitting “=” one too many times there, etc.

(Say­ing it is im­pos­si­ble for a ra­tio­nal agent that knows X to do Y, and agent A is not do­ing Y, does not es­tab­lish that A is ir­ra­tional—even if the premises are true, what fol­lows is that A is not ra­tio­nal or does not know X.)

*Un­less source code is defined as in­clud­ing the in­puts.
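
A minimal Python sketch of this point (the function and inputs are purely illustrative): given the code and an input, we can reason about what the program would return on an input it never actually receives, but only via a model of the code, which may be mistaken.

```python
def program(x: int) -> int:
    return 2 * x + 1

actual_input, counterfactual_input = 3, 5

print(program(actual_input))           # 7: what actually happens
print(program(counterfactual_input))   # 11: what it would return had 5 been input

# A mistaken model of the same code (like mis-keying a calculator) answers the
# counterfactual question differently, and wrongly:
def mistaken_model(x: int) -> int:
    return 2 * x + 2

print(mistaken_model(counterfactual_input))  # 12: conditional on an incorrect model
```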

• You are assuming a very strong set of conditions: that determinism holds, that the agent has perfect knowledge of its source code, and that it is compelled to consider hypothetical situations in maximum resolution.

• Those are the con­di­tions in which log­i­cal coun­ter­fac­tu­als are most well-mo­ti­vated. If there isn’t de­ter­minism or known source code then there isn’t an ob­vi­ous rea­son to be con­sid­er­ing im­pos­si­ble pos­si­ble wor­lds.

• Those are the conditions under which counterfactuals are flat out impossible. But we have plenty of motivation to consider hypotheticals, and we don’t generally know how possible they are.