# Counterfactuals for Perfect Predictors

Parfit's Hitchhiker with a perfect predictor has the unusual property of having a Less Wrong consensus that you ought to pay, whilst also being surprisingly hard to define formally. For example, if we try to ask whether an agent that never pays in town is rational, then we encounter a contradiction. A perfect predictor would never give such an agent a lift, so by the Principle of Explosion we can prove any statement to be true given this counterfactual.

On the other hand, even if the predictor mistakenly picks up defectors only 0.01% of the time, then this counterfactual seems to have meaning. Let's suppose that a random number from 1 to 10,000 is chosen and the predictor always picks you up when the number is 1 and is perfect otherwise. Even if we draw the number 120, we can fairly easily imagine the situation where the number drawn was 1 instead. This is then a coherent situation where an Always Defect agent would end up in town, so we can talk about how the agent would have counterfactually chosen.

So one response to the difficulties of discussing counterfactual decisions with perfect predictors would be to simply compute the counterfactual as though the predictor has a (tiny) chance of being wrong. However, agents may quite understandably wish to act differently depending on whether they are facing a perfect or an imperfect predictor, even choosing differently when facing an agent with a very low error rate.

Another would be to say that the predictor predicts whether placing the agent in town is logically coherent. Since the driver only picks up those whom it predicts (with 100% accuracy) will pay, it can assume that it will be paid if the situation is coherent. Unfortunately, it isn't clear what it means in concrete terms for an agent to be such that it couldn't coherently be placed in such a situation. How is "I commit to not paying in <impossible situation>" any kind of meaningful commitment at all? We could look at "I commit to making <situation> impossible", but that doesn't mean anything either. If you're in a situation, then it must be possible? Further, such situations are contradictory, and everything is true given a contradiction, so all contradictory situations seem to be the same.

As the formal description of my solution is rather long, I'll provide a summary: we will assume that each possible world model corresponds to at least one possible sequence of observations. For world models that are consistent conditional on the agent making certain decisions, we'll take the sequence of observations seen by the agents for whom the world is consistent and feed it to the agents for whom it isn't. This will be interpreted as what they would have counterfactually chosen in such a situation.

## A Formal Description of the Problem

(You may wish to skip directly to the discussion.)

My solution will be to include observations in our model of the counterfactual. Most such problems can be modelled as follows:

Let x be a label that refers to one particular agent that will be called the centered agent for short. It should generally refer to the agent whose decisions we are optimising. In Parfit's Hitchhiker, x refers to the Hitchhiker.

Let W be a set of possible "world models with holes". That is, each is a collection of facts about the world, not including facts about the decision processes of x, which should exist as an agent in this world. These will include the problem statement.

To demonstrate, we'll construct I for this problem. We start off by defining the variables:

• t: Time
  • 0 when you encounter the Driver
  • 1 after you've either been dropped off in Town or left in the Desert

• l: Location. Either Desert or Town

• Act: The actual action chosen by the hitchhiker if they are in Town at t=1. Either Pay or Don't Pay or Not in Town

• Pred: The driver's prediction of x's action if the driver were to drop them in town. Either Pay or Don't Pay (as we've already noted, defining this counterfactual is problematic, but we'll provide a correction later)

• u: Utility of the hitchhiker

We can now provide the problem statement as a list of facts:

• Time: t is a time variable

• Location:
  • l=Desert at t=0
  • l=Town at t=1 if Pred=Pay
  • l=Desert at t=1 if Pred=Don't Pay

• Act:
  • Not in Town at t=0
  • Not in Town if l=Desert at t=1
  • Pay or Don't Pay if l=Town at t=1

• Prediction: The Predictor is perfect. A more formal definition will have to wait

• Utility:
  • u=0 at t=0
  • At t=1: Subtract 1,000,000 from u if l=Desert
  • At t=1: Subtract 50 from u if Act=Pay

W then contains three distinct world models:

• Starting World Model (w1):
  • t=0, l=Desert, Act=Not in Town, Pred: varies, u=0

• Ending Town World Model (w2):
  • t=1, l=Town, Act: varies, Pred: Pay, u: varies

• Ending Desert World Model (w3):
  • t=1, l=Desert, Act: Not in Town, Pred: Don't Pay, u=-1,000,000

The properties listed as "varies" will only be known once we have information about x. Further, it is impossible for certain agents to exist in certain worlds given the rules above.
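To make the bookkeeping concrete, here is a minimal Python sketch (my own illustration, not part of the original formalism; all names are hypothetical) of the three world models, with `None` standing in for the properties listed as "varies":

```python
# The three "world models with holes". None marks a hole: a property that is
# only determined once we know the centered agent x.
w1 = {"t": 0, "l": "Desert", "act": "Not in Town", "pred": None, "u": 0}
w2 = {"t": 1, "l": "Town", "act": None, "pred": "Pay", "u": None}
w3 = {"t": 1, "l": "Desert", "act": "Not in Town", "pred": "Don't Pay", "u": -1_000_000}

def complete(world, action):
    """Fill the holes of a world model, given the action x takes in Town at t=1."""
    w = dict(world)
    if w["t"] == 1 and w["l"] == "Town":
        w["act"] = action
        w["u"] = -50 if action == "Pay" else 0  # subtract 50 only if Act=Pay
    return w

def consistent(world):
    """A perfect predictor makes Pred=Pay incompatible with Act=Don't Pay."""
    return not (world["pred"] == "Pay" and world["act"] == "Don't Pay")
```

Under this encoding, an Always Pay agent completes w2 consistently, while a Never Pay agent cannot coherently exist in w2 at all, which is exactly the problem the rest of the post addresses.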

Let O be a set of possible sequences of observations. It should be chosen to contain all observations that could be made by the centered agent in the given problem, and there should be at least one sequence of observations representing each possible world model with holes. We will do something slightly unusual and include the problem statement as a set of observations. One intuition that might help illustrate this is to imagine that the agent has an oracle that allows it to directly learn these facts.

For this example, the possible individual observations grouped by type are:

• Location Events: <l=Desert> OR <l=Town>

• Time Events: <t=0> OR <t=1>

• Problem Statement: There should be an entry for each point in the problem statement as described for I. For example:
  • <l=Desert at t=0>

O then contains three distinct observation sequences:

• Starting World Model (o1):
  • <Problem Statement> <t=0> <l=Desert>

• Ending Town World Model (o2):
  • <Problem Statement> <t=0> <l=Desert> <t=1> <l=Town>

• Ending Desert World Model (o3):
  • <Problem Statement> <t=0> <l=Desert> <t=1> <l=Desert>

Of course, <t=0> <l=Desert> is observed initially in each world, so we could just remove it to provide simplified sequences of observations. I simply write <Problem Statement> instead of explicitly listing each item as an observation.

Regardless of its decision algorithm, we will associate x with a fixed Fact-Derivation Algorithm f. This algorithm will take a specific sequence of observations o and produce an id representing a world model with holes w. The reason why it produces an id is that some sequences of observations won't lead to a coherent world model for some agents. For example, the Ending Town sequence of observations can never be observed by an agent that never pays. To handle this, we will assume that each incomplete world model w is associated with a unique integer [w]. In this case, we might logically choose [w1]=1, [w2]=2, [w3]=3, and then f(o1)=[w1], f(o2)=[w2], f(o3)=[w3]. We will define m to map from these ids to the corresponding incomplete world model.
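As a sketch (again my own illustration; the tuple encoding of observations is an assumption), f and m can be written down directly:

```python
# Observation sequences as tuples. PROBLEM stands in for the oracle-given
# problem statement, which is itself treated as a block of observations.
PROBLEM = ("problem statement",)
o1 = PROBLEM + ("t=0", "l=Desert")
o2 = PROBLEM + ("t=0", "l=Desert", "t=1", "l=Town")
o3 = PROBLEM + ("t=0", "l=Desert", "t=1", "l=Desert")

def f(o):
    """Fact-Derivation Algorithm: map an observation sequence to an id [w]."""
    return {o1: 1, o2: 2, o3: 3}[o]

# m maps ids back to the incomplete world models they represent.
m = {1: "w1", 2: "w2", 3: "w3"}
```

The point of routing everything through integer ids is that an id is just data: a decision algorithm can be handed f(o2) even if, for that algorithm, no coherent world model corresponds to o2.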

We will write D for the set of possible decision algorithms that x might possess. Instead of having these algorithms operate on either observations or world models, we will make them operate on the world ids that are produced by the Fact-Derivation Algorithm, so that they still produce actions in contradictory worlds. For example, define:

• d2: Always Pay

• d3: Never Pay

If d2 sees [w3] or d3 sees [w2], then it knows that this is impossible according to its model. However, it isn't actually impossible, as its model could be wrong. Further, these "impossible" pre-commitments now mean something tangible. The agent has pre-committed to act a certain way if it experiences a particular sequence of observations.

We can now formalise the Driver's Prediction as follows for situations that are only conditionally consistent (we noted before that this needed to be corrected). Let o be the sequence of observations, and let d0 be a decision algorithm that is consistent with o, while d1 is a decision algorithm that is inconsistent with it. Let w=m(f(o)), which is a consistent world given d0. Then the counterfactual of what d1 would do in w is defined as d1(f(o)). We've now defined what it means to be a "perfect predictor". There is, however, one potential issue: perhaps multiple observation sequences lead to w? In this case, we need to define the world more precisely and include observational details in the model. Even if these details don't seem to change the problem from a standard decision theory perspective, they may still affect the predictions of actions in impossible counterfactuals.

## Discussion

In most decision theory problems, it is easier to avoid discussing observations any more than necessary. Generally, the agent makes some observations, but their knowledge of the rest of the setup is simply assumed. This abstraction generally works well, but it leads to confusion in cases like this, where we are dealing with predictors who want to know if they can coherently put another agent in a specific situation. As we've shown, even though it is meaningless to ask what an agent would do given an impossible situation, it is meaningful to ask what the agent would do given an impossible input.

When asking what any real agent would do in a real world problem, we can always restate it as asking what the agent would do given a particular input. However, using the trick of separating observations doesn't limit us to real world problems; as we've seen, we can use the trick of representing the problem statement as direct observations to represent more abstract problems. The next logical step is to try extending this to cases such as, "What if the 1000th digit of Pi were even?" This approach allows us to avoid the contradiction and deal with situations that are at least consistent, but it doesn't provide much in the way of hints for how to solve these problems in general. Nonetheless, I figured that I may as well start with the one problem that was the most straightforward.

Update: After rereading the description of Updateless Decision Theory, I realise that it already uses something very similar to the technique described here. So the main contribution of this article seems to be exploring a part of UDT that is normally not examined in much detail.

One difference, though, is that UDT uses a Mathematical Intuition Function that maps from inputs to a probability distribution over execution histories, instead of a Fact-Derivation Algorithm that maps from inputs to models and only for consistent situations. One advantage of breaking it down as I do is to clarify that UDT's observation-action maps don't only include entries for possible observations, but also observations that it would be contradictory for an agent to make. Secondly, it clarifies that UDT predictors predict agents based on how they respond to inputs representing situations, rather than directly on situations themselves, which is important for impossible situations.

• If you're in a situation, then it must be possible?

There is a sense in which you can't conclude this. For a given notion of reasoning about potentially impossible situations, you can reason about such situations that contain agents, and you can see how these agents in the impossible situations think. If the situation doesn't tell the agent whether it's possible or impossible (say, through observations), the agent inside won't be able to tell if it's an impossible situation. Always concluding that the present situation is possible will result in error in the impossible situations (so it's even worse than being unjustified). Errors in impossible situations may matter if something that matters depends on how you reason in impossible situations (for example, a "predictor" in a possible situation that asks what you would do in impossible situations).

We could look at, "I commit to making <situation> impossible", but that doesn't mean anything either.

A useful sense of an "impossible situation" won't make it impossible to reason about. There's probably something wrong with it, but not to the extent that it can't be considered. Maybe it falls apart if you look too closely, or maybe it has no moral worth and so should be discarded from decision making. But even in these cases it might be instrumentally valuable, because something in morally relevant worlds depends on this kind of reasoning. You might not approve of this kind of reasoning and call it meaningless, but other things in the world can perform it regardless of your judgement, and it's useful to understand how that happens to be able to control them.

Finally, some notions of "impossible situation" will say that a "situation" is possible/impossible depending on what happens inside it, and there may be agents inside it. In that case, their decisions may affect whether a given situation is considered "possible" or "impossible", and if these agents are familiar with this notion they can aim to make a given situation they find themselves in possible or impossible.

• "There is a sense in which you can't conclude this": Well, this paragraph is pretty much an informal description of how my technique works, except that I differentiate between world models and representations of world models. Agents can't operate on incoherent world models, but they can operate on representations of world models that are incoherent for this agent. It's also the reason why I separated out observations from models.

"In that case, their decisions may affect whether a given situation is considered 'possible' or 'impossible', and if these agents are familiar with this notion they can aim to make a given situation they find themselves in possible or impossible": My answer to this question is that it is meaningless to ask what an agent does given an impossible situation, but meaningful to ask what it does given an impossible input (which ultimately represents an impossible situation).

I get the impression that you didn't quite grasp the general point of this post. I suspect that the reason may be that the formal description is less skippable than I originally thought.

• I was replying specifically to those remarks, on their use of terminology, not to the thesis of the post. I disagree with the framing of "impossible situations" and "meaningless" for the reasons I described. I think it's useful to let these words (in the context of decision theory) take the default meaning that makes the statements I quoted misleading.

My answer to this question is that it is meaningless to ask what an agent does given an impossible situation, but meaningful to ask what it does given an impossible input (which ultimately represents an impossible situation).

That's the thing: if this "impossible input" represents an "impossible situation", and it's possible to ask what happens for this input, that gives a way of reasoning about the "impossible situation", in which case it's misleading to say that "it is meaningless to ask what an agent does given an impossible situation". I of course agree that you can make a technical distinction, but even then it's not clear what you mean by calling an idea "meaningless" when you immediately proceed to give a way of reasoning about (a technical reformulation of) that idea.

If an idea is confused in some way, even significantly, that shouldn't be enough to declare it "meaningless". Perhaps "hopelessly confused" and "useless", but not yet "meaningless". Unless you are talking about a more specific sense of "meaning", which you didn't stipulate. My guess is that by "meaningless" you meant that you don't see how it could ever be made clear in its original form, or that in the context of this post it's not at all clear compared to the idea of "impossible input" that's actually clarified. But that's an unusual sense for that word.

• I guess I saw those mainly as framing remarks, so I may have been less careful with my language than elsewhere. Maybe "meaningless" is a strong word, but I only meant it in a specific way that I hoped was clear enough from context.

I was using situations to refer to objects where the equivalence function is logical equivalence, whilst I was using representations to refer to objects where the equivalence function is the specific formulation. My point was that all impossible situations are logically equivalent, so asking what an agent does in this situation is of limited use. An agent that operates directly on such impossible situations can only have one such response to these situations, even across multiple problems. On the other hand, representations don't have this limitation.

• My point was that all impossible situations are logically equivalent

Yes, the way you are formulating this, as a theory that includes claims about the agent's action or other counterfactual things together with things from the original setting that contradict them, such as the agent's program. It's also very natural to excise parts of a situation (just as you do in the post) and replace them with the alternatives you are considering. It's what happens with causal surgery.

An agent that operates directly on such impossible situations can only have one such response to these situations, even across multiple problems.

If it respects equivalence of theories (which is in general impossible to decide) and doesn't know where the theories came from, so that this essential data is somehow lost before that point. I think it's useful to split this process into two phases, where first the agent looks for itself in the worlds it cares about, and only then considers the consequences of alternative actions. The first phase gives a world that has all discovered instances of the agent excised from it (a "dependence" of world on agent), so that in the second phase we can plug in alternative actions (or strategies, maps from observations to actions, as the type of the excised agent will be something like exponential if the agent expects input).

At that point the difficulty is mostly in the first phase, the formulation of dependence. (By the way, in this view there is no problem with perfect predictors, since they are just equivalent to the agent and become one of the locations where the agent finds itself, no different from any other. It's the imperfect predictors, such as the too-weak predictors of Agent-Simulates-Predictor (ASP) or other such things, that cause trouble.) The main difficulty here is spurious dependencies, since in principle the agent is equivalent to their actual action, and so conversely the value of their actual action found somewhere in the world is equivalent to the agent. So the agent finds itself behind all answers "No" in the world (uttered by anyone and anything) if it turns out that their actual action is "No" etc., and the consequences of answering "Yes" then involve changing all answers "No" to "Yes" everywhere in the world. (When running the search, the agent won't actually encounter spurious dependencies under certain circumstances, but that's a bit flimsy.)

This shows that even equivalence of programs is too strong when searching for yourself in the world, or at least that the proof of equivalence shouldn't be irrelevant to the resulting dependence. So this framing doesn't actually help with logical counterfactuals, but at least the second phase, where we consider alternative actions, is spared the trouble, if we somehow manage to find useful dependencies.

• "By the way, in this view there is no problem with perfect predictors, since they are just equivalent to the agent and become one of the locations where the agent finds itself": Well, this still runs into issues, as the simulated agent encounters an impossible situation, so aren't we still required to use the workaround (or another workaround if you've got one)?

"This shows that even equivalence of programs is too strong when searching for yourself in the world, or at least the proof of equivalence shouldn't be irrelevant in the resulting dependence": Hmm, agents may take multiple actions in a decision problem. So aren't agents only equivalent to programs that take the same action in each situation? Anyway, I was talking about equivalence of worlds, not of agents, but this is still an interesting point that I need to think through. (Further, are you saying that agents should only be considered to have their behaviour linked to agents they are provably equivalent to, instead of all agents they are equivalent to?)

"A useful sense of an 'impossible situation' won't make it impossible to reason about": That's true. My first thought was to consider how the program represents its model of the world and imagine running the program with impossible world model representations. However, the nice thing about modelling the inputs and treating model representations as integers rather than specific structures is that it allows us to abstract away from these kinds of internal details. Is there a specific reason why you might want to avoid this abstraction?

UPDATE: I just re-read your comment and found that I significantly misunderstood it, so I've made some large edits to this comment. I'm still not completely sure that I understand what you were driving at.

• Well, this still runs into issues as the simulated agent encounters an impossible situation

The simulated agent, together with the original agent, is removed from the world to form a dependence, which is a world with holes (free variables). If we substitute the agent term for the variables in the dependence, the result is equivalent (not necessarily syntactically equal) to the world term as originally given. To test a possible action, this possible action is substituted for the variables in the dependence. The resulting term no longer includes instances of the agent; instead it includes an action, so there is no contradiction.

Hmm, agents may take multiple actions in a decision problem. So aren't agents only equivalent to programs that take the same action in each situation?

A protocol for interacting with the environment can be expressed with the type of the decision. So if an agent makes an action of type A depending on an observation of type O, we can instead consider (O->A) as the type of its decision, so that the only thing that it needs to do is produce a decision in this way, with interaction being something that happens to the decision and not the agent.
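The typing point above can be sketched in a line or two of Python (my illustration; the concrete observation and action strings are hypothetical):

```python
# A decision of type O -> A: a map from observations to actions. Interaction
# then "happens to" this map, not to the agent that produced it.
from typing import Callable

Observation = str
Action = str
Strategy = Callable[[Observation], Action]

def hitchhiker_strategy(obs: Observation) -> Action:
    # One possible decision of this type: pay whenever Town is observed.
    return "Pay" if obs == "l=Town" else "Not in Town"
```

Substituting a whole strategy, rather than a single action, is what lets the excised agent "expect input" while still being replaceable by a value.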

Requiring that only programs completely equivalent to the agent are to be considered its instances may seem too strong, and it probably is, but the problem is that it's also not strong enough, because even with this requirement there are spurious dependencies that say that an agent is equivalent to a piece of paper that happens to contain a decision that coincides with the agent's own. So it's a good simplification for focusing on logical counterfactuals (in the logical direction, which I believe is less hopeless than finding answers in probability).

Further, are you saying that agents should only be considered to have their behaviour linked to agents they are provably equivalent to, instead of all agents they are equivalent to?

Not sure what the distinction you are making is. How would you define equivalence? By equivalence I meant equivalence of lambda terms, where one can be rewritten into the other with a sequence of alpha, reduction and expansion rules, or something like that. It's the judgemental/computational/reductional equality of type theory, as opposed to propositional equality, which can be weaker, but since judgemental equality is already too weak, it's probably the wrong place to look for an improvement.

• The simulated agent, together with the original agent, are removed from the world to form a dependence, which is a world with holes (free variables)

I'm still having difficulty understanding the process that you're following, but let's see if I can correctly guess it. Firstly, you make a list of all potential situations that an agent may experience or for which an agent may be simulated. Decisions are included in this list, even if they might be incoherent for particular agents. In this example, these are:

• Actual_Decision → Co-operate/Defect

• Simulated_Decision → Co-operate/Defect

We then group all necessarily linked decisions together:

• (Actual_Decision, Simulated_Decision) → (Co-operate, Co-operate)/(Defect, Defect)

You then consider the tuple (equivalent to an observation-action map) that leads to the best outcome.

I agree that this provides the correct outcome, but I'm not persuaded that the reasoning is particularly solid. At some point we'll want to be able to tie these models back to the real world and explain exactly what kind of hitchhiker corresponds to a (Defect, Defect) tuple. A hitchhiker that doesn't get a lift? Sure, but what property of the hitchhiker makes it not get a lift?

We can't talk about any actions it chooses in the actual world history, as it is never given the chance to make this decision. Next, we could try constructing a counterfactual as per CDT and consider what the hitchhiker does in the world model where we've performed model surgery to make the hitchhiker arrive in town. However, as this is an impossible situation, there's no guarantee that this decision is connected to any decision the agent makes in a possible situation. TDT counterfactuals don't help either, as they are equivalent to these tuples.

Alternatively, we could take the approach that you seem to favour and say that the agent makes the decision to defect in a paraconsistent situation where it is in town. But this assumes that the agent has the ability to handle paraconsistent situations, when only some agents have this ability. It's not clear how to interpret this for other agents. However, inputs have neither of these problems: all real world agents must do something given an input, even if that is doing nothing or crashing, and these are easy to interpret. So modelling inputs allows us to more rigorously justify the use of these maps. I'm beginning to think that there would be a whole post's worth of material if I expanded upon this comment.

How would you define equivalence?

I think I was using the wrong term. I meant linked in the logical counterfactual sense, say two identical calculators. Is there a term for this? I was trying to understand whether you were saying that we only care about the provable linkages, rather than all such linkages.

Edit: Actually, after rereading UDT, I can see that it is much more similar than I realised. For example, it also separates inputs from models. More detailed information is included at the bottom of the post.

• Firstly, you make a list of all potential situations that an agent may experience or for which an agent may be simulated. Decisions are included in this list, even if they might be incoherent for particular agents.

No? Situations are not evaluated; they contain instances of the agent, but when they are considered, it's not yet known what the decision will be, so decisions are unknown, even if in principle determined by the (agents in the) situation. There is no matching or assignment of possible decisions when we identify instances of the agent. Next, the instances are removed from the situation. At this point, decisions are no longer determined in the situations-with-holes (dependencies), since there are no agents and no decisions remaining in them. So there won't be a contradiction in putting in any decisions after that (without the agents!) and seeing what happens.

I meant linked in the logical counterfactual sense, say two identical calculators.

That doesn't seem different from what I meant, if appropriately formulated.

• I think the limit you're running up against is how to formally define "possible", and what model of decision-making and free will is consistent with a "perfect predictor".

For many of us, "perfect predictor" implies "deterministic future, with choice being an illusion". Whether that's truly possible in our universe or not is unknown.

• Whether or not the universe is truly deterministic (not the focus of this thread), it is a common enough belief that it's worth modelling.

• Notice how none of these difficulties arise if you adopt the approach I posted about recently: that you do not change the world, you discover what possible subjective world you live in. The question is always about what world model the agent has, not about the world itself, and about discovering more about that world model.

In the Parfit's Hitchhiker problem with a driver who is a perfect predictor, there is no possible world where the hitchhiker gets a lift but does not pay. The non-delirious agent will end up adjusting their world model to "Damn, apparently I am the sort of person who attempts to trick the driver, fails and dies" or "Happy I am the type of person who precommits to paying", for example. There are many more possible worlds in that problem if we include the agents whose world model is not properly adjusted based on the input. In severe cases this is known as psychosis.

Similarly, "What if the 1000th digit of Pi were even?" is a question about partitioning possible worlds in your mind. Notice that there are not just two classes of those.

These classes include the possible worlds where you learn that the 1000th digit of Pi is even, the worlds where you learn that it is odd, the worlds where you never bother figuring it out, and the worlds where you learned one answer but then had to reevaluate it, for example because you found a mistake in the calculations. There are also low-probability possible worlds, like those where Pi only has 999 digits, where the 1000th digit keeps changing, and so on. All those are possible world models, just some are not very probable a priori for the reference class of agents we are interested in.

...But that would be radically changing your world model from "there is a single objective reality about which we ask questions" to "agents are constantly adjusting models, and some models are better than others at anticipating future inputs."

• I'm not sure that it solves the problem. The issue is that in the case where you always choose "Don't Pay", it isn't easy to define what the predictor predicts, as it is impossible for you to end up in town. The predictor could ask what you'd do if you thought the predictor was imperfect (as then ending up in town would actually be possible), but this mightn't represent how you'd behave against a perfect predictor.

(Further, I am working within the assumption that everything is deterministic and that you can't actually "change" the world, as you say. How have I assumed the contrary?)

• The principle of explosion isn't a problem for all logics.

I think, in a way, the problem with Parfit's Hitchhiker is: how would you know that something is a perfect predictor? Having a probability p of making every one of n predictions right only requires a predictor to be right in each prediction with probability x, where x^n >= p. So they have a better than 50% chance of making 100 consecutive predictions right if they're right 99.31% of the time. By this metric, to be sure the chance they're wrong is less than 1 in 10,000 (i.e. they're right 99.99% of the time or more), you'd have to see them make 6,932 correct predictions. (This assumes that all these predictions are independent, unrelated events, in addition to a few other counterfactual requirements that are probably satisfied if this is your first time in such a situation.)
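The figures above can be checked in a couple of lines (a sketch assuming, as the comment does, independent predictions with fixed per-prediction accuracy x, so n predictions are all right with probability x**n):

```python
import math

# Right 99.31% of the time gives a better than 50% chance of 100 in a row.
assert 0.9931 ** 100 > 0.5

# Smallest n such that a 99.99%-accurate predictor is more likely than not
# to slip at least once over n consecutive predictions:
n = math.ceil(math.log(0.5) / math.log(0.9999))
print(n)  # -> 6932
```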

• Sure, in the real world you can't know that a predictor is perfect. But the point is that perfection is often a useful abstraction, and the tools that I introduced allow you to work either with real world problems, as you seem to prefer, or with more abstract problems, which are often easier to work with. Anyway, by representing the input of the problem explicitly, I've created an abstraction that is closer to the real world than most of these problems are.

• I was suggesting that what model you should use, if your current one is incorrect, is based on how you got your current model, which is why it sounds like "I prefer real world problems": model generation details do seem necessarily specific. (My angle was that in life, few things are impossible; many things are improbable, like getting out of the desert and not paying.) I probably should have stated that, and that only, instead of the math.

by representing the input of the problem explicitly I've created an abstraction that is closer to the real world than most of these problems are.

Indeed. I found your post well thought out and formal, though I do not yet fully understand the jargon.

Where/how did you learn decision theory?

• Thanks, I appreciate the compliment. Even though I have a maths degree, I never formally studied decision theory. I've only learned about it by reading posts on Less Wrong, so much of the jargon is my attempt to come up with words that succinctly describe each concept.