# Eliezer Yudkowsky comments on Ingredients of Timeless Decision Theory

• And I think I can still show that if you run TDT, you will decide to self-modify into CDT before starting this game

Well that should never happen. Anything that would make a TDT want to self-modify into CDT should make it just want to play D, no need for self-modification. It should give the same answer at different times, that’s what makes it a timeless decision theory. If you can break that without direct explicit dependence on the algorithm apart from its decisions, then I am in trouble! But it seems to me that I can substitute “play D” for “self-modify” in all cases above.

First, if Omega’s AIs know that you run TDT at the beginning, then they can use that “play D if you self-modify” strategy to deter you from self-modifying.

E.g., “play D if you play D to deter you from playing D” seems like the same idea, the self-modification doesn’t add anything.

So who wins this game? (If someone moves first logically, then he wins, but what if everyone moves simultaneously in the logical sense, which seems to be the case in this game?)

Well… it partially seems to me that, in assuming certain decisions are made without logical consequences—because you move logically first, or because the TDT agents have fixed wrong priors, etc.—you are trying to reduce the game to a Prisoner’s Dilemma in which you have a certain chance of playing against a piece of cardboard with “D” written on it. Even a uniform population of TDTs may go on playing C in this case, of course, if the probability of facing cardboard is low enough. But by the same token, the fact that the cardboard sometimes “wins” does not make it smarter or more rational than the TDT agents.

Now, I want to be very careful about how I use this argument, because indeed a piece of cardboard with “only take box B” written on it is smarter than CDT agents on Newcomb’s Problem. But who writes that piece of cardboard, rather than a different one?

An authorless piece of cardboard genuinely does go logically first, but at the expense of being a piece of cardboard, which makes it unable to adapt to more complex situations. A true CDT agent goes logically first, but at the expense of losing on Newcomb’s Problem. And your choice to put forth a piece of cardboard marked “D” relies on you expecting the TDT agents to make a certain response, which makes the claim that it’s really just a piece of cardboard, and therefore gets to go logically first, somewhat questionable.

Roughly, what I’m trying to reply is that you’re reasoning about the response of the TDT agents to your choosing the CDT algorithm, which makes you TDT; yet you’re also trying to force your choice of the CDT algorithm to go logically first, which begs the question.

I would, perhaps, go so far as to agree that in an extension of TDT to cases in which certain agents magically get to go logically first, then if those agents are part of a small group uncorrelated with yet observationally indistinguishable from a large group, the small group might make a correlated decision to defect “no matter what” the large group does, knowing that the large group will decide to cooperate anyway given the payoff matrix. But the key assumption here is the ability to go logically first.

It seems to me that the incompleteness of my present theory when it comes to logical ordering is the real key issue here.

• Well that should never happen. Anything that would make a TDT want to self-modify into CDT should make it just want to play D, no need for self-modification. It should give the same answer at different times, that’s what makes it a timeless decision theory. If you can break that without direct explicit dependence on the algorithm apart from its decisions, then I am in trouble! But it seems to me that I can substitute “play D” for “self-modify” in all cases above.

The reason to self-modify is to make yourself indistinguishable from players who started as CDT agents, so that Omega’s AIs can’t condition their moves on the player’s type. Remember that Omega’s AIs get a copy of your source code.

A true CDT agent goes logically first, but at the expense of losing on Newcomb’s Problem.

But a CDT agent would self-modify into something not losing on Newcomb’s problem if it expects to face that. On the other hand, if TDT doesn’t self-modify into something that wins my game, isn’t that worse? (Is it better to be reflectively consistent, or winning, if you had to choose one?)

It seems to me that the incompleteness of my present theory when it comes to logical ordering is the real key issue here.

Yes, I agree that’s a big piece of the puzzle, but I’m guessing the solution to that won’t fully solve the “stupid winner” problem.

ETA: And for TDT agents that move simultaneously, there remains the problem of “bargaining”, to use Nesov’s term. Lots of unsolved problems… I wish you’d started us working on this stuff earlier!

• The reason to self-modify is to make yourself indistinguishable from players who started as CDT agents, so that Omega’s AIs can’t condition their moves on the player’s type.

Being (or performing an action) indistinguishable from X doesn’t protect you from the inference that X probably resulted from such a plot. That you can decide to camouflage like this may even reduce X’s own credibility (and so a lot of platonic/possible agents doing that will make the configuration unattractive). Thus, the agents need to decide among themselves what to look like: first-mover configurations are a limited resource.

(This seems like a step towards solving bargaining.)

• Yes, I see that your comment does seem like a step towards solving bargaining among TDT agents. But I’m still trying to argue that if we’re not TDT agents yet, maybe we don’t want to become them. My comment was made in that context.

• Let’s pick up Eliezer’s suggestion and distinguish now-much-less-mysterious TDT from the different idea of “updateless decision theory”, UDT, which describes choice of a whole strategy (a function from states of knowledge to actions) rather than choice of actions in each given state of knowledge; TDT is an example of this latter class. TDT isn’t a UDT, and UDT by itself is a rather vacuous statement: it achieves reflective consistency pretty much by definition, but doesn’t say much about the structure of preference or how to choose the strategy.

I don’t want to become a TDT agent, since in the UDT sense TDT agents aren’t reflectively consistent. They could self-modify towards a more UDT-ish look, but this is the same argument as with CDT self-modifying into a TDT.

• Dai’s version of this is a genuine, reflectively consistent updateless decision theory, though. It makes the correct decision locally, rather than needing to choose a strategy once and for all time from a privileged vantage point.

That’s why I referred to it as “Dai’s decision theory” at first, but both you and Dai seem to think your idea was the important one, so I compromised and referred to it as Nesov-Dai decision theory.

• Well, as I see UDT, it also makes decisions locally, with the understanding that this local computation is meant to find the best global solution given other such locally computed decisions. That is, each local computation can make a mistake, making the best global solution impossible, which may make it very important for the other local computations to predict (or at least notice) this mistake and find the local decisions that together with this mistake constitute the best remaining global solution, and so on. The structure of states of knowledge produced by the local computations for the adjacent local computations is meant to optimize the algorithm of local decision-making in those states, giving most of the answer explicitly, leaving the local computations to only move the goalpost a little bit.

The nontrivial form of the decision-making comes from the loop that makes local decisions maximize preference given the other local decisions, and those other local decisions do the same. Thus, the local decisions have to coordinate with each other, and they can do that only through the common algorithm and logical dependencies between different states of knowledge.

At which point the fact that these local decisions are part of the same agent seems to become irrelevant, so that a more general problem needs to be solved, one of cooperation of any kinds of agents, or even more generally processes that aren’t exactly “agents”.

• One thing I don’t understand is that both you and Eliezer talk confidently about how agents would make use of logical dependencies/correlations. You guys don’t seem to think this is a really hard problem.

But we don’t even know how to assign a probability (or whether it even makes sense to do so) to a simple mathematical statement like P=NP. How do we calculate and/or represent the correlation between one agent and another agent (except in simple cases like where they’re identical or easily proven to be equivalent)? I’m impressed by how far you’ve managed to push the idea of updatelessness, but it’s hard for me to process what you say, when the basic concept of logical uncertainty is still really fuzzy.

• I can argue pretty forcefully that (1) a causal graph in which uncertainty has been factored into uncorrelated sources must have nodes or some kind of elements corresponding to logical uncertainty; (2) that in presenting Newcomblike problems, the dilemma-presenters are in fact talking of such uncertainties and correlations; (3) that human beings use logical uncertainty all the time in an intuitive sense, to what seems like good effect.

Of course none of that is actually having a good formal theory of logical uncertainty—I just drew a boundary rope around a few simple logical inferences and grafted them onto causal graphs. Two-way implications get represented by the same node, that sort of thing.

I would be drastically interested in a formal theory of logical uncertainty for non-logically-omniscient Bayesians.

Meanwhile, you’re carrying out logical reasoning about whole other civilizations, starting from a vague prior over their origins, every time you reason that most superintelligences (if any) that you encounter in faraway galaxies will have been built in such a way as to maximize a utility function rather than, say, choose the first option in alphabetical order, on the likes of true PDs.

• I want to try to understand the nature of logical correlations between agents a bit better.

Consider two agents who are both TDT-like but not perfectly correlated. They play a one-shot PD, but in turn: first one player moves, then the other sees the move and makes its move.

In normal Bayesian reasoning, once the second player sees the first player’s move, all correlation between them disappears. (Does this happen in your TDT?) But in UDT, the second player doesn’t update, so the correlation is preserved. So far so good.

Now consider what happens if the second player has more computing power than the first, so that it can perfectly simulate the first player and compute its move. After it finishes that computation and knows the first player’s move, the logical correlation between them disappears, because no uncertainty implies no correlation. So, given there’s no logical correlation, it ought to play D. The first player would have expected that, and also played D.

Looking at my formulation of UDT, this may or may not happen, depending on what the “mathematical intuition subroutine” does when computing the logical consequences of a choice. If it tries to be maximally correct, then it would do a full simulation of the opponent when it can, which removes logical correlation, which causes the above outcome. Maybe the second player could get a better outcome if it doesn’t try to be maximally correct, but the way my theory is formulated, what strategy the “mathematical intuition subroutine” uses is not part of what’s being optimized.
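The mutual-defection dynamic can be made concrete with a toy model (my own illustrative sketch, not from the original comments): the second player, having exactly simulated the first, treats the simulated move as a fixed fact and best-responds causally; the first player, predicting that policy, defects as well.

```python
# Toy sketch (illustration only): a sequential one-shot Prisoner's
# Dilemma where the second player can perfectly simulate the first.
# Standard PD payoff values are assumed.

PD_PAYOFF = {  # (first move, second move) -> (first payoff, second payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def second_player(first_move):
    # Having simulated the first player, the second treats its move as
    # a fixed fact (no remaining logical correlation) and best-responds
    # causally; D dominates against either move.
    return max("CD", key=lambda m: PD_PAYOFF[(first_move, m)][1])

def first_player():
    # The first player predicts the second player's policy by running
    # it on each candidate move, then best-responds to that policy.
    return max("CD", key=lambda m: PD_PAYOFF[(m, second_player(m))][0])

move1 = first_player()
move2 = second_player(move1)
# Both players end up at mutual defection: ("D", "D").
```

Under these assumptions the simulation buys the second player nothing: its causal best-response policy is predictable, so the game collapses to (D, D).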

• Coming to this a bit late :), but I’ve got a basic question (which I think is similar to Nesov’s, but I’m still confused after reading the ensuing exchange). Why would it be that,

The first player would have expected that, and also played D.

If the second player has more computing power (so that the first player cannot simulate it), how can the first player predict what the second player will do? Can the first player reason that since the second player could simulate it, the second player will decide that they’re uncorrelated and play D no matter what?

That dependence on computing power seems very odd, though maybe I’m sneaking in expectations from my (very rough) understanding of UDT.

• Now consider what happens if the second player has more computing power than the first, so that it can perfectly simulate the first player and compute its move. After it finishes that computation and knows the first player’s move, the logical correlation between them disappears, because no uncertainty implies no correlation. So, given there’s no logical correlation, it ought to play D. The first player would have expected that, and also played D.

The first player’s move could depend on the second player’s, in which case the second player won’t get the answer in a closed form; the answer must be a function of the second player’s move...

• But if the second player has more computational power, it can just keep simulating the first player until the first player runs out of clock cycles and has to output something.

• I don’t understand your reply: exact simulation is brute force that isn’t a good idea. You can prove general statements about the behavior of programs on runs of unlimited or infinite length in finite time. But anyway, why would the second player provoke mutual defection?

• But anyway, why would the second player provoke mutual defection?

In my formulation, it doesn’t have a choice. Whether or not it does exact simulation of the first player is determined by its “mathematical intuition subroutine”, which I treated as a black box. If that module does an exact simulation, then mutual defection is the result. So this also ties in with my lack of understanding regarding logical uncertainty. If we don’t treat the thing that reasons about logical uncertainty as a black box, what should we do?

ETA: Sometimes exact simulation clearly is appropriate, for example in rock-paper-scissors.
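The rock-paper-scissors case can be spelled out in a few lines (my own sketch, not from the thread): against a deterministic opponent whose program you can run exactly, the extra computing power translates directly into a guaranteed win.

```python
# Toy sketch (illustration only): in rock-paper-scissors, exact
# simulation of a deterministic opponent is clearly the right use of
# surplus computing power, since knowing the opponent's move lets you
# pick the unique winning response.

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def weaker_player():
    # Some deterministic policy; the stronger player can run it exactly.
    return "rock"

def stronger_player(opponent_program):
    # Exact simulation: run the opponent's program, then beat its output.
    return BEATS[opponent_program()]

winning_move = stronger_player(weaker_player)
# winning_move == "paper", which beats "rock".
```

Here, unlike in the PD case above, the game is zero-sum, so there is no cooperative outcome for the simulation to destroy.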

• Conceptually, I treat logical uncertainty as I do prior+utility, a representation of preference, in this more general case over mathematical structures. The problems of representing this preference compactly and extracting human preference don’t hinder these particular explorations.

• I don’t understand this yet. Can you explain in more detail what a general (noncompact) way of representing logical uncertainty would be?

• If you are a CDT agent, you can’t (or simply won’t) become a normal TDT agent. If you are a human, who knows what that means.