Ingredients of Timeless Decision Theory

Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory

Wei Dai asked:

“Why didn’t you mention earlier that your timeless decision theory mainly had to do with logical uncertainty? It would have saved people a lot of time trying to guess what you were talking about.”

...

All right, fine, here’s a fast summary of the most important ingredients that go into my “timeless decision theory”. This isn’t so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge. It seems to me that this sort of thing really takes a mini-book, but perhaps I shall be proven wrong.

The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.

To obtain the background knowledge if you don’t already have it, the two main things you’d need to study are the classical debates over Newcomblike problems, and the Judea Pearl synthesis of causality. Canonical sources would be “Paradoxes of Rationality and Cooperation” for Newcomblike problems and “Causality” for causality.

For those of you who don’t condescend to buy physical books, Marion Ledwig’s thesis on Newcomb’s Problem is a good summary of the existing attempts at decision theories, evidential decision theory and causal decision theory. You need to know that causal decision theories two-box on Newcomb’s Problem (which loses) and that evidential decision theories refrain from smoking on the smoking lesion problem (which is even crazier). You need to know that the expected utility formula is actually over a counterfactual on our actions, rather than an ordinary probability update on our actions.
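
To make that last contrast concrete, here is a toy sketch of my own (not from Ledwig’s thesis or any canonical source): EDT computes an ordinary conditional probability of the box contents given the action, while CDT computes a counterfactual by surgery on the physical action node, leaving the causally upstream contents at their base rate. The 99% predictor accuracy, 50% base rate, and dollar payoffs are illustrative assumptions.

    # Toy Newcomb's Problem: EDT conditions on the action, CDT does surgery
    # on the physical action node.  All numbers are illustrative assumptions.
    ACCURACY = 0.99          # assumed predictor accuracy
    BASE_RATE_FULL = 0.5     # assumed prior that box B contains $1,000,000

    def payoff(action, box_b_full):
        return (1_000_000 if box_b_full else 0) + (1_000 if action == "two-box" else 0)

    def edt_eu(action):
        # Ordinary probability update: P(box full | action).
        p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
        return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)

    def cdt_eu(action):
        # Counterfactual surgery on the action node: the contents are causally
        # upstream, so they stay at the base rate whatever we "do".
        p_full = BASE_RATE_FULL
        return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)

    for name, eu in [("EDT", edt_eu), ("CDT", cdt_eu)]:
        print(name, max(["one-box", "two-box"], key=eu))
    # EDT one-boxes here (but fails on the smoking lesion); CDT two-boxes and loses.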

I’m not sure what you’d use for online reading on causality. Mainly you need to know the following (a toy illustration of the first two points appears after the list):

  • That a causal graph factorizes a correlated probability distribution into a deterministic mechanism of chained functions plus a set of uncorrelated unknowns as background factors.

  • Standard ideas about “screening off” variables (D-separation).

  • The standard way of computing counterfactuals (through surgery on causal graphs).
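
Here is the promised illustration, a minimal sketch of my own rather than anything from Pearl: two correlated variables are generated by a deterministic mechanism fed by uncorrelated background factors plus a common parent, and observing that parent screens the children off from each other. The specific noise rates are arbitrary.

    import random

    # A correlated distribution over (X, Y), factored into a deterministic
    # mechanism plus uncorrelated background factors, with a common parent C.
    def sample():
        c   = random.random() < 0.5   # common parent
        u_x = random.random() < 0.1   # uncorrelated background noise for X
        u_y = random.random() < 0.1   # uncorrelated background noise for Y
        x = c ^ u_x                   # deterministic function of (C, U_x)
        y = c ^ u_y                   # deterministic function of (C, U_y)
        return c, x, y

    samples = [sample() for _ in range(100_000)]

    def p(event, condition=lambda s: True):
        pool = [s for s in samples if condition(s)]
        return sum(event(s) for s in pool) / len(pool)

    # Marginally, X carries information about Y:
    print(p(lambda s: s[2], lambda s: s[1]), "vs", p(lambda s: s[2]))
    # Conditioned on the parent C, X and Y are (approximately) independent:
    print(p(lambda s: s[2], lambda s: s[0] and s[1]), "vs", p(lambda s: s[2], lambda s: s[0]))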

It will be helpful to have the standard Less Wrong background of defining rationality in terms of processes that systematically discover truths or achieve preferred outcomes, rather than processes that sound reasonable; understanding that you are embedded within physics; understanding that your philosophical intuitions are how some particular cognitive algorithm feels from inside; and so on.


The first lemma is that a factorized probability distribution which includes logical uncertainty—uncertainty about the unknown output of known computations—appears to need cause-like nodes corresponding to this uncertainty.

Suppose I have a calculator on Mars and a calculator on Venus. Both calculators are set to compute 123 * 456. Since you know their exact initial conditions—perhaps even their exact initial physical state—a standard reading of the causal graph would insist that any uncertainties we have about the output of the two calculators should be uncorrelated. (By standard D-separation: if you have observed all the ancestors of two nodes, but have not observed any common descendants, the two nodes should be independent.) However, if I tell you that the calculator at Mars flashes “56,088” on its LED display screen, you will conclude that the Venus calculator’s display is also flashing “56,088”. (And you will conclude this before any ray of light could communicate between the two events, too.)

If I was giving a long exposition I would go on about how, if you have two envelopes originating on Earth and one goes to Mars and one goes to Venus, your conclusion about the one on Venus from observing the one on Mars does not of course indicate a faster-than-light physical event; but standard ideas about D-separation indicate that completely observing the initial state of the calculators ought to screen off any remaining uncertainty we have about their causal descendants, so that the descendant nodes are uncorrelated; and the fact that they’re still correlated indicates that there is a common unobserved factor, and this is our logical uncertainty about the result of the abstract computation. I would also talk for a bit about how, if there’s a small random factor in the transistors, and we saw three calculators, and two showed 56,088 and one showed 56,086, we would probably treat these as likelihood messages going up from nodes descending from the “Platonic” node standing for the ideal result of the computation—in short, it looks like our uncertainty about the unknown logical results of known computations really does behave like a standard causal node from which the physical results descend as child nodes.
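
A small sketch of my own may help here (it is not from the post, and the prior over the logical result and the 1% transistor-error rate are illustrative assumptions): treat the unknown output of the known computation 123 * 456 as a latent “Platonic” node, with the Mars and Venus displays as its slightly noisy physical children, and watch the Mars observation update the Venus prediction through that shared node.

    from itertools import product

    # Latent node L = the logical result of 123 * 456; Mars and Venus displays
    # are noisy children of L.  Prior and error rate are illustrative only.
    candidates = {56086: 0.1, 56088: 0.8, 56090: 0.1}   # toy prior over L
    ERR = 0.01                                          # chance a display is off by 2

    def p_display(shown, logical):
        if shown == logical:
            return 1 - ERR
        if abs(shown - logical) == 2:
            return ERR / 2
        return 0.0

    displays = [56084, 56086, 56088, 56090, 56092]
    joint = {}
    for L, prior in candidates.items():
        for mars, venus in product(displays, displays):
            joint[(L, mars, venus)] = prior * p_display(mars, L) * p_display(venus, L)

    # Condition on the Mars display reading 56,088.
    evidence = {k: v for k, v in joint.items() if k[1] == 56088}
    z = sum(evidence.values())
    p_venus = sum(v for k, v in evidence.items() if k[2] == 56088) / z
    print(p_venus)   # well above the prior: Mars told us about the *logical* node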

But this is a short exposition, so you can fill in that sort of thing yourself, if you like.

Having realized that our causal graphs contain nodes corresponding to logical uncertainties / the ideal results of Platonic computations, we next construe the counterfactuals of our expected utility formula to be counterfactuals over the logical result of the abstract computation corresponding to the expected utility calculation, rather than counterfactuals over any particular physical node.

You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation.

Formally you’d use a Godelian diagonal to write:

Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O | rest of universe))

(where P(X=x []-> Y | Z) means computing the counterfactual on the factored causal graph P, that surgically setting node X to x leads to Y, given Z)
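
As a toy rendering of that formula (my own sketch, not the post’s, with all numbers as illustrative assumptions), the surgery on the logical node is what separates Newcomb’s Problem from the smoking lesion: Omega’s prediction is a descendant of the logical output of this decision computation, so the counterfactual moves it, while the lesion is a cause of the decision-setup, already screened off by conditioning on the known initial state, so the counterfactual leaves it at its base rate.

    # Surgery on the logical node "output of this decision computation".
    # Predictor fidelity, lesion base rate, and utilities are assumptions.

    def expected_utility(action, outcome_dist, utility):
        return sum(p * utility(action, o) for o, p in outcome_dist(action).items())

    # Newcomb: P(box B full | do(logical output = action)) tracks the action,
    # because the prediction node descends from the logical node.
    def newcomb_dist(action):
        p_full = 0.99 if action == "one-box" else 0.01
        return {True: p_full, False: 1 - p_full}

    def newcomb_utility(action, box_full):
        return (1_000_000 if box_full else 0) + (1_000 if action == "two-box" else 0)

    # Smoking lesion: the lesion is upstream of the decision-setup and screened
    # off by the known initial state, so surgery leaves it at the base rate.
    def lesion_dist(action):
        return {True: 0.1, False: 0.9}

    def lesion_utility(action, lesion):
        return (-1_000_000 if lesion else 0) + (100 if action == "smoke" else 0)

    print(max(["one-box", "two-box"],
              key=lambda a: expected_utility(a, newcomb_dist, newcomb_utility)))
    print(max(["smoke", "abstain"],
              key=lambda a: expected_utility(a, lesion_dist, lesion_utility)))
    # -> one-box, smoke: the systematically winning answers on both problems.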

Setting this up correctly (in accordance with standard constraints on causal graphs, like noncircularity) will solve (yield reflectively consistent, epistemically intuitive, systematically winning answers to) 95% of the Newcomblike problems in the literature I’ve seen, including Newcomb’s Problem and other problems causing CDT to lose, the Smoking Lesion and other problems causing EDT to fail, Parfit’s Hitchhiker which causes both CDT and EDT to lose, etc.

Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory). Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on. On the other hand, CDT just has the entire counterfactual distribution rain down on the theory as manna from heaven (e.g. James Joyce, Foundations of Causal Decision Theory), so TDT is at least an improvement; and standard classical logic and standard causal graphs offer quite a lot of pre-existing structure here. (In general, understanding the causal structure of reality is an AI-complete problem, and so in philosophical dilemmas the causal structure of the problem is implicitly given in the story description.)

Among the many other things I am skipping over:

  • Some actual examples of where CDT loses and TDT wins, EDT loses and TDT wins, both lose and TDT wins, what I mean by “setting up the causal graph correctly” and some potential pitfalls to avoid, etc.

  • A rather huge amount of reasoning which defines reflective consistency on a problem class; explains why reflective consistency is a rather strong desideratum for self-modifying AI; explains why the need to make “precommitments” is an expensive retreat to second-best and shows a lack of reflective consistency; explains why it is desirable to win and get lots of money rather than just be “reasonable” (that is, conform to pre-existing intuitions generated by a pre-existing algorithm); and which notes that, considering the many pleas from people who want, but can’t find, any good intermediate stage between CDT and EDT, it’s a fascinating little fact that if you were rewriting your own source code, you’d rewrite it to one-box on Newcomb’s Problem and smoke on the smoking lesion problem...

  • ...and so, having given many considerations of desirability in a decision theory, shows that the behavior of TDT corresponds to reflective consistency on a problem class in which your payoff is determined by the type of decision you make, but not sensitive to the exact algorithm you use apart from that—that TDT is the compact way of computing this desirable behavior we have previously defined in terms of reflectively consistent systematic winning.

  • Showing that classical CDT, given self-modification ability, modifies into a crippled and inelegant form of TDT.

  • Using TDT to fix the non-naturalistic behavior of Pearl’s version of classical causality, in which we’re supposed to pretend that our actions are divorced from the rest of the universe—the counterfactual surgery, written out Pearl’s way, will actually give poor predictions for some problems (like someone who two-boxes on Newcomb’s Problem and believes that box B has a base-rate probability of containing a million dollars, because the counterfactual surgery says that box B’s contents have to be independent of the action). TDT not only gives the correct prediction, but explains why the counterfactual surgery can have the form it does—if you condition on the initial state of the computation, this should screen off all the information you could get about outside things that affect your decision; then your actual output can be further determined only by the Godel-diagonal formula written out above, permitting the formula to contain a counterfactual surgery that assumes its own output, so that the formula does not need to infinitely recurse on calling itself.

  • An account of some brief ad-hoc experiments I performed on IRC to show that a majority of respondents exhibited a decision pattern best explained by TDT rather than EDT or CDT.

  • A rather huge amount of exposition of what TDT actually corresponds to in terms of philosophical intuitions, especially those about “free will”. For example, this is the theory I was using as hidden background when I wrote in “Causality and Moral Responsibility” that factors like education and upbringing can be thought of as determining which person makes a decision—that you rather than someone else makes a decision—but that the decision made by that particular person is up to you. This corresponds to conditioning on the known initial state of the computation, and performing the counterfactual surgery over its output. I’ve actually done a lot of this exposition on OBLW without explicitly mentioning TDT, like Timeless Control and Thou Art Physics for reconciling determinism with choice (actually, effective choice requires determinism, but this confuses humans for reasons given in Possibility and Could-ness). But if you read the other parts of the solution to “free will”, and then furthermore explicitly formulate TDT, then this is what utterly, finally, completely, and without even a tiny trace of confusion or dissatisfaction or a sense of lingering questions, kills off entirely the question of “free will”.

  • Some concluding chiding of those philosophers who blithely decided that the “rational” course of action systematically loses; that rationalists defect on the Prisoner’s Dilemma and hence we need a separate concept of “social rationality”; that the “reasonable” thing to do is determined by consulting pre-existing intuitions of reasonableness, rather than by first looking at which agents walk away with huge heaps of money and then working out how to do it systematically; of people who take their intuitions about free will at face value; of those who assume that counterfactuals are fixed givens raining down from the sky rather than non-observable constructs which we can construe in whatever way generates a winning decision theory; et cetera. And celebrating of the fact that rationalists can cooperate with each other, vote in elections, and do many other nice things that philosophers have claimed they can’t. And suggesting that perhaps next time one should extend “rationality” a bit more credit before sighing and nodding wisely about its limitations.

  • In conclusion, rational agents are not incapable of cooperation, rational agents are not constantly fighting their own source code, rational agents do not go around helplessly wishing they were less rational, and finally, rational agents win.

Those of you who’ve read the quantum mechanics sequence can extrapolate from past experience that I’m not bluffing. But it’s not clear to me that writing this book would be my best possible expenditure of the required time.