# Decision Theories: A Less Wrong Primer

Summary: If you’ve been wondering why people keep going on about decision theory on Less Wrong, I wrote you this post as an answer. I explain what decision theories are, show how Causal Decision Theory works and where it seems to give the wrong answers, introduce (very briefly) some candidates for a more advanced decision theory, and touch on the (possible) connection between decision theory and ethics.

## What is a decision theory?

This is going to sound silly, but a decision theory is an algorithm for making decisions.[0] The inputs are an agent’s knowledge of the world, and the agent’s goals and values; the output is a particular action (or plan of actions). Actually, in many cases the goals and values are implicit in the algorithm rather than given as input, but it’s worth keeping them distinct in theory.

For example, we can think of a chess program as a simple decision theory. If you feed it the current state of the board, it returns a move, which advances the implicit goal of winning. The actual details of the decision theory include things like writing out the tree of possible moves and countermoves, and evaluating which possibilities bring it closer to winning.
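The tree-of-moves-and-countermoves idea can be sketched as a toy minimax search over an explicit game tree. This is only a sketch: real chess programs add move generation, pruning, and much better evaluation, and the tiny tree below is invented for illustration.

```python
# Toy minimax over an explicit game tree: a leaf is a number (our evaluation
# of that position), an internal node is a list of child subtrees.
def minimax(tree, maximizing=True):
    if isinstance(tree, (int, float)):  # leaf: an evaluated position
        return tree
    values = [minimax(child, not maximizing) for child in tree]
    return max(values) if maximizing else min(values)

def best_move(tree):
    # The "decision theory": output the index of the move whose subtree
    # guarantees the best outcome against a minimizing opponent.
    return max(range(len(tree)), key=lambda i: minimax(tree[i], maximizing=False))

# Two of our moves; the opponent then picks the leaf that is worst for us.
game = [[3, 12], [2, 8]]     # move 0 guarantees 3, move 1 guarantees only 2
print(best_move(game))       # 0
```

The point is just that the whole loop, from board state in to move out, is one algorithm with an implicit goal baked into the evaluation function.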

Another example is an E. coli bacterium. It has two basic options at every moment: it can use its flagella to swim forward in a straight line, or to change direction by randomly tumbling. It can sense whether the concentration of food or toxin is increasing or decreasing over time, and so it executes a simple algorithm that randomly changes direction more often when things are “getting worse”. This is enough control for bacteria to rapidly seek out food and flee from toxins, without needing any sort of advanced information processing.
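The run-and-tumble rule can be sketched as a one-dimensional simulation. The concentration field and the tumble rates below are invented for illustration; they are not measured E. coli parameters.

```python
import random

def chemotaxis_step(x, direction, concentration, prev_c):
    # Tumble (pick a fresh random direction) more often when things are
    # "getting worse", i.e. when concentration has decreased since last step.
    c = concentration(x)
    p_tumble = 0.9 if c < prev_c else 0.1   # made-up rates for illustration
    if random.random() < p_tumble:
        direction = random.choice([-1, 1])
    return x + direction, direction, c

random.seed(0)
food = lambda x: -abs(x - 50)   # hypothetical field: food peaks at x = 50
x, d, prev = 0, 1, food(0)
for _ in range(500):
    x, d, prev = chemotaxis_step(x, d, food, prev)
print(x)  # typically ends up hovering near the food peak at x = 50
```

Despite never modeling the world, the biased random walk reliably climbs the gradient, which is the whole trick.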

A human being is a much more complicated example which combines aspects of the two simpler ones: we mentally model consequences in order to make many decisions, and we also follow heuristics that have evolved to work well without explicitly modeling the world.[1] We can’t model anything quite as complicated as the way human beings actually make decisions, but we can study simple decision theories on simple problems; and the results of this analysis have often been more effective than the raw intuitions of human beings (who evolved to succeed in small savannah tribes, not to negotiate a nuclear arms race). But the standard model used for this analysis, Causal Decision Theory, has a serious drawback of its own, and the suggested replacements are important for a number of things that Less Wrong readers might care about.

## What is Causal Decision Theory?

Causal decision theory (CDT to all the cool kids) is a particular class of decision theories with some nice properties. It’s straightforward to state, has some nice mathematical features, can be adapted to any utility function, and gives good answers on many problems. We’ll describe how it works in a fairly simple but general setup.

Let X be an agent who shares a world with some other agents (Y1 through Yn). All these agents are going to privately choose actions and then perform them simultaneously, and the actions will have consequences. (For instance, they could be playing a round of Diplomacy.)

We’ll assume that X has goals and values represented by a utility function: for every consequence C, there’s a number U(C) representing just how much X prefers that outcome, and X views equal expected utilities with indifference: a 50% chance of utility 0 and a 50% chance of utility 10 is no better or worse than a 100% chance of utility 5, for instance. (If these assumptions sound artificial, remember that we’re trying to make this as mathematically simple as we can in order to analyze it. I don’t think it’s as artificial as it seems, but that’s a different topic.)
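The indifference claim is just the linearity of expected utility, which a two-line calculation with the numbers from the text confirms:

```python
# Expected utility is a probability-weighted average, so the two lotteries
# from the text come out exactly equal.
def expected_utility(lottery):
    return sum(p * u for p, u in lottery)

gamble = [(0.5, 0), (0.5, 10)]   # 50% utility 0, 50% utility 10
sure   = [(1.0, 5)]              # 100% utility 5
print(expected_utility(gamble), expected_utility(sure))  # 5.0 5.0
```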

X wants to maximize its expected utility. If there were no other agents, this would be simple: model the world, estimate how likely each consequence is to happen if it does this action or that, calculate the expected utility of each action, then perform the action that results in the highest expected utility. But if there are other agents around, the outcomes depend on their actions as well as on X’s action, and if X treats that uncertainty like normal uncertainty, then there might be an opportunity for the Ys to exploit X.
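The no-other-agents procedure above is just an argmax over actions. Here is a minimal sketch; the actions, outcomes, probabilities, and utilities are invented for illustration.

```python
# One-player expected-utility maximization: estimate how likely each
# consequence is under each action, compute expected utility, pick the best.
def choose(actions, utility):
    def eu(action):
        return sum(p * utility(outcome) for outcome, p in actions[action])
    return max(actions, key=eu)

# Hypothetical decision problem (all numbers invented):
actions = {
    "picnic": [("sunny_picnic", 0.7), ("rained_out", 0.3)],
    "museum": [("museum_day", 1.0)],
}
utility = {"sunny_picnic": 10, "rained_out": 0, "museum_day": 6}.get
print(choose(actions, utility))  # EU(picnic) = 7.0 > EU(museum) = 6.0
```

The complications discussed next all come from the fact that, with other agents in the picture, the probabilities in this table are not fixed features of the world.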

This is a Difficult Problem in general; a full discussion would involve Nash equilibria, but even that doesn’t fully settle the matter: there can be more than one equilibrium! Also, X can sometimes treat another agent as predictable (like a fixed outcome or an ordinary random variable) and get away with it.

CDT is a class of decision theories, not a specific decision theory, so it’s impossible to specify with full generality how X will decide if X is a causal decision theorist. But there is one key property that distinguishes CDT from the decision theories we’ll talk about later: a CDT agent assumes that X’s decision is independent of the simultaneous decisions of the Ys; that is, X could decide one way or another and everyone else’s decisions would stay the same.

Therefore, there is at least one case where we can say what a CDT agent will do in a multi-player game: some strategies are dominated by others. For example, if X and Y are both deciding whether to walk to the zoo, and X will be happiest if X and Y both go, but X would still be happier at the zoo than at home even if Y doesn’t come along, then X should go to the zoo regardless of what Y does. (Presuming that X’s utility function is focused on being happy that afternoon.) This criterion is enough to “solve” many problems for a CDT agent, and in zero-sum two-player games the solution can be shown to be optimal for X.
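The dominance criterion from the zoo example can be checked mechanically against a payoff table; the payoff numbers below are invented.

```python
# X's payoffs, indexed by (X's action, Y's action); invented numbers.
# Going to the zoo is better for X whatever Y does, so "zoo" dominates "home".
payoff = {
    ("zoo",  "zoo"): 10, ("zoo",  "home"): 6,
    ("home", "zoo"):  4, ("home", "home"): 4,
}

def dominates(a, b, their_actions):
    # a dominates b for X: at least as good against every action of Y,
    # strictly better against at least one.
    return (all(payoff[(a, y)] >= payoff[(b, y)] for y in their_actions)
            and any(payoff[(a, y)] > payoff[(b, y)] for y in their_actions))

print(dominates("zoo", "home", ["zoo", "home"]))  # True
```

Note that this reasoning never required a probability distribution over Y’s action, which is exactly why it is safe ground for a CDT agent.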

## What’s the problem with Causal Decision Theory?

There are many simplifications and abstractions involved in CDT, but that assumption of independence turns out to be key. In practice, people put a lot of effort into predicting what other people might decide, sometimes with impressive accuracy, and then base their own decisions on that prediction. This wrecks the independence of decisions, and so it turns out that in a non-zero-sum game, it’s possible to “beat” the outcome that CDT gets.

The classical thought experiment in this context is called Newcomb’s Problem. X meets with a very smart and honest alien, Omega, that has the power to accurately predict what X would do in various hypothetical situations. Omega presents X with two boxes, a clear one containing $1,000 and an opaque one containing either $1,000,000 or nothing. Omega explains that X can either take the opaque box (this is called one-boxing) or both boxes (two-boxing), but there’s a trick: Omega predicted in advance what X would do, and put $1,000,000 into the opaque box only if X was predicted to one-box. (This is a little devious, so take some time to ponder it if you haven’t seen Newcomb’s Problem before, or read here for a fuller explanation.)

If X is a causal decision theorist, the choice is clear: whatever Omega decided, it decided already, and whether the opaque box is full or empty, X is better off taking both. (That is, two-boxing is a dominant strategy over one-boxing.) So X two-boxes, and walks away with $1,000 (since Omega easily predicted that this would happen). Meanwhile, X’s cousin Z (not a CDT agent) decides to one-box, and finds the box full with $1,000,000. So it certainly seems that one could do better than CDT in this case.

But is this a fair problem? After all, we can always come up with problems that trick the rational agent into making the wrong choice, while a dumber agent lucks into the right one. Having a very powerful predictor around might seem artificial, although the problem might look much the same if Omega had a 90% success rate rather than 100%. One reason that this is a fair problem is that the outcome depends only on what action X is simulated to take, not on what process produced the decision. Besides, we can see the same behavior in another famous game theory problem: the Prisoner’s Dilemma.
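The payoff logic with a perfect predictor can be encoded directly from the problem statement; this sketch just tabulates the two options.

```python
# Newcomb's Problem with a perfect predictor: Omega fills the opaque box
# exactly when it (correctly) predicts the agent will one-box.
def newcomb_payoff(action):  # action is "one-box" or "two-box"
    opaque = 1_000_000 if action == "one-box" else 0  # perfect prediction
    clear = 1_000
    return opaque + (clear if action == "two-box" else 0)

print(newcomb_payoff("two-box"))  # 1000: what the CDT agent walks away with
print(newcomb_payoff("one-box"))  # 1000000: what cousin Z gets
```

Because the prediction is a function of the action actually taken, the dominance argument’s premise (“whatever Omega decided, it decided independently of my choice”) is false here.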
X and Y are collaborating on a project, but they have different goals for it, and either one has the opportunity to achieve their goal a little better at the cost of significantly impeding their partner’s goal. (The options are called cooperation and defection.) If they both cooperate, they get a utility of +50 each; if X cooperates and Y defects, then X winds up at +10 but Y gets +70, and vice versa; but if they both defect, then both wind up at +30 each.[2]

If X is a CDT agent, then defecting dominates cooperating as a strategy, so X will always defect in the Prisoner’s Dilemma (as long as there are no further ramifications; the Iterated Prisoner’s Dilemma can be different, because X’s current decision can influence Y’s future decisions). Even if you knowingly pair up X with a copy of itself (with a different goal but the same decision theory), it will defect even though it could prove that the two decisions will be identical.

Meanwhile, its cousin Z also plays the Prisoner’s Dilemma: Z cooperates when it’s facing an agent that has the same decision theory, and defects otherwise. This is a strictly better outcome than X gets. (Z isn’t optimal, though; I’m just showing that you can find a strict improvement on X.)[3]

## What decision theories are better than CDT?

I realize this post is pretty long already, but it’s way too short to outline the advanced decision theories that have been proposed and developed recently by a number of people (including Eliezer, Gary Drescher, Wei Dai, Vladimir Nesov and Vladimir Slepnev). Instead, I’ll list the features that we would want an advanced decision theory to have:

1. The decision theory should be formalizable at least as well as CDT is.
2. The decision theory should give answers that are at least as good as CDT’s answers. In particular, it should always get the right answer in 1-player games and find a Nash equilibrium in zero-sum two-player games (when the other player is also able to do so).
3. The decision theory should strictly outperform CDT on the Prisoner’s Dilemma: it should elicit mutual cooperation in the Prisoner’s Dilemma from some agents that CDT elicits mutual defection from, it shouldn’t cooperate when its partner defects, and (arguably) it should defect if its partner would cooperate regardless.
4. The decision theory should one-box on Newcomb’s Problem.
5. The decision theory should be reasonably simple, and not include a bunch of ad-hoc rules. We want to solve problems involving prediction of actions in general, not just the special cases.

There are now a couple of candidate decision theories (Timeless Decision Theory, Updateless Decision Theory, and Ambient Decision Theory) which seem to meet these criteria. Interestingly, formalizing any of these tends to deeply involve the mathematics of self-reference (Gödel’s Theorem and Löb’s Theorem) in order to avoid the infinite regress inherent in simulating an agent that’s simulating you. But for the time being, we can massively oversimplify and outline them.

TDT considers your ultimate decision as the cause of both your action and other agents’ valid predictions of your action, and tries to pick the decision that works best under that model. ADT uses a kind of diagonalization to predict the effects of different decisions without having the final decision throw off the prediction. And UDT considers the decision that would be the best policy for all possible versions of you to employ, on average.

## Why are advanced decision theories important for Less Wrong?

There are a few reasons.
Firstly, there are those who think that advanced decision theories are a natural base on which to build AI. One reason for this is something I briefly mentioned: even CDT allows for the idea that X’s current decisions can affect Y’s future decisions, and self-modification counts as a decision. If X can self-modify, and if X expects to deal with situations where an advanced decision theory would outperform its current self, then X will change itself into an advanced decision theory (with some weird caveats: for example, if X started out as CDT, its modification will only care about other agents’ decisions made after X self-modified).

More relevantly to rationalists, the bad choices that CDT makes are often held up as examples of why you shouldn’t try to be rational, or why rationalists can’t cooperate. But instrumental rationality doesn’t need to be synonymous with causal decision theory: if there are other decision theories that do strictly better, we should adopt those rather than CDT! So figuring out advanced decision theories, even if we can’t implement them on real-world problems, helps us see that the ideal of rationality isn’t going to fall flat on its face.

Finally, advanced decision theory could be relevant to morality. If, as many of us suspect, there’s no basis for human morality apart from what goes on in human brains, then why do we feel there’s still a distinction between what-we-want and what-is-right? One answer is that if we feed what-we-want into an advanced decision theory, then just as cooperation emerges in the Prisoner’s Dilemma, many kinds of patterns that we take as basic moral rules emerge as the equilibrium behavior.
The idea is developed more substantially in Gary Drescher’s Good and Real, and (before there was a candidate for an advanced decision theory) in Douglas Hofstadter’s concept of superrationality. It’s still at the speculative stage, because it’s difficult to work out what interactions between agents with advanced decision theories would look like (in particular, we don’t know whether bargaining would end in a fair split or in a Xanatos Gambit Pileup of chicken threats, though we think and hope it’s the former). But it’s at least a promising approach to the slippery question of what ‘right’ could actually mean.

And if you want to understand this on a slightly more technical level... well, I’ve started a sequence.

### Notes:

0. Rather confusingly, decision theory is the name for the study of decision theories.

1. Both patterns appear in our conscious reasoning as well as our subconscious thinking: we care about consequences we can directly foresee, and also about moral rules that don’t seem attached to any particular consequence. However, just as the simple “program” for the bacterium was constructed by evolution, our moral rules are there for evolutionary reasons as well, perhaps even for reasons that have to do with advanced decision theory... Also, it’s worth noting that we’re not consciously aware of all of our values and goals, though at least we have a better idea of them than E. coli does. This is a problem for the idea of representing our usual decisions in terms of decision theory, though we can still hope that our approximations are good enough (e.g. that our real values regarding the Cold War roughly corresponded to our estimates of how bad a nuclear war or a Soviet world takeover would be).

2. Eliezer once pointed out that our intuitions on most formulations of the Prisoner’s Dilemma are skewed by our notions of fairness, and a more outlandish example might serve better to illustrate how a genuine PD really feels. For an example where people are notorious for not caring about each other’s goals, let’s consider aesthetics: people who love one form of music often really feel that another popular form is a waste of time. One might feel that if the works of Artist Q suddenly disappeared from the world, it would objectively be a tragedy, while if the same happened to the works of Artist R, then it’s no big deal and R’s fans should be glad to be freed from that dreck. We can use this aesthetic intolerance to construct a more genuine Prisoner’s Dilemma without inviting aliens or anything like that. Say X is a writer and Y is an illustrator, and they have very different preferences for how a certain scene should come across, so they’ve worked out a compromise. Now, both of them could cooperate and get a scene that both are OK with, or X could secretly change the dialogue in hopes of getting his idea to come across, or Y could draw the scene differently in order to get her idea of the scene across. But if they both “defect” from the compromise, then the scene gets confusing to readers. If both X and Y prefer their own idea to the compromise, prefer the compromise to the muddle, and prefer the muddle to their partner’s idea, then this is a genuine Prisoner’s Dilemma.

3. I’ve avoided mentioning Evidential Decision Theory, the “usual” counterpart to CDT; it’s worth noting that EDT one-boxes on Newcomb’s Problem but gives the wrong answer on a classical one-player problem (the Smoking Lesion) which the advanced decision theories handle correctly. It’s also far less amenable to formalization than the others.
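As a check on the preference ordering in note 2, the payoff numbers from the main text satisfy the standard Prisoner’s Dilemma condition (temptation > reward > punishment > sucker’s payoff):

```python
# Payoffs from the main text: (X's utility, Y's utility) for each pair of
# actions, C = cooperate, D = defect.
payoffs = {
    ("C", "C"): (50, 50),
    ("C", "D"): (10, 70),
    ("D", "C"): (70, 10),
    ("D", "D"): (30, 30),
}

def is_prisoners_dilemma(p):
    # Genuine PD for X: own idea (temptation) > compromise (reward)
    # > muddle (punishment) > partner's idea (sucker's payoff).
    t = p[("D", "C")][0]
    r = p[("C", "C")][0]
    pu = p[("D", "D")][0]
    s = p[("C", "D")][0]
    return t > r > pu > s

print(is_prisoners_dilemma(payoffs))  # True, so defection dominates for CDT
```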
• This is a good post, but it would be super valuable if you could explain the more advanced decision theories and the current problems people are working on as clearly as you explained the basics here.

• Yes, they definitely need a non-technical introduction as well (and none of the posts on them seem to serve that purpose). I’ll see if I feel inspired again this weekend.

• It’s pretty easy to explain the main innovation in TDT/UDT/ADT: they all differ from EDT/CDT in how they answer “What is it that you’re deciding when you make a decision?” and “What are the consequences of a decision?”, and in roughly the same way. They answer the former with “You’re deciding the logical fact that the program-that-is-you makes a certain output,” and the latter with “The consequences are the logical consequences of that logical fact.” UDT differs from ADT in that UDT uses an unspecified “math intuition module” to form a probability distribution over possible logical consequences, whereas ADT uses logical deduction and only considers consequences that it can prove. (TDT also makes use of Pearl’s theory of causality, which I admittedly do not understand.)

• > “You’re deciding the logical fact that the program-that-is-you makes a certain output.”

There is no need to focus on these concepts. That the fact of decision is “logical” doesn’t usefully characterize it: if we talk about the “physical” fact of making a decision, then everything else remains the same; you’d just need to see what this physical event implies about decisions made by your near-copies elsewhere (among the normal consequences). Likewise, pointing to a physical event doesn’t require conceptualizing a “program” or even an “agent” that computes the state of this event; you could just specify coordinates in spacetime and work on figuring out what’s there (roughly speaking).
(It’s of course convenient to work with abstractly defined structures, in particular decisions generated by programs (rather than abstractly defined in a more general way), and at least with mathematical structuralism in mind, working with abstract structures looks like the right way of describing things in general.)

• But how does one identify/encode a physical fact? With a logical fact you can say “Program with source code X outputs Y” and then deduce consequences from that. I don’t see what the equivalent is with a “physical” notion of decision. Is the agent supposed to have hard-coded knowledge of the laws of physics and its spacetime coordinates (which would take the place of knowledge of its own source code) and then represent a decision as “the object at coordinate X in the universe with laws Y and initial conditions Z does A”? That seems like a much less elegant and practical solution to me. And you’re still using it as a logical fact, i.e., deducing logical consequences from it, right? I feel like you must be making a point that I’m not getting...

• The same way you find a way home. How does that work? Presumably only if we assume the context of a particular collection of physical worlds (perhaps with a selected preferred approximate location). Given that we’re considering only some worlds, additional information that an agent has allows it to find a location within these worlds, without knowing the definition of those worlds. This I think is an important point, and it comes up frequently for various reasons: to usefully act or reason, an agent doesn’t have to “personally” understand what’s going on; there may be “external” assumptions that enable an agent to act within them without having access to them.
• (I’m probably rehashing what is already obvious to all readers, or missing something, but:)

> That seems like a much less elegant and practical solution to me.

’Course, which Nesov acknowledged with his closing sentence, but even so it’s conceivable, which indicates that the focus on logical facts isn’t a necessary distinction between old and new decision theories. And Nesov’s claim was only that there is no need to focus on logical-ness as such to explain the distinction.

> And you’re still using it as a logical fact, i.e., deducing logical consequences from it, right?

Does comprehensive physical knowledge look any different from decompressed logical knowledge? Logical facts seem to be facts about what remains true no matter where you are, but if you know everything about where you are already, then the logical aspect of your knowledge doesn’t need to be acknowledged or represented. More concretely, if you have a detailed physical model of your selves, i.e. all instantiations of the program that is you, across all possible quantum branches, and you know that all of them like to eat cake, then there’s no additional information hidden in the logical fact “program X, i.e. me, likes to eat cake”. You can represent the knowledge either way, at least theoretically, which I think is Nesov’s point, maybe? But this only seems true of physically instantiated agents reasoning about decisions from a first-person perspective, so to speak, so I’m confused; there doesn’t seem to be a corresponding “physical” way to model how purely mathematical objects can ambiently control other purely mathematical objects. Is it assumed that such mathematical objects can only have causal influence by way of their showing up in a physically instantiated decision calculus somewhere (at least for our purposes)? Or is the ability of new decision theories to reason about purely mathematical objects considered relatively tangential to the decision theories’ defining features (even if it is a real advantage)?

• This post isn’t really correct about what distinguishes CDT from EDT or TDT. The distinction has nothing to do with the presence of other agents, and can be seen in their absence (e.g. the Smoking Lesion). Indeed, neither decision theory contains a notion of “other agents”; both simply regard things that we might classify as “other agents” as features of the environment. Fundamentally, the following paragraph is wrong:

> X wants to maximize its expected utility. If there were no other agents, this would be simple: calculate the expected value of each action (given its information on how likely each consequence is to happen if it does this action or that), then perform the action that results in the highest expected value.

The difference between these theories is actually in how they interpret the idea of “how likely each consequence is to happen if it does this action or that”; hence they differ even in that “simple” case. (Note: I’m only considering CDT, EDT, and TDT here. I think the others may work by some other mechanism?)

• > The difference between these theories is actually in how they interpret the idea of “how likely each consequence is to happen if it does this action or that”; hence they differ even in that “simple” case.

EDT does differ from CDT in that case (hence the Smoking Lesion problem), but EDT is clearly wrong to do so, and I can’t think of any one-player games that CDT gets wrong, or in which CDT disagrees with TDT.

> both simply regard things that we might classify as “other agents” simply as features of the environment.
I don’t think this is right in general; I realized overnight that these decision theories are underspecified on how they treat other agents, so instead they should be regarded as classes of decision theories. There are some CDT agents who treat other agents as uncertain features of the environment, and some that treat them as pure unknowns that one has to find a Nash equilibrium for. Both satisfy the requirements of CDT, and they’ll come to different answers sometimes. (That is, if X doesn’t have a very good ability to predict how Y works, then a “feature of the environment” CDT will treat Y’s action with an ignorance prior, which may be very different from any Nash equilibrium for Y, and X’s decision might not be an equilibrium strategy. If X has the ability to predict Y well, and vice versa, then the two should be identical.)

• Thanks for the recap. It still doesn’t answer my question, though:

> If X is a causal decision theorist, the choice is clear: whatever Omega decided, it decided already

This appears to be incorrect if the CDT agent knows that Omega always makes correct predictions.

> the problem might look much the same if Omega had a 90% success rate rather than 100%.

And this appears to be incorrect in all cases. The right decision depends on the exact nature of the noise. If Omega makes the decision by analyzing the agent’s psychological tests taken in childhood, then the agent should two-box. And if Omega makes a perfect simulation and then adds random noise, the agent should one-box.

• > If Omega makes the decision by analyzing the agent’s psychological tests taken in childhood, then the agent should two-box.

Sorry, could you explain this in more detail?

• I think the idea is that even if Omega always predicted two-boxing, it still could be said to predict with 90% accuracy if 10% of the human population happened to be one-boxers. And yet you should two-box in that case. So basically, the non-deterministic version of Newcomb’s problem isn’t specified clearly enough.

• I disagree. To be at all meaningful to the problem, the “90% accuracy” has to mean that, given all the information available to you, you assign a 90% probability to Omega correctly predicting your choice. This is quite different from correctly predicting the choices of 90% of the human population.

• I don’t think this works in the example given, where Omega always predicts two-boxing. We agree that the correct thing to do in that case is to two-box. And if I’ve decided to two-box, then I can be > 90% confident that Omega will predict my personal actions correctly. But this still shouldn’t make me one-box. I’ve commented on Newcomb in previous threads... in my view it really does matter how Omega makes its predictions, and whether they are perfectly reliable or just very reliable.

• Agreed for that case, but perfect reliability still isn’t necessary (consider Omega 99.99% accurate / 10% one-boxers, for example). What matters is that your uncertainty in Omega’s prediction is tied to your uncertainty in your actions. If you’re 90% confident that Omega gets it right conditional on deciding to one-box, and 90% confident that Omega gets it right conditional on deciding to two-box, then you should one-box (0.9 × $1M > $1K + 0.1 × $1M).

• Far better explanation than mine, thanks!

• Good point. I don’t think this is worth going into within this post, but I introduced a weasel word to signify that the circumstances of a 90% Predictor do matter.

• Very nice, thanks!

• Oh. That’s very nice, thanks!

• > If Omega makes the decision by analyzing the agent’s psychological tests taken in childhood, then the agent should two-box.

> Sorry, could you explain this in more detail?

Humans are time-inconsistent decision makers.
Why would Omega choose to fill the boxes according to a certain point in configuration space rather than some average measure? Most of your life you would have two-boxed, after all. Therefore, if Omega was to predict whether you (as a space-time worm) will take both boxes or not, when it meets you at an arbitrary point in configuration space, it might predict that you are going to two-box even if you are not going to live for much longer, in which remaining time period you are going to consistently choose to one-box.

ETA: It doesn’t really matter when a superintelligence will meet you. What matters is for how long a period you adopted which decision procedure, respectively were susceptible to what kind of exploitation. If you only changed your mind about a decision procedure for .01% of your life, it might still be worth acting on that acausally.

• Hmm, I’m not sure this is an adequate formalization, but: let’s assume there is an evolved population of agents. Each agent has an internal parameter p, 0 <= p <= 1, and implements a decision procedure p·CDT + (1−p)·EDT. That is, given a problem, the agent tosses a pseudorandom p-biased coin and decides according to either CDT or EDT, depending on the result of the toss. Assume further that there is a test set of a hundred binary decision problems, and Omega knows the test results for every agent, and does not know anything else about them. Then Omega can estimate P(agent’s p = q | test results) and predict “one box” if the maximum-likelihood estimate of p is > 1/2, and “two box” otherwise. [Here I assume for the sake of argument that CDT always two-boxes.] Given the right distribution of p-s in the population, Omega can be made to predict with any given accuracy. Yet there appears to be no reason to one-box...

• Wait, are you deriving the uselessness of UDT from the fact that the population doesn’t contain UDT? That looks circular, unless I’m missing something...

• Err, no, I’m not deriving the uselessness of either decision theory here. My point is that only the “pure” Newcomb’s problem, where Omega always predicts correctly and the agent knows it, is well-defined. The “noisy” problem, where Omega is known to sometimes guess wrong, is underspecified. The correct solution (that is, whether one-boxing or two-boxing is the utility-maximizing move) depends on exactly how and why Omega makes mistakes. Simply saying “probability 0.9 of correct prediction” is insufficient.

But in the “pure” Newcomb’s problem, it seems to me that CDT would actually one-box, reasoning as follows:

1. Since Omega always predicts correctly, I can assume that it makes its predictions using a full simulation.
2. Then this situation in which I find myself now (making the decision in Newcomb’s problem) can be either outside or within the simulation. I have no way to know, since it would look the same to me either way.
3. Therefore I should decide assuming a 1/2 probability that I am inside Omega’s simulation and 1/2 that I am outside.
4. So I one-box.

• > If X is a causal decision theorist, the choice is clear: whatever Omega decided, it decided already

> This appears to be incorrect if the CDT agent knows that Omega always makes correct predictions.

If a CDT agent A is told about the problem before Omega makes its prediction and fills the boxes, then A will want to stop being a CDT agent for the duration of the experiment. Maybe that’s what you mean?

• No, I mean I think CDT can one-box within the regular Newcomb’s problem situation, if its reasoning capabilities are sufficiently strong. In detail: here and in the thread here.

• This might not satisfactorily answer your confusion, but: CDT is defined by the fact that it has incorrect causal graphs.
If it has correct causal graphs then it’s not CDT. Why bother talking about a “decision theory” that is arbitrarily limited to incorrect causal graphs? Because that’s the decision theory that academic decision theorists like to talk about and treat as default. Why did academic decision theorists never realize that their causal graphs were wrong? No one has a very good model of that, but check out Wei Dai’s related speculation here. Note that if you define causality in a technical Markovian way and use Bayes nets then there is no difference between CDT and TDT. I used to get annoyed because CDT with a good enough world model should clearly one-box, yet people stipulated that it wouldn’t; only later did I realize that it’s mostly a rhetorical thing and no one thinks that if you actually implemented an AGI with “CDT” it’d be as dumb as academia/LessWrong’s version of CDT. If I’m wrong about any of the above then someone please correct me, as this is relevant to FAI strategy.

• No, I mean I think CDT can one-box within the regular Newcomb’s problem situation, if its reasoning capabilities are sufficiently strong. In detail: here and in the thread here.

No, if you have an agent that is one-boxing, either it is not a CDT agent or the game it is playing is not Newcomb’s problem. More specifically, in your first link you describe a game that is not Newcomb’s problem and in the second link you describe an agent that does not implement CDT.

• More specifically, in your first link you describe a game that is not Newcomb’s problem and in the second link you describe an agent that does not implement CDT

It would be a little more helpful, although probably not quite as cool-sounding, if you explained in what way the game is not Newcomb’s in the first link, and the agent not a CDT in the second.
AFAIK, the two links describe exactly the same problem and exactly the same agent, and I wrote both comments.

• It would be a little more helpful, although probably not quite as cool-sounding,

That doesn’t seem to make helping you appealing.

if you explained in what way the game is not Newcomb’s in the first link,

The agent believes that it has a 50% chance of being in an actual Newcomb’s problem and a 50% chance of being in a simulation which will be used to present another agent with a Newcomb’s problem some time in the future.

and the agent not a CDT in the second.

Orthonormal already explained this in the context.

• That doesn’t seem to make helping you appealing.

Yes, I have this problem, working on it. I’m sorry, and thanks for your patience!

The agent believes that it has a 50% chance of being in an actual Newcomb’s problem and a 50% chance of being in a simulation which will be used to present another agent with a Newcomb’s problem some time in the future.

Yes, except for s/another agent/itself/. In what way is this not a correct description of a pure Newcomb’s problem from the agent’s point of view? This is my original still unanswered question. Note: in the usual formulations of Newcomb’s problem for UDT, the agent knows exactly that—it is called twice, and when it is running it does not know which of the two calls is being evaluated.

Orthonormal already explained this in the context.

I answered his explanation in the context, and he appeared to agree. His other objection seems to be based on a mistaken understanding.
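The “called twice” formulation mentioned here can be sketched as a toy world program (my own paraphrase of the standard UDT setup, not the exact code from the linked posts):

```python
M, K = 1_000_000, 1_000  # opaque-box prize and transparent-box amount

def U(A) -> int:
    """Newcomb world program: Omega's prediction is one call to the
    agent, the actual decision another; A cannot tell which call is which."""
    prediction = A()              # Omega's perfect simulation
    opaque = M if prediction == "one" else 0
    decision = A()                # the 'real' choice
    return opaque if decision == "one" else opaque + K

assert U(lambda: "one") == M  # a deterministic one-boxer gets the million
assert U(lambda: "two") == K  # a deterministic two-boxer gets only $1,000
```

Since a deterministic agent returns the same answer in both calls, its “prediction” and its “decision” necessarily agree, which is what makes the agent’s uncertainty about which call it is inside irrelevant to the payoff.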
• This is worth writing into its own post—a CDT agent with a non-self-centered utility function (like a paperclip maximizer) and a certain model of anthropics (in which, if it knows it’s being simulated, it views itself as possibly within the simulation), when faced with a Predictor that predicts by simulating (which is not always the case), one-boxes on Newcomb’s Problem. This is a novel and surprising result in the academic literature on CDT, not the prediction they expected. But it seems to me that if you violate any of the conditions above, one-boxing collapses back into two-boxing; and furthermore, it won’t cooperate in the Prisoner’s Dilemma against a CDT agent with an orthogonal utility function. That, at least, is inescapable from the independence assumption.

• And as I replied there, this depends on its utility function being such that “filling the box for my non-simulated copy” has utility comparable to “taking the extra box when I’m not simulated”. There are utility functions for which this works (e.g. maximizing paperclips in the real world) and utility functions for which it doesn’t (e.g. maximizing hedons in my personal future, whether I’m being simulated or not), and Omega can slightly change the problem (simulate an agent with the same decision algorithm as X but a different utility function) in a way that makes CDT two-box again. (That trick wouldn’t stop TDT/UDT/ADT from one-boxing.)

• I think you missed my point.

Omega can slightly change the problem (simulate an agent with the same decision algorithm as X but a different utility function)

This is irrelevant. The agent is actually outside, thinking what to do in the Newcomb’s problem. But only we know this; the agent itself doesn’t. All the agent knows is that Omega always predicts correctly. Which means the agent can model Omega as a perfect simulator.
The actual method that Omega uses to make predictions does not matter; the world would look the same to the agent regardless.

• Unless Omega predicts without simulating—for instance, this formulation of UDT can be formally proved to one-box without simulating.

• Errrr. The agent does not simulate anything in my argument. The agent has a “mental model” of Omega, in which Omega is a perfect simulator. It’s about the representation of the problem within the agent’s mind. In your link, Omega—the function U()—is a perfect simulator. It calls the agent function A() twice, once to get its prediction, and once for the actual decision.

• The problem would work as well if the first call went not to A directly but to querying the oracle whether A()=1. There are ways of predicting that aren’t simulation, and if that’s the case then your idea falls apart.

• A few minor points:

• You mention utility (“can be adapted to any utility function”) before defining what utility is. Also, you make it sound like the concept of utility is specific to CDT rather than being common to all of the decision theories mentioned.

• Utility isn’t the same as utilitarianism. There are only certain classes of utility functions that could reasonably be considered “utilitarian”, but decision theories work for any utility function.

• What exactly do you mean by a “zero-sum game”? Are we talking about two-player games only? (talking about “the other players” threw me off here)

• Thanks! I’ve made some edits.

You mention utility (“can be adapted to any utility function”) before defining what utility is.

I think the concept of utility functions is widespread enough that I can get away with it (and I can’t find an aesthetically pleasing way to reorder that section and fix it).

Utility isn’t the same as utilitarianism.
There are only certain classes of utility functions that could reasonably be considered “utilitarian”, but decision theories work for any utility function.

Nowhere in this post am I talking about Benthamist altruistic utilitarianism. I realize the ambiguity of the terms, but again I don’t see a good way to fix it.

What exactly do you mean by a “zero-sum game”? Are we talking about two-player games only? (talking about “the other players” threw me off here)

Oops, good catch.

• Nowhere in this post am I talking about Benthamist altruistic utilitarianism

Ah sorry—it was the link “but that’s a different topic” that I was talking about; I realize I didn’t make that clear. I was expecting the justification for assigning outcomes a utility would link to something about the Von Neumann-Morgenstern axioms, which I think are less controversial than altruistic utilitarianism. But it’s only a minor point.

• Ah. It may be a matter of interpretation, but I view that post and this one as more enlightening on the need for expected-utility calculation, even for non-altruists, than the von Neumann-Morgenstern axioms.

• Oooh, between your description of CDT and the recent post on how experts will defend their theory even against counterfactuals, I finally understand how someone can possibly justify two-boxing in Newcomb’s Problem. As an added bonus, I’m also seeing how being “half rational” can be intensely dangerous, since it leads to things like two-boxing in Newcomb’s Problem :)

• There are a couple of things I find odd about this. First, it seems to be taken for granted that one-boxing is obviously better than two-boxing, but I’m not sure that’s right. J.M. Joyce has an argument (in his Foundations of Causal Decision Theory) that is supposed to convince you that two-boxing is the right solution.
Importantly, he accepts that you might still wish you weren’t a CDT agent (so that Omega predicted you would one-box). But, he says, in either case, once the boxes are in front of you, whether you are a CDT or an EDT agent, you should two-box! The dominance reasoning works in either case, once the prediction has been made and the boxes are in front of you.

But this leads me on to my second point. I’m not sure how much of a flaw Newcomb’s problem reveals in a decision theory, given that it relies on the intervention of an alien that can accurately predict what you will do. Let’s leave aside the general problem of predicting real agents’ actions with that degree of accuracy. If you know that the prediction of your choice affects the success of your choices, I think that reflexivity or self-reference simply makes the prediction meaningless. We’re all used to self-reference being tricky, and I think in this case it just undermines the whole setup. That is, I don’t see the force of the objection from Newcomb’s problem, because I don’t think it’s a problem we could ever possibly face.

Here’s an example of a related kind of “reflexivity makes prediction meaningless”. Let’s say Omega bets you $100 that she can predict what you will eat for breakfast. Once you accept this bet, you now try to think of something that you would never otherwise think to eat for breakfast, in order to win the bet. The fact that your actions and the prediction of your actions have been connected in this way by the bet makes your actions unpredictable.

Going on to the prisoner’s dilemma. Again, I don’t think that it’s the job of decision theory to get “the right” result in PD. Again, the dominance reasoning seems impeccable to me. In fact, I’m tempted to say that I would want any future advanced decision theory to satisfy some form of this dominance principle: it’s crazy to ever choose an act that is guaranteed to be worse. All you need to do to “fix” PD is to have the agent attach enough weight to the welfare of others. That’s not a modification of the decision theory, that’s a modification of the utility function.

• I generally share your reservations.

But as I understand it, proponents of alternative DTs are talking about a conditional PD where you know you face an opponent executing a particular DT. The fancy-DT-users all defect on PD when the prior of their PD-partner running CDT or similar is high enough, right?

Wouldn’t you like to be the type of agent who cooperates with near-copies of yourself? Wouldn’t you like to be the type of agent who one-boxes? The trick is to satisfy this desire without using a bunch of stupid special-case rules, and show that it doesn’t lead to poor decisions elsewhere.

• But as I understand it, proponents of alternative DTs are talking about a conditional PD where you know you face an opponent executing a particular DT. The fancy-DT-users all defect on PD when the prior of their PD-partner running CDT or similar is high enough, right?

(Yes, you are correct!)

• Wouldn’t you like to be the type of agent who cooperates with near-copies of yourself? Wouldn’t you like to be the type of agent who one-boxes?

Yes, but it would be strictly better (for me) to be the kind of agent who defects against near-copies of myself when they co-operate in one-shot games. It would be better to be the kind of agent who is predicted to one-box, but then two-boxes once the money has been put in the opaque box.

But the point is really that I don’t see it as the job of an alternative decision theory to get “the right” answers to these sorts of questions.

• The larger point makes sense. Those two things you prefer are impossible according to the rules, though.

• They’re not necessarily impossible. If you have genuine reason to believe you can outsmart Omega, or that you can outsmart the near-copy of yourself in PD, then you should two-box or defect.

But if the only information you have is that you’re playing against a near-copy of yourself in PD, then cooperating is probably the smart thing to do. I understand this kind of thing is still being figured out.

• According to what rules? And anyway, I have preferences for all kinds of impossible things. For example, I prefer cooperating with copies of myself, even though I know it would never happen, since we’d both accept the dominance reasoning and defect.

• According to what rules?

I think he meant according to the rules of the thought experiments. In Newcomb’s problem, Omega predicts what you do. Whatever you choose to do, that’s what Omega predicted you would choose to do. You cannot choose to do something that Omega wouldn’t predict—it’s impossible. There is no such thing as “the kind of agent who is predicted to one-box, but then two-boxes once the money has been put in the opaque box”.

• Elsewhere on this comment thread I’ve discussed why I think those “rules” are not interesting. Basically, because they’re impossible to implement.

• Right. The rules of the respective thought experiments. Similarly, if you’re the sort to defect against near-copies of yourself in one-shot PD, then so is your near copy. (edit: I see now that scmbradley already wrote about that—sorry for the redundancy).

• Here’s an ex­am­ple of a re­lated kind of “re­flex­ivity makes pre­dic­tion mean­ingless”. Let’s say Omega bets you $100 that she can pre­dict what you will eat for break­fast. Once you ac­cept this bet, you now try to think of some­thing that you would never oth­er­wise think to eat for break­fast, in or­der to win the bet. The fact that your ac­tions and the pre­dic­tion of your ac­tions have been con­nected in this way by the bet makes your ac­tions un­pre­dictable. Your ac­tions have been de­ter­mined in part by the bet that Omega has made with you—I do not see how that is sup­posed to make them un­pre­dictable any more than adding any other vari­able would do so. Re­mem­ber: You only ap­pear to have free will from within the al­gorithm, you may de­cide to think of some­thing you’d never oth­er­wise think about but Omega is ad­vanced enough to model you down to the most ba­sic level—it can pre­dict your more com­plex be­havi­ours based upon the com­bi­na­tion of far sim­pler rules. You can­not nec­es­sar­ily just de­cide to think of some­thing ran­dom which would be re­quired in or­der to be un­pre­dictable. Similarly, the whole ques­tion of whether you should choose to two box or one box is a bit iffy. Strictly speak­ing there’s no SHOULD about it. You will one box or you will two box. The ques­tion phrased as a should ques­tion—as a choice—is mean­ingless un­less you’re treat­ing choice as a high-level ab­strac­tion of lower level rules; and if you do that, then the difficult dis­ap­pears—just as you don’t ask a rock whether it should or shouldn’t crush some­one when it falls down a hill. Mean­ingfully, we might ask whether it is prefer­able to be the type of per­son who two boxes or the type of per­son who one boxes. As it turns out it seems to be more prefer­able to one-box and make stink­ing great piles of dosh. And as it turns out I’m the sort of per­son who, hold­ing a de­sire for filthy lu­cre, will do so. 
It’s re­ally difficult to side step your in­tu­itions—your illu­sion that you ac­tu­ally get a free choice here. And I think the phras­ing of the prob­lem and its an­swers them­selves have a lot to do with that. I think if you think that peo­ple get a choice—and the mechanisms of Omega’s pre­dic­tion hinge upon you be­ing strongly de­ter­mined—then the ques­tion just ceases to make sense. And you’ve got to jet­ti­son one of the two; ei­ther Omega’s pre­dic­tion abil­ity or your abil­ity to make a choice in the sense con­ven­tion­ally meant. • we might ask whether it is prefer­able to be the type of per­son who two boxes or the type of per­son who one boxes. As it turns out it seems to be more prefer­able to one-box No. What is prefer­able is to be the kind of per­son Omega will pre­dict will one-box, and then ac­tu­ally two-box. As long as you “trick” Omega, you get strictly more money. But I guess your point is you can’t trick Omega this way. Which brings me back to whether Omega is fea­si­ble. I just don’t share the in­tu­ition that Omega is ca­pa­ble of the sort of pre­dic­tive ca­pac­ity re­quired of it. • Which brings me back to whether Omega is fea­si­ble. I just don’t share the in­tu­ition that Omega is ca­pa­ble of the sort of pre­dic­tive ca­pac­ity re­quired of it. Well, I guess my re­sponse to that would be that it’s a thought ex­per­i­ment. Omega’s re­ally just an ex­treme—hy­po­thet­i­cal—case of a pow­er­ful pre­dic­tor, that makes prob­lems in CDT more eas­ily seen by am­plify­ing them. If we were to talk about the pris­oner’s dilemma, we could eas­ily have roughly the same un­der­ly­ing dis­cus­sion. • See mine and or­thonor­mal’s com­ments on the PD on this post for my view of that. The point I’m strug­gling to ex­press is that I don’t think we should worry about the thought ex­per­i­ment, be­cause I have the feel­ing that Omega is some­how im­pos­si­ble. The sug­ges­tion is that New­comb’s prob­lem makes a prob­lem with CDT clearer. 
But I argue that Newcomb’s problem makes the problem. The flaw is not with the decision theory, but with the concept of such a predictor. So you can’t use CDT’s “failure” in this circumstance as evidence that CDT is wrong.

Here’s a related point: Omega will never put the money in the box. Smith acts like a one-boxer. Omega predicts that Smith will one-box. So the million is put in the opaque box. Now Omega reasons as follows: “Wait though. Even if Smith is a one-boxer, now that I’ve fixed what will be in the boxes, Smith is better off two-boxing. Smith is smart enough to realise that two-boxing is dominant, once I can’t causally affect the contents of the boxes.” So Omega doesn’t put the money in the box. Would one-boxing ever be advantageous if Omega were reasoning like that? No. The point is Omega will always reason that two-boxing dominates once the contents are fixed. There seems to be something unstable about Omega’s reasoning. I think this is related to why I feel Omega is impossible. (Though I’m not sure how the points interact exactly.)

• Here’s a related point: Omega will never put the money in the box. Smith acts like a one-boxer. Omega predicts that Smith will one-box. So the million is put in the opaque box. Now Omega reasons as follows: “Wait though. Even if Smith is a one-boxer, now that I’ve fixed what will be in the boxes, Smith is better off two-boxing. Smith is smart enough to realise that two-boxing is dominant, once I can’t causally affect the contents of the boxes.” So Omega doesn’t put the money in the box.

By that logic, you can never win in Kavka’s toxin/Parfit’s hitchhiker scenario.

• So I agree. It’s lucky I’ve never met a game theorist in the desert. Less flippantly: the logic is pretty much the same, yes. But I don’t see that as a problem for the point I’m making, which is that the perfect predictor isn’t a thought experiment we should worry about.

• “Wait though.
Even if Smith is a one-boxer, now that I’ve fixed what will be in the boxes, Smith is better off two-boxing. Smith is smart enough to realise that two-boxing is dominant, once I can’t causally affect the contents of the boxes.” So Omega doesn’t put the money in the box.

That line of reasoning is available to Smith as well, though, so he can choose to one-box because he knows that Omega is a perfect predictor. You’re right to say that the interplay between Omega’s prediction of Smith and Smith’s prediction of Omega is in a meta-stable state, BUT: Smith has to decide, he is going to make a decision, and so whatever algorithm he implements, if it ever goes down this line of meta-stable reasoning, must have a way to get out and choose something, even if it’s just by bounded computational power (or the limit step of computation in Hamkins’ infinite Turing machine). But since Omega is a perfect predictor, it will know that and choose accordingly. I have the feeling that Omega’s existence is something like an axiom: you can refuse or accept it, and both stances are coherent.

• Well, I can implement Omega by scanning your brain and simulating you. The other “non-implementations” of Omega, though, are IMO best ignored entirely. You can’t really blame a decision theory for failure if there’s no sensible model of the world for it to use. My decision theory, personally, allows me to ignore the unknown and edit my expected utility formula in an ad-hoc way if I’m sufficiently convinced that Omega will work as described. I think that’s practically useful because effective heuristics often have to be invented on the spot without a sufficient model of the world.
edit: albeit, if I was convinced that Omega works as described, I’d be convinced that it has scanned my brain and is emulating my decision procedure, or is using time travel, or is deciding randomly and then destroying the universes where it was wrong… with more time I can probably come up with other implementations; the common thing about the implementations, though, is that I should 1-box.

• Well, I can implement Omega by scanning your brain and simulating you.

Provided my brain’s choice isn’t affected by quantum noise, otherwise I don’t think you can. :-)

• People with memory problems tend to repeat “spontaneous” interactions in essentially the same way, which is evidence that quantum noise doesn’t usually sway choices.

• Good point. Still, the brain’s choice can be quite deterministic, if you give it enough thought—averaging out noise.

• You cannot necessarily just decide to think of something random which would be required in order to be unpredictable.

Presented with this scenario, I’d come up with a scheme describing a table of as many different options as I could manage—ideally a very large number, but the combinatorics would probably get unwieldy after a while—and pull numbers from http://www.fourmilab.ch/hotbits/ to make a selection. I might still lose, but knowing (to some small p-value) that it’s possible to predict radioactive decay would easily be worth $100.

Of course, that’s the smar­tassed an­swer.

• Well the smar­tarse re­sponse is that Omega’s just plugged him­self in on the other end of your hot­bits re­quest =p
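The determinism point in this exchange — that the bet is just another input to a fixed choice function, so a sufficiently good predictor stays correct — can be sketched as follows (the function names are hypothetical):

```python
def choose_breakfast(bet_is_on: bool) -> str:
    """A deterministic 'agent': the choice is a fixed function of its
    inputs, including the fact that Omega has made the bet."""
    # Contrarian strategy: pick something unusual once the bet is on.
    return "durian omelette" if bet_is_on else "toast"

def omega_predict(agent, bet_is_on: bool) -> str:
    # Omega predicts by evaluating the same function on the same inputs.
    return agent(bet_is_on)

# Trying to outwit the predictor changes the output, but the predictor
# models that very strategy, so it still matches.
assert omega_predict(choose_breakfast, True) == choose_breakfast(True)
```

Genuine randomness (like the hotbits scheme) is exactly what breaks this picture, which is why the follow-up comments turn to whether quantum noise sways human choices.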

• Again, the dominance reasoning seems impeccable to me. In fact, I’m tempted to say that I would want any future advanced decision theory to satisfy some form of this dominance principle: it’s crazy to ever choose an act that is guaranteed to be worse.

It’s not always cooperating—that would be dumb. The claim is that there can be improvements on what a CDT algorithm can achieve: TDT or UDT still defects against an opponent that always defects or always cooperates, but achieves (C,C) in some situations where CDT gets (D,D). The dominance reasoning is only impeccable if agents’ decisions really are independent, just like certain theorems in probability only hold when the random variables are independent. (And yes, this is a precisely analogous meaning of “independent”.)

• Aha. So when agents’ ac­tions are prob­a­bil­is­ti­cally in­de­pen­dent, only then does the dom­i­nance rea­son­ing kick in?

So the causal de­ci­sion the­o­rist will say that the dom­i­nance rea­son­ing is ap­pli­ca­ble when­ever the agents’ ac­tions are causally in­de­pen­dent. So do these other de­ci­sion the­o­ries deny this? That is, do they claim that the dom­i­nance rea­son­ing can be un­sound even when my choice doesn’t causally im­pact the choice of the other?

• That’s one valid way of look­ing at the dis­tinc­tion.

CDT al­lows the causal link from its cur­rent move in chess to its op­po­nent’s next move, so it doesn’t view the two as in­de­pen­dent.

In New­comb’s Prob­lem, tra­di­tional CDT doesn’t al­low a causal link from its de­ci­sion now to Omega’s ac­tion be­fore, so it ap­plies the in­de­pen­dence as­sump­tion to con­clude that two-box­ing is the dom­i­nant strat­egy. Ditto with play­ing PD against its clone.

(Come to think of it, it’s ba­si­cally a Markov chain for­mal­ism.)

• So these al­ter­na­tive de­ci­sion the­o­ries have re­la­tions of de­pen­dence go­ing back in time? Are they sort of couter­fac­tual de­pen­dences like “If I were to one-box, Omega would have put the mil­lion in the box”? That just sounds like the Ev­i­den­tial­ist “news value” ac­count. So it must be some other kind of re­la­tion of de­pen­dence go­ing back­wards in time that rules out the dom­i­nance rea­son­ing. I guess I need “Other De­ci­sion The­o­ries: A Less Wrong Primer”.

• (gah. I wanted to delete this be­cause I de­cided it was sort of a use­less thing to say, but now it’s here in dis­tract­ing re­tracted form, be­ing even worse)

All you need to do to “fix” PD is to have the agent at­tach enough weight to the welfare of oth­ers. That’s not a mod­ifi­ca­tion of the de­ci­sion the­ory, that’s a mod­ifi­ca­tion of the util­ity func­tion.

And it’s ar­guably tel­ling that this is the solu­tion evolu­tion found. Hu­mans are ac­tu­ally pretty good at avoid­ing proper pris­on­ers’ dilem­mas, due to our some­what pro-so­cial util­ity func­tions.

• Thank you so much for this. I really don’t feel like I understand decision theories very well, so this was helpful. But I’d like to ask a couple of questions that I’ve had for a while that this didn’t really answer.

1. Why does evidential decision theory necessarily fail the Smoking Lesion Problem? That link is technically Solomon’s problem, not the Smoking Lesion problem, but it’s related. If p(Cancer | smoking, lesion) = p(Cancer | not smoking, lesion), why is Evidential Decision Theory forbidden from using these probabilities? Evidential decision theory makes a lot of intuitive sense to me and I don’t really see why it’s demonstrably wrong.

2. It seems like TDT can pretty easily fail a version of Newcomb’s problem to me. (Maybe everyone knows this already and I just haven’t seen it anywhere.) Suppose there is a CDT AI that, over the course of a year, modifies itself to become a TDT. Suppose also that this AI is presented with a variation of Newcomb’s problem. The twist is this: Omega’s placement of money into the opaque box is determined not by the decision theory you currently operate by, but by that which you operated by a year ago. As such, Omega will leave $0 in the box. But TDT, as I understand it, acts as though it controls all nodes of the decision-making process simultaneously, which makes it vulnerable to processes that take extended periods of time, where the agent’s decision theory may well have changed. You can probably see where this is going: TDT/former CDT one-boxes and finds $0, the inferior option to $1000. This isn’t really a rigorous critique of TDT, since it’s obviously predicated on not being a TDT at a prior point, but it was a question I thought about in the context of a self-modifying CDT AI.

1. It’s actually tough to predict what EDT would do, since it depends on picking the right reference class for yourself, and we have no idea how to formalize those. But the explanations of why EDT would one-box on Newcomb’s Problem appear isomorphic to explanations of why EDT would forgo smoking in the Smoking Lesion problem, so it appears that a basic implementation would fail one or the other.

2. That shouldn’t happen. If TDT knows that the boxes are being filled by simulating a CDT algorithm (even if that algorithm was its ancestor), then it will two-box.

• Eliezer once pointed out that our intuitions on most formulations of the Prisoner’s Dilemma are skewed by our notions of fairness, and a more outlandish example might serve better to illustrate how a genuine PD really feels.
It might also help to consider examples in which “cooperation” doesn’t give warm fuzzy feelings and “defection” the opposite. Businessmen forming a cartel are also in a PD situation. Do we want businessmen to gang up against their customers? This may be culturally specific, though. It’s interesting that in the standard PD, we’re supposed to be rooting for the prisoners to go free. Is that how it’s viewed in other countries?

• I think in PD we are rooting that it doesn’t happen, so that the worse prisoner goes free and the more honourable one sits in jail.

• Businessmen forming a cartel are also in a PD situation. Do we want businessmen to gang up against their customers?

It probably depends on how you frame the game—are businessmen the only players, or are their customers players too? In this model, only players have utilons; there are no utilons for the “environment”. Or perhaps the well-being of the general population could also be a part of a businessman’s utility function. If customers are not players, and their well-being does not affect the businessmen’s utility function (in other words, if we talk about psychopathic businessmen), then it is a classical PD. But it is good to note that in reality “warm fuzzy feelings” are part of the human utility function, whether the model acknowledges it or not, so some of our intuitions may be wrong.

• Thanks for writing this. I would object to calling a decision theory an “algorithm”, though, since it doesn’t actually specify how to make the computation, and in practice the implied computations from most decision theories are completely infeasible (for instance, the chess decision theory requires a full search of the game tree).
Of course, it would be much more satisfying and useful if decision theories actually were algorithms, and I would be very interested to see any that achieve this or move in that direction.

One answer is that if we feed in what-we-want into an advanced decision theory, then just as cooperation emerges in the Prisoner’s Dilemma, many kinds of patterns that we take as basic moral rules emerge as the equilibrium behavior. The idea is developed more substantially in Gary Drescher’s Good and Real, and (before there was a candidate for an advanced decision theory) in Douglas Hofstadter’s concept of superrationality.

This reasoning strikes me as somewhat odd. Even if it turned out that these patterns don’t emerge at all, we would still distinguish “what-we-want” from “what-is-right”.

• This reasoning strikes me as somewhat odd. Even if it turned out that these patterns don’t emerge at all, we would still distinguish “what-we-want” from “what-is-right”.

True. The speculation is that what-we-want, when processed through advanced decision theory, comes out as a good match for our intuitions on what-is-right, and this would serve as a legitimate reductionistic grounding of metaethics. If it turned out not to match, we’d have to look for other ways to ground metaethics.

• Or perhaps we’d have to stop taking our intuitions on what-is-right at face value.

• Or that, yes.

• I wish you’d stop saying “advanced decision theory”, as it’s way too infantile currently to be called “advanced”...

• I want a term to distinguish the decision theories (TDT, UDT, ADT) that pass the conditions 1-5 above. I’m open to suggestions. Actually, hang on, I’ll make a quick Discussion post.

• Does anyone have a decent idea of the differences between UDT, TDT and ADT?
(Not in terms of the concepts they’re based on; in terms of problems to which they give different answers.)

• Personally I don’t see the examples given as flaws in Causal Decision Theory at all. The flaw is in the problem statements, not CDT.

In the alien predictor example, the key question is “when does the agent set its strategy?”. If the agent’s strategy is set before the prediction is made, then CDT works fine. The agent decides in advance to commit itself to opening one box, the alien realises that, the agent follows through with it and gets $1,000,000. Which is exactly how humans win that game as well.

If on the other hand the agent’s strategy is not set until after the prediction, well, I ask you: what is the alien actually predicting? The alien cannot predict the agent’s choice, because we’ve just said the agent’s strategy is not defined yet. However, what the alien can predict is the process by which the agent’s strategy will be set. In that case, there is a meta-agent, which has a strategy of “Force the agent to use CDT” (or something like that). In that case, the alien is really predicting the meta-agent, and the meta-agent has made a bad decision. The reality in that case is that the meta-agent is the one playing the game. There’s no $1,000,000 in the box not because of anything the agent did, but because the meta-agent made a poor decision to commit to creating an agent that would open both boxes. By the time the agent starts to operate there’s already no money in the box, and it’s actually doing the correct thing to at least get $1,000 anyway.

The confusion comes from the problem being framed in terms of the subgame of “one box or two”, when the real game being played is “pick a box-opening strategy for the alien to predict”, and CDT can solve that one perfectly well.

A similar problem occurs in the prisoner’s dilemma. Again the problem is “when is the agent’s strategy set?”, or in this case in particular “when is the opponent’s strategy set?”. If the opponent’s strategy is already set before the agent makes its decision, then defecting is correct. However, in the example you give, the agent supposedly knows whether it’s playing against itself. The implication is that by changing its strategy it implicitly changes its opponent’s strategy as well. If the agent does not know this, then it has simply been misled. If it does know this, then CDT is perfectly capable of telling it to cooperate with itself to achieve the best payout. Again, the metagame is “pick a strategy that optimises the prisoner’s dilemma where you might be playing against yourself, or against other agents who are aware of your strategy, etc.”. Once again CDT is perfectly capable of handling this metagame.
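To make the two framings concrete, here is a minimal sketch; the payoff numbers below are an assumption for illustration (only their ordering matters). A CDT-style agent answering “what is the best reply to a fixed opponent move?” defects, while an agent that knows its opponent’s move mirrors its own does better by cooperating:

```python
# Toy Prisoner's Dilemma. Payoffs are years in prison, negated so
# higher is better; the specific numbers are assumed for illustration.
PAYOFFS = {
    ("C", "C"): (-1, -1),    # both cooperate (stay silent)
    ("C", "D"): (-10, 0),    # I cooperate, opponent defects
    ("D", "C"): (0, -10),
    ("D", "D"): (-5, -5),    # both defect
}

def best_reply(opponent_move):
    """CDT-style reasoning against a *fixed* opponent move."""
    return max(["C", "D"], key=lambda my: PAYOFFS[(my, opponent_move)][0])

def payoff_vs_copy(my_move):
    """If the opponent is an exact copy of me, it makes my move too."""
    return PAYOFFS[(my_move, my_move)][0]

# Against any fixed opponent move, defection dominates:
assert best_reply("C") == "D" and best_reply("D") == "D"
# But when the opponent's move is guaranteed to mirror mine,
# cooperating yields the better outcome:
assert payoff_vs_copy("C") > payoff_vs_copy("D")
```

The dominance argument and the play-against-yourself argument are both valid; they are simply answers to two different questions, which is the point of the comment above.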

I’ve got nothing against the idea of decision theories that are robust to misleading problem statements, or that are “extended” to include awareness of the fact that they’re in a metagame, but the examples given here don’t demonstrate any flaws in CDT to me as such. They seem akin to a chess-playing agent that isn’t told the opponent has a queen; they’re simply getting the wrong answers because they’re solving the wrong problems.

I might be missing the point about the differences between these things, but since my understanding of the terminology (I was familiar with the concepts already) is based on this article, it’s still a problem with the article. Or my reading comprehension skills, I suppose, but I’ll stick with blaming the article for now.

• In the alien predictor example, the key question is “when does the agent set its strategy?”. If the agent’s strategy is set before the prediction is made, then CDT works fine.

What if the prediction was made before the agent was created?

• I effectively already go on to address that question in the rest of the paragraph. I see no meaningful difference between “strategy not set” and “agent not created”; if there is a difference to you, please elaborate.

At risk of repeating myself, to answer your question anyway, I ask: how can the alien successfully predict the strategy of something which has yet to even be created? If the alien is just guessing then opening both boxes is clearly correct anyway. Otherwise, the alien must know something about the process by which the agent is created. In this case, as I explain in the original comment, there is a meta-agent, which is whatever creates the agent, and that is also what the alien is predicting the behaviour of. If there’s no money in the box it’s due to a poor meta-agent strategy which the agent then has no means to rectify. CDT seems to me perfectly capable of generating the correct strategies for both the agent and the meta-agent in this case.

• In this case, as I explain in the original comment, there is a meta-agent which is whatever creates the agent, and that is also what the alien is predicting the behaviour of. If there’s no money in the box it’s due to a poor meta-agent strategy which the agent then has no means to rectify.

There doesn’t have to be any “meta-agent”. Humans evolved from non-agent stuff.

How can the alien successfully predict the strategy of something which has yet to even be created?

Assume that the agent deterministically originates from initial conditions known to the predictor, but the initial conditions don’t yet constitute an agent.

I see no meaningful difference between “strategy not set” and “agent not created”; if there is a difference to you, please elaborate.

If the agent that eventually appears, but wasn’t present at the outset, follows something like TDT, it wins Newcomb’s problem, even though it didn’t have an opportunity to set an initial strategy (make a precommitment).

This is what I meant in the grandparent comment: “When does the agent set its strategy?” is not a key question when there is no agent to set that strategy, and yet such a situation isn’t hopeless; it can be controlled using considerations other than precommitment.

• Ok, so firstly I now at least understand the difference you see between strategy-not-set and agent-not-created—that in only one case was there the clear potential for pre-commitment. I still think it’s beside the point, but that does require some explaining.

When I talk about a meta-agent, I don’t mean to imply the existence of any sort of intelligence or sentience for it; I simply mean there exists some process outside of the agent. The agent cannot gain or lose the $1,000,000 without changing that process, something which it has no control over. Whether this outside process is by way of an intelligent meta-agent that should have known better, the blind whims of chance, or the harshness of a deterministic reality is beside the point. Whether agents which choose one box do better than agents which choose both is a different question from whether it is correct to choose both boxes or not. When you switch a winning agent for a losing one, you simultaneously switch the situation that agent is presented with from a winning situation to a losing one.

It makes me think of the following paradox: imagine that at some point in your life, God (or an alien or whatever) looks at whether you have been a perfect rationalist, and if so punches you in the nose. And assume of course that you are well aware of this. “Easy!” you think, “Just make one minor irrational decision and your nose will be fine”. But that would of course be the perfectly rational thing to do, and so you still get punched in the nose. You get the idea. Should I now go on to say that rationalism doesn’t always win, and we therefore need to construct a new approach which does? (Hint: of course not, but try and see the parallels here.)

In any case, rather than continue to argue over what is a fairly controversial paradox even for humans, let alone decision theories, let me take another tack here. If you are a firm believer in the predictive power of the alien, then the problem is entirely equivalent to:

• Choice A: Get $1,000

• Choice B: Get $1,000,000

If CDT is presented with this problem, it would surely choose $1,000,000. The only way I see it wouldn’t is if CDT is defined as something like “Make the correct decision, except for being deliberately obtuse in insisting that causality is strictly temporal”, and then, lo and behold, it loses in paradoxes relating to non-temporal causality. If that’s what CDT effectively means, then fine, it loses. But to me, we don’t need a substantially different decision theory to resolve this paradox; we need to apply substantially the same decision theory to a different problem. To me, answering the question “what are the outcomes of my decisions” is part of defining the problem, not part of decision theory.
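The “agents which choose one box do better” claim from the earlier comment is just arithmetic over the predictor’s accuracy. A minimal sketch, treating the accuracy p as a free parameter (an assumption; the original problem posits a near-perfect predictor):

```python
def ev_one_box(p):
    # The predictor is right with probability p, so the opaque box
    # contains $1,000,000 exactly that often for a one-boxer.
    return p * 1_000_000

def ev_two_box(p):
    # Two-boxers always get the visible $1,000, plus $1,000,000 in the
    # (1 - p) of cases where the predictor wrongly expected one-boxing.
    return 1_000 + (1 - p) * 1_000_000

# With a perfectly reliable predictor, the problem reduces to the
# "Choice A: $1,000 vs Choice B: $1,000,000" framing above:
assert ev_one_box(1.0) == 1_000_000
assert ev_two_box(1.0) == 1_000
# Even a merely decent predictor already favors one-boxing:
assert ev_one_box(0.9) > ev_two_box(0.9)
```

Setting the two expected values equal shows the break-even accuracy is only about 0.5005, which is why the dispute is about the decision procedure rather than the arithmetic.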

So the examples in the article still don’t motivate me to see a need for substantially new decision theories, just more clearly defined problems. If the other decision theories are all about saying “I’m solving the wrong problem” then that’s fine, and I can imagine it being potentially useful, but based on the examples given at least, it still seems like the backwards way of going about things.

• When I talk about a meta-agent, I don’t mean to imply the existence of any sort of intelligence or sentience for it; I simply mean there exists some process outside of the agent. The agent cannot gain or lose the $1,000,000 without changing that process, something which it has no control over.

An agent may have no control over what its source code is, but it does have control over what that source code does.

• You can’t have it both ways. Either the agent’s behaviour is deterministic, or the alien cannot reliably predict it. If it is deterministic, what the source code is determines what the source code does, so it is contradictory to claim the agent can change one but not the other (if by “control” you mean “is responsible for”, then that’s a different issue). If it is not deterministic, then aside from anything else the whole paradox falls apart.

• Nope. See also the free will sequence. The decision is deterministic. The agent is the part of the deterministic structure that determines it, that controls what it actually is; the agent is the source code. The agent can change neither its source code nor its decision, but it does determine its decision; it controls what the decision actually is, without of course changing what it actually is, because it can be nothing else than what the agent decides.

• with some weird caveats: for example, if X started out as CDT, its modification will only care about other agents’ decisions made after X self-modified

I’m guessing what matters is not so much time as the causal dependence of those decisions made by other agents on the physical event of X’s decision to self-modify. So the improved X still won’t care about its influence on future decisions made by other agents for reasons other than X having self-modified.
For example, take the (future) decisions of other agents that are space-like separated from X’s self-modification. Even more strangely, I’m guessing agents that took a snapshot of X just before self-modification, able to predict its future actions, will be treated by X differently depending on whether they respond to observations of the physical events caused by the behavior of the improved X, or to the (identical) inferences made based on the earlier snapshot.

• Correct, of course. I sacrificed a little accuracy for the sake of being easier for a novice to read; is there a sentence that would optimize both?

• Do you understand this effect well enough to rule its statement “obviously correct”? I’m not that sure it’s true; it’s something built out of an intuitive model of “what CDT cares about”, not technical understanding, so I would be interested in an explanation that is easier for me to read... (See another example in the updated version of the grandparent.)

• Right, we don’t really understand it yet, but on an informal level it appears valid. I think it’s worth mentioning at a basic level, though it deserves a fuller discussion. Good example.

• Excellent. And I love your aesthetic intolerance dilemma. That’s a great spin on it; has it appeared anywhere before?

• Not that I’ve seen—it just occurred to me today. Thanks!

• (I didn’t like it: it’s too abstract and unspecific for an example.)

• We can use this aesthetic intolerance to construct a more genuine Prisoner’s Dilemma without inviting aliens or anything like that. Say X is a writer and Y is an illustrator, and they have very different preferences for how a certain scene should come across, so they’ve worked out a compromise.

As it happens, I do know of a real-world case of this kind of problem, where the parties involved chose... defection.
From an interview with former anime studio Gainax president Toshio Okada:

Okada: NADIA was true chaos, good chaos and bad chaos! [LAUGHS] On NADIA, Anno didn’t direct the middle episodes, Shinji Higuchi did. And some episodes were directed in Korea—why, no one knows exactly. [LAUGHS] That’s real chaos, not good! What I mean to say is, controlled chaos—that’s good. Controlled chaos is where you’ve got all the staff in the same room, looking at each other. But on NADIA you had Higuchi saying, “Oh, I’ll surprise Anno”, hide, and change the screenplay! Screenplays and storyboards got changed when people went home, and the next morning, if no one could find the original, I authorized them to go ahead with the changes. No one can be a real director or a real scriptwriter in such a chaos situation. But on GUNBUSTER, that chaos was controlled, because we were all friends, and all working in the same place. But on NADIA, half our staff was Korean, living overseas. We never met them. No control.

This may have been responsible for the much-execrated ‘desert island episodes’ arc in Nadia.

• I can’t be alone in thinking this; where does one acquire the probability values to make any of this theory useful? Past data? Can it be used on an individual level, or only an organizational one?

• Many people on Less Wrong see Judea Pearl’s work as the right approach to formalizing causality and getting the right conditional probabilities. But if it were simple to formally specify the right causal model of the world given sensory information, we’d probably have AGI already.

• Sitting and figuring out how exactly causality works is the kind of thing we want the AGI to be able to do on its own.
We don’t seem to be born with any advanced expectations of the world, such as a notion of causality; we even have to learn to see. Causality took a while to invent, and a great many people have superstitions which are poorly compatible with causality; yet even my cats seem, over their lives, to have learnt some understanding of causality.

• We seem to start with very little. edit: I mean, I’d say we definitely invent causality. It’s implausible we evolved an implementation of the equivalent of Pearl’s work, and even if we did, that is once again a case of an intelligent-ish process (evolution) figuring it out.

• I’d say that our intuitive grasp of causality relates to theoretical work like Pearl’s in the same way that our subconscious heuristics of vision relate to the theory of computer vision. That is to say, we have evolved heuristics that we’re not usually conscious of, and which function very well and cheaply in the circumstances our ancestors frequently encountered, but we currently don’t understand the details well enough to program something of equivalent power.

• Except a lot of our vision heuristics seem not to be produced by evolution, but by a process that happens in the first years of life. Ditto for virtually all aspects of brain organization, given the ratio between the brain’s complexity and the DNA’s complexity.

• Steven Pinker covers this topic well. I highly recommend How The Mind Works; The Blank Slate may be more relevant, but I haven’t read it yet. Essentially, the human brain isn’t a general-purpose learner, but one with strong heuristics (like natural grammar and all kinds of particular visual pattern-seeking) that are meta enough to encompass a wide variety of human languages and visual surroundings.
The development of the human brain responds to the environment not because it’s a ghost of perfect emptiness but because it has a very particular architecture already, which is adapted to the range of possible experience. The visual cortex has special structure before the eyes ever open.

• Honestly, the guys who never wrote any vision algorithm should just stop trying to think about the subject too hard; they aren’t getting anywhere sane with intuitions that are many orders of magnitude off. That goes for much of the cognition-evolution work out there. We know that evolution can produce ‘complex’ stuff; we got that hammer, and we use it on every nail, even when the nail is in fact a giant screw which is way bigger than the hammer itself—it isn’t going to move if hit with the hammer, firstly because it is big and secondly because it needs an entirely different motion—yet one who can’t see the size of the screw can insist it would. Some adjustments for near vs far connectivity, perhaps even somewhat specific layering, but that’s all evolution is going to give you in the relevant timeframe for brains bigger than peanuts. That’s just the way things are, folks. People with early brain damage have other parts of the brain take over, and perform nearly as well, suggesting that the networks processing the visual input have only minor optimizations for the visual task compared to the rest, i.e. precisely the near-vs-far connectivity tweaks, more/less neurons per cortical column, the kind of stuff which makes it somewhat more efficient at the task.

edit: here’s a taster: mammals never evolved an extra pair of limbs, extra eyes, or anything of that sort.
But in the world of the evolutionary psychologists and evolutionary ‘cognitive scientists’, mammals should evolve an extra eye or an extra pair of limbs, along with much more complex multi-step adaptations, every couple of million years. This is outright ridiculous. Just look at your legs; look how much slower the fastest human runner is than any comparable animal. You’d need to be running for another ten million years before you were any good at running. And in that time, in which the monkey’s body can barely adapt to locomotion on a flat surface again, the monkey’s brain evolves complex algorithms for the grammar of human languages? The hunter-gatherer-specific functionality? You’ve got to be kidding me.

All of this evolutionary explaining of complex phenomena via vague handwaves and talk of how beneficial something would have been in the ancestral environment will be regarded as utter and complete pseudoscience within 20 to 50 years. It’s not enough to show that something was beneficial. An extra eye on the back, too, would have been beneficial for a great many animals. They are given a free pass to use evolution as magic because brains don’t fossilize. The things that do fossilize, however, provide a good sample of how many generations it takes for evolution to make something.

• We’ve drifted way off topic. Brain plasticity is a good point, but it’s not the only piece of evidence available. I promise you that if you check out How the Mind Works, you’ll find unambiguous evidence that the human brain is not a general-purpose learner, but begins with plenty of structure. If you doubt the existence of universal grammar, you should try The Language Instinct as well. You can have the last word, but I’m tapping out on this particular topic.

• If you doubt the existence of universal grammar, you should try The Language Instinct as well.
While some linguistic universals definitely exist, and a sufficiently weak version of the language acquisition device idea is pretty much obvious (‘the human brain has the ability to learn human language’), I think Chomsky’s ideas are way too strong. See e.g. Scholz, Barbara C. and Geoffrey K. Pullum (2006), Irrational nativist exuberance.

• re: structure, yes, it is made of cortical columns, and yes there’s some global wiring; nobody’s been doubting that. I created a new topic for that. The issue with attributing brain functionality to evolution is the immense difficulty of coding any specific wiring in the DNA, especially in mammals. Insects can do it—going through several generations in a year, and having a genome that controls the brain down to individual neurons. Mammals aren’t structured like this, and live much too long.

• Except a lot of our vision heuristics seem not to be produced by evolution, but by the process that happens in the first years of life.

It’s hard to draw a clear line between the two. Certainly much of what evolution evolved in this area is the ability to reliably develop suitable heuristics given expected stimulus, so I more or less agree with you here. If we develop without sight, this area of the brain gets used for entirely different purposes.

• Ditto goes for virtually all aspects of brain organization, given the ratio between brain complexity and DNA’s complexity.

Here I must disagree. We are stuck with the high-level organization, and even at somewhat lower levels the other areas of the brain aren’t quite so versatile as the cortex that handles the (high-level aspects of) vision.

• I’m not expecting the specialization to go beyond adjusting e.g. the number of local vs long-range connections, basic motion detection dating back a couple hundred million years to our evolution as fish, etc.
People are expecting some miracles from evolution, along the lines of hard-coding specific, highly nontrivial algorithms. Mostly due to very screwy intuitions regarding how complex anything resembling an algorithm is (compared to the rest of the body). Yes, it is the case that other parts of the brain are worse at performing this function; no, it isn’t because there are actual algorithms hard-wired there. In so much as other parts of the brain are able to perform even remotely close in function, sight is learnable. One particular thing about the from-scratch AI crowd is that it doesn’t like to give credit to the brain when it’s due.

• Mostly due to very screwy intuitions regarding how complex anything resembling an algorithm is (compared to the rest of the body).

Most algorithms are far, far less complicated than, say, the behaviors that constitute the immune system. From what I can tell there are rather a lot of behavioral algorithms that are hard-wired at the lower levels, particularly when it comes to emotional responses and desires. The brain then learns to specialize them to the circumstances it finds itself in. More particularly, any set of (common) behaviors that gives long-term benefits rather than solving immediate problems is almost certainly hard-wired. The brain doesn’t even know what is being optimized, much less how this particular algorithm is supposed to help!

• The immune system is way old. Why is it just the complex algorithms we don’t quite understand that we think evolve quickly in mammals, but not the obvious things like retinal pigments, number of eyes, number of limbs, etc.? Why would we ‘evolve support for language’ in the time during which we barely adapt our legs to walking on a flat surface again?
The emotional responses and desires are, to some extent, evolved, but the complex mechanisms for calculating which objects to desire have to be created from scratch. The brain does 100 steps per second: 8,640,000 steps per day, 3,153,600,000 steps per year. Evolution does 1 step per generation. There are very tight bounds on what functionality could evolve in a given timeframe, and there is a lot that can be generated in very short time by the brain.

• The immune system is way old.

Yes. Very old, incredibly powerful and amazingly complex. The complexity of that feature of humans (and diverse relatives) makes the appeal “regarding how complex anything resembling an algorithm is (compared to the rest of the body)” incredibly weak. Most algorithms used by the brain, learned or otherwise, are simpler than what the rest of the body does.

• The immune system doesn’t have some image-recognition algorithm that looks at projections of proteins and recognizes their shapes. It uses molecular binding. And it evolved over many billions of generations, in much shorter-lived animals, re-using to a huge extent the features that evolved back in single-celled organisms. And as far as algorithms go, it consists of just a huge number of if clauses, chained very few levels deep. The 3D object recognition from the 2D images from the eyes, for comparison, is an incredibly difficult task.

edit: on the topic of the immune ‘algorithm’ that makes you acquire immunity to foreign, but not your own, chemicals: http://en.wikipedia.org/wiki/Somatic_hypermutation . Randomly edit the proteins so that they stick; the random editing happens by utilising somewhat broken replication machinery. Some of your body evolves an immune response when you catch flu or a cold. The products of that evolution are not even passed down; that’s how amazingly complex and well evolved it is (not).
• The immune system doesn’t have some image-recognition algorithm that looks at projections of proteins and recognizes their shapes. It uses molecular binding. And it evolved over many billions of generations, in much shorter-lived animals, re-using to a huge extent the features that evolved back in single-celled organisms. And as far as algorithms go, it consists of just a huge number of if clauses, chained very few levels deep.

This matches my understanding.

• The 3D object recognition from 2D images from the eyes, for comparison, is an incredibly difficult task.

And here I no longer agree, at least when it comes to the assumption that the aforementioned task is not incredibly difficult.

• I added in an edit a reference as to how the immune system basically operates. You have a population of B cells which evolves for the elimination of foreign substances: good ol’ evolution, re-used to evolve a part of the B-cell genome inside your body. The results seem very impressive—recognition of substances—but all the heavy lifting is done using very simple and very stupid methods. If anything, our proneness to seasonal cold and flu is a great demonstration of the extreme stupidity of the immune system. The viruses only need to modify some entirely non-functional proteins to have to be recognized afresh. That’s because there is no pattern recognition going on whatsoever, only the incredibly stupid process of evolution of B cells.

• If I was trying to claim that immune systems were complex in a way that is similar in nature to learned cortical algorithms, then I would be thoroughly dissuaded by now.

• The immune system is actually a rather good example of what sort of mechanisms you can expect to evolve over many billions of generations, and in which way they can be called ‘complex’.
My original point was that much of evolutionary cognitive science is explaining much more complex mechanisms (with a lot of hidden complexity—for a very outrageous example, consider preference for specific details of mate body shape, which is a task with immense hidden complexity) as evolving in a thousandth the generation count of the immune system. Instead of being generated in some way by the operation of the brain, in a context where other brain areas are only marginally less effective at the tasks—suggesting not hard-wiring of algorithms of any kind, but minor tweaks to the properties of the network which slightly improve the network’s efficiency after the network learns the specific task.

• We probably don’t disagree too much on the core issue here, by the way. Compared to an arbitrary reference class that is somewhat meaningful, I tend to be far more accepting of the ‘blank slate’ capabilities of the brain. The way it just learns how to build models of reality from visual input is amazing. It’s particularly fascinating to see areas in the brain that are consistent across (nearly) all people but that turn out not to be hardwired after all—except in as much as they happen to be always connected to the same stuff and usually develop in the same way!

• The immune system is way old.

So are eyes.

• I liked this post. It would be good if you put it somewhere in the sequences so that people new to LessWrong can find this basic intro earlier.

• At the very least it should be the basis for an easily accessible wiki page.

• The decision theories need somewhat specific models of the world to operate correctly. In The Smoking Lesion, for example, the lesion has to somehow lead to you smoking. E.g. the lesion could make you follow CDT, while absence of the lesion makes you follow EDT.
It’s definitely worse to have CDT if it comes at the expense of having the lesion. The issue here is selection. If you find you opt to smoke, your prior for having the lesion goes up, of course, and so you need to be more concerned about the cancer—if you can’t check for the lesion, you perhaps have to do chest x-rays more often, which costs money. So there’s that negative consequence of deciding to smoke, except the decision theory you use need not be concerned with this particular consequence when deciding to smoke, because the decision is itself a consequence of the lesion in the cases where the lesion is predictive of smoking, and only isn’t a consequence of the lesion in the cases where the lesion is not predictive.

• I think the assumption is that your decision theory is fixed, and the lesion has an influence on your utility function via how much you want to smoke (though in a noisy way, so you can’t use it to conclude with certainty whether you have the lesion or not).

• That also works. What would EDT do if it has evidence (possibly obtained from theory about the physics, derived from empirical evidence in support of causality) that it is (or must be) the desire to smoke that is correlated with the cancer? Shouldn’t it ‘cancel out’ the impact of the correlation of the decision with the cancer on the decision? It seems to me that good decision theories can disagree on decisions made with imperfect data and an incomplete model. An evidence-based decision theory should be able to process the evidence for the observed phenomenon of ‘causality’, and process it all the way to the notion that the decision won’t affect the cancer. At the same time, if an agent cannot observe evidence for causality and reason about it correctly, that agent is seriously crippled in many ways—would it even be able to figure out, e.g.,
Newtonian physics from observation, if it can’t figure out causality? CDT looks like a hack where you hard-code causality into an agent, which you (mankind) figured out from observation and evidence (and it took a while to figure it out and figure out how to apply it). edit: This seems to go for some of the advanced decision theories too. You shouldn’t be working so hard inventing world-specific stuff to hard-code into an agent. The agent should figure it out from properties of the real world, perhaps with consideration of hypothetical examples.

• I agree with you. I don’t think that EDT is wrong on the Smoking Lesion. Suppose that, in the world of this problem, you see someone else decide to smoke. What do you conclude from that? Your posterior probability of that person having the lesion goes up over the prior. Now what if that person is you? I think the same logic should apply.

• That part is correct, but opting not to smoke for the purpose of avoiding this increase in probability is an error. An error that an evidence-based decision theory need not make if it can process the evidence that causality works and that it is actually the pre-existing lesion that causes smoking, and control for the pre-existing lesion when comparing the outcomes of actions. (And if the agent is ignorant of the way the world works, then we shouldn’t benchmark it against an agent into which we coded the way our world works.)

• That part is correct, but opting not to smoke for the purpose of avoiding this increase in probability is an error.

I still don’t see how it is. If the agent has no other information, all he knows is that if he decides to smoke it is more likely that he has the lesion. His decision itself doesn’t influence whether he has the lesion, of course. But he desires to not have the lesion, and therefore should desire to decide not to smoke.
The way the lesion influences deciding to smoke will be through the utility function or the decision theory. With no other information, the agent can’t trust that his decision will outsmart the lesion.

• Ahh, I guess we are talking about the same thing. My point is that given more information, and making more conclusions, EDT should smoke. CDT gets around the requirement for more information by cheating: we wrote some of that information implicitly into CDT. We thought CDT was a good idea because we know our world is causal. Whenever EDT can reason that CDT will work better (based on evidence in support of causality, the model of how lesions work, et cetera) EDT will act like CDT. And whenever CDT reasons that EDT will work better, CDT self-modifies to be EDT, except that CDT can’t do it on the spot and has to do it in advance. The advanced decision theories try to ‘hardcode’ more of our conclusions about the world into the decision theory. This is very silly. If you test humans, I think it is pretty clear that humans work like EDT plus evidence for causality. Take away evidence for causality, and people can believe that deciding to smoke retroactively introduces the lesion. edit: ahh, wait, EDT is some pretty naive theory that cannot even process anything as complicated as evidence for causality working in our universe. Whatever then; a thoughtless approach leads to thoughtless results, end of story. The correct decision theory should be able to control for a pre-existing lesion when it makes sense to do so.

• edit: ahh, wait, EDT is some pretty naive theory that cannot even process anything as complicated as evidence for causality working in our universe.

Can you explain this? EDT is described as $V(A) = \sum_{j} P(O_j \mid A) U(O_j)$.
If you have knowledge about the mechanisms behind how the lesion causes smoking, that would change $P(A \mid O_j)$ and therefore also $P(O_j \mid A)$.

• I don’t see how knowledge of how the lesion works would affect the probabilities when you don’t know whether you have the lesion, or the probability of having the lesion.

• Also: when you don’t know whether you have the lesion, or the probability of having the lesion.

You would still have priors for all of these things.

• Even if you do, how is knowing that the lesion causes cancer going to change anything about P(smokes | gets cancer)? The issue is that you need to do two equations, one for the case when you do have the lesion, and another for when you don’t have the lesion. EDT just confuses those together.

• The lesion could work in (at least) three ways:

1. it makes you more likely to use a decision theory that leads you to decide to smoke;

2. it only makes irrational people more likely to smoke;

3. it changes people’s utility of smoking.

In case 1, you should follow EDT, and use a decision theory that will make you not decide to smoke. In case 2, you know that the lesion doesn’t apply to you, so go ahead and smoke. In case 3, conditioned on your utility function (which you know), the probability of the lesion no longer depends on your decision. So, you can smoke.

• edit: ahh, wait, EDT is some pretty naive theory that cannot even process anything as complicated as evidence for causality working in our universe. Whatever then; a thoughtless approach leads to thoughtless results, end of story. The correct decision theory should be able to control for a pre-existing lesion when it makes sense to do so.

I think you’ve got it.
Pure EDT and CDT really just are that stupid, and irredeemably so, because agents implementing them will not want to learn how to replace their decision strategy (beyond resolving themselves to their respective predetermined stable outcomes). Usually when people think either of them is a good idea it is because they have been incidentally supplementing and subverting them with a whole lot of their own common sense!

• Usually when people think either of them is a good idea it is because they have been incidentally supplementing and subverting them with a whole lot of their own common sense!

As a person who (right now) thinks that EDT is a good idea, could you help enlighten me? Wikipedia states that under EDT the action with the maximum value is chosen, where value is determined as V(A) = sum over outcomes O of P(O|A) U(O). The agent can put knowledge about how the universe works into P(O|A), right? Now the smoking lesion problem. It can be formally written as something like this:

U(smoking) = 1
U(cancer) = −100000
P(cancer | lesion) > P(cancer | !lesion)
P(smoking | lesion) > P(smoking | !lesion)
P(cancer | lesion & smoking) = P(cancer | lesion & !smoking) = P(cancer | lesion)
P(cancer | !lesion & smoking) = P(cancer | !lesion & !smoking) = P(cancer | !lesion)

I think the tricky part is P(smoking | lesion) > P(smoking | !lesion), because this puts a probability on something that the agent gets to decide. Since probabilities are about uncertainty, and the agent would be certain about its actions, this makes no sense. Is that the main problem with EDT? Actually the known fact is more like P(X smokes | X has lesion), the probability of any agent with a lesion deciding to smoke. From this the agent will have to derive P(I smoke | I have lesion). If the agent is an average human being, then they would be equal.
But if the agent is special because he uses some specific decision theory or utility function, he should only look at a smaller reference class. I think in this way you get quite close to TDT/​UDT.

• I propose a nonstupid decision theory then. In the smoking lesion, I make two worlds: in one I have the lesion, in the other I don’t, weighted with p and 1−p. That’s just how I process uncertainties. Then I apply my predictions to both worlds, given my action, and I obtain the results, which I weight by p and 1−p (I have never seen the possible worlds interact). Then I can decide on the action assuming 0 < p < 1. I don’t even need to know p, and updates to my estimate of p that result from my actions don’t change the decisions. In Newcomb’s problem, I’m inclined to do the exact same thing: let p be the probability that one-boxing was predicted; then one-box < two-box because 1000000p + 0(1−p) < 1001000p + 1000(1−p). And I am totally going to do this if I am being predicted based on a psychology test I took back in elementary school, or based on genetics. But I get told that the 1001000p and the 0(1−p) never happen, i.e. I get told that the equation is wrong, and if I assign high enough confidence to that, higher than to my equation, I can strike out the 1001000p and the 0(1−p) from the equation (and get some nonsense, which I fix by removing the probabilities altogether), deciding to one-box as the best effort I can make when I’m told that my equation won’t work, and I don’t quite know why. (My world model being what it is, I’ll also have to come up with some explanations for how the predictor works before I assign high enough probability to the predictor working correctly for me. E.g. I could say that the predictor is predicting using a quantum coin flip and then cutting off branches in MWI where it was wrong, or I could say the predictor is working via mind simulation, or even that my actions somehow go into the past.) Of course it is bloody hard to formalize an agent that has a world model of some kind, and which can correct its equations if it is convinced with good enough evidence that the equation is somehow wrong (which is pretty much the premise of Newcomb’s paradox).

• I spent the first several seconds trying to figure out the tree diagram at the top. What does it represent?

• I was wondering when someone would ask that! It comes from the Wikipedia article on alpha-beta pruning in game theory, which is about strategizing (in a CDT fashion) in a game where your move and your opponent’s move alternate. So it’s not directly connected to the post, but I felt it imparts the right intuitive flavor to the notion of a decision theory.

• Cool, thank you. It does impart the right flavor. I really like what Luke has started with putting pictures in main posts. Something about it makes it easier to read.

• It was clear to me that the tree represents a two-player zero-sum game with alternating moves, where Square is trying to maximize some quantity and Circle to minimize it. It wasn’t clear from the picture what algorithm caused the “pruning” though.

• A decision tree (the entirety of my game theory experience has been a few online videos, so I likely have the terminology wrong), with decision 1 at the top and the end outcomes at the bottom. The sections marked ‘max’ have the decider trying to pick the highest-value end outcome, and the sections marked ‘min’ have the decider trying to pick the lowest-value end outcome.
The numbers on every line except the bottom propagate up depending on which option will be picked by whoever is currently doing the picking, so if Max and Min maximize and minimize properly the tree’s value is 6. I don’t quite remember how the three branches being pruned off work.

• Are you sure that you need an advanced decision theory to handle the one-box/​two-box problem, or the PD-with-mental-clone problem? You write that a CDT agent assumes that X’s decision is independent from the simultaneous decisions of the Ys; that is, X could decide one way or another and everyone else’s decisions would stay the same. Well, that’s a common situation analyzed in game theory, but it’s not essential to CDT. Consider playing a game of chess: your choice clearly affects the choice of your opponent. Or consider the decision of whether to punch a 6′5″, 250 lb. muscle-man who has just insulted you: your choice again has a strong influence on his choice of action. CDT is adequate for analyzing both of these situations. It is true that in my two examples the other agent’s choice is made after X’s choice, rather than being simultaneous with it. But of what relevance is the stipulation of simultaneity? Its only relevance is that it leads one to assume that the other decisions are independent of X’s decision! That is, the root of the difficulty is simply that you’re analyzing the problem using an assumption that you know to be false! It seems to me that you can analyze the one-box/​two-box problem or the PD-with-a-mental-clone problem perfectly well using CDT; you just have to use the right causal graph. The causal graph needs an arc from your decision to Omega’s prediction for the first problem, and an arc from your decision to the clone’s decision in the second problem. Then you do the usual maximization of expected utility.
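The parent comment’s proposal (CDT plus an arc from your decision to the clone’s decision) can be made concrete. A minimal sketch, with hypothetical payoff numbers not taken from the post, comparing expected-utility maximization under the two causal graphs:

```python
# Illustrative sketch (payoff numbers are hypothetical, not from the post):
# expected-utility maximization over the PD-with-a-mental-clone under two
# different causal graphs.

# PAYOFF[(my_move, their_move)] = utility for me (higher is better)
PAYOFF = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def eu_independent(my_move, p_they_cooperate=0.5):
    """Graph with no link: the clone's move is causally independent of mine."""
    return (p_they_cooperate * PAYOFF[(my_move, "cooperate")]
            + (1 - p_they_cooperate) * PAYOFF[(my_move, "defect")])

def eu_linked(my_move):
    """Graph with an arc from my decision to the clone's: it mirrors my move."""
    return PAYOFF[(my_move, my_move)]

best_independent = max(["cooperate", "defect"], key=eu_independent)
best_linked = max(["cooperate", "defect"], key=eu_linked)
print(best_independent)  # defect (dominance reasoning)
print(best_linked)       # cooperate ((C, C) beats (D, D) when moves mirror)
```

With the arc drawn, cranking the same expected-utility maximization yields cooperation, which is the comment’s point; the whole dispute is over which graph an agent is licensed to use.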
• Of course, in these two problems we know which causal links to draw. They were written to be simple enough. The trick is to have a general theory that draws the right links here without drawing wrong links in other problems, and which is formalizable so that it can answer problems more complicated than common sense can handle. Among human beings, the relevant distinction is between decisions made before or after the other agent becomes aware of your decision, and you can certainly come up with examples where mutual ignorance happens. Finally, situations with iterated moves can be decided differently by different decision theories as well: consider Newcomb’s Problem where the big box is transparent as well! A CDT agent will always find the big box empty, and two-box; a UDT/​ADT agent will always find the big box full, and one-box. (TDT might two-box in that case, actually.)

• Of course, in these two problems we know which causal links to draw. [...] The trick is to have a general theory that draws the right links here without drawing wrong links in other problems,

If you don’t know that Omega’s decision depends on yours, or that the other player in a Prisoner’s Dilemma is your mental clone, then no theory can help you make the right choice; you lack the crucial piece of information. If you do know this information, then simply cranking through standard maximization of expected utility gives you the right answer.

• Among human beings, the relevant distinction is between decisions made before or after the other agent becomes aware of your decision

No, the relevant distinction is whether or not your decision is relevant to predicting (postdicting?) the other agent’s decision.
The cheat in Newcomb’s Problem and the PD-with-a-clone problem is this:

• you create an unusual situation where X’s decision is clearly relevant to predicting Y’s decision, even though X’s decision does not precede Y’s,

• then you insist that X must pretend that there is no connection, even though he knows better, due to the lack of temporal precedence.

Let’s take a look at what happens in Newcomb’s problem if we just grind through the math. We have

P(box 2 has $1 million | you choose to take both boxes) = 0

P(box 2 has $1 million | you choose to take only the second box) = 1

E[money gained | you choose to take both boxes] = $1000 + 0 * $1e6 = $1000

E[money gained | you choose to take only the second box] = 0 * $1000 + 1 * $1e6 = $1e6

So where’s the prob­lem?
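The arithmetic above can be run directly; a minimal sketch, using the conditional probabilities asserted in the comment, and noting that taking only the second box forgoes box 1’s $1000:

```python
# Expected money under each action, conditioning on the action as in the
# comment above. One-boxing forgoes box 1's $1000 but (by assumption)
# guarantees box 2 holds the million.
p_million = {"two-box": 0, "one-box": 1}  # P(box 2 holds $1e6 | action)

def expected_money(action):
    box1 = 1000 if action == "two-box" else 0  # box 1 taken only when two-boxing
    return box1 + p_million[action] * 1_000_000

print(expected_money("two-box"))  # 1000
print(expected_money("one-box"))  # 1000000
```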

• Let’s take a look at what hap­pens in New­comb’s prob­lem if we just grind through the math. We have

That’s ev­i­den­tial de­ci­sion the­ory, which gives the wrong an­swer to the smok­ing le­sion prob­lem.
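For concreteness, here is the divergence being pointed at, as a minimal sketch with illustrative numbers (none are from the post): conditioning on the action (EDT-style) treats your own smoking as evidence of the lesion, while holding the lesion’s prior fixed (CDT-style) does not.

```python
# Smoking Lesion sketch (all numbers illustrative). The lesion causes both
# cancer and the urge to smoke; smoking has no causal effect on cancer.
P_LESION = 0.1
P_SMOKE_GIVEN_LESION = 0.9
P_SMOKE_GIVEN_NO_LESION = 0.2
P_CANCER_GIVEN_LESION = 0.8
P_CANCER_GIVEN_NO_LESION = 0.01
U_SMOKE, U_CANCER = 1.0, -100.0

def p_lesion_given(action):
    """Bayes: observing your own action is evidence about the lesion."""
    p_a_l = P_SMOKE_GIVEN_LESION if action == "smoke" else 1 - P_SMOKE_GIVEN_LESION
    p_a_n = P_SMOKE_GIVEN_NO_LESION if action == "smoke" else 1 - P_SMOKE_GIVEN_NO_LESION
    num = p_a_l * P_LESION
    return num / (num + p_a_n * (1 - P_LESION))

def edt_value(action):
    """Condition on the action: the action shifts P(lesion), hence P(cancer)."""
    p_l = p_lesion_given(action)
    p_cancer = p_l * P_CANCER_GIVEN_LESION + (1 - p_l) * P_CANCER_GIVEN_NO_LESION
    return (U_SMOKE if action == "smoke" else 0.0) + p_cancer * U_CANCER

def cdt_value(action):
    """Causal graph: the action cannot change P(lesion), so use the prior."""
    p_cancer = (P_LESION * P_CANCER_GIVEN_LESION
                + (1 - P_LESION) * P_CANCER_GIVEN_NO_LESION)
    return (U_SMOKE if action == "smoke" else 0.0) + p_cancer * U_CANCER

best_edt = max(["smoke", "refrain"], key=edt_value)
best_cdt = max(["smoke", "refrain"], key=cdt_value)
print(best_edt)  # refrain
print(best_cdt)  # smoke
```

This naive evidential calculation refrains from smoking even though smoking has no causal effect on cancer, which is exactly the charge being made against EDT here.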

• I don’t un­der­stand the need for this “ad­vanced” de­ci­sion the­ory. The situ­a­tions you men­tion—Omega and the boxes, PD with a men­tal clone—are highly ar­tifi­cial; no hu­man be­ing has ever en­coun­tered such a situ­a­tion. So what rele­vance do these “ad­vanced” de­ci­sion the­o­ries have to de­ci­sions of real peo­ple in the real world?

• They’re no more artificial than the rest of Game Theory: no human being has ever known their exact payoffs for consequences in terms of utility, either. Like I said, there may be a good deal of advanced-decision-theory structure in the way people subconsciously decide to trust one another given partial information, and that’s something that a CDT analysis would treat as irrational even when beneficial.

One bit of relevance is that “rational” has been wrongly conflated with strategies akin to defecting in the Prisoner’s Dilemma, or being unable to genuinely promise anything with high enough stakes, and advanced decision theories are the key to seeing that the rational ideal doesn’t fail like that.

• They’re no more ar­tifi­cial than the rest of Game The­ory-

That’s an invalid analogy. We use mathematical models that we know are idealized approximations of reality all the time… but they are intended to be approximations of actually encountered circumstances. The examples given in the article bear no relevance to any circumstance any human being has ever encountered.

there may be a good deal of ad­vanced-de­ci­sion-the­ory-struc­ture in the way peo­ple sub­con­sciously de­cide to trust one an­other given par­tial in­for­ma­tion, and that’s some­thing that CDT anal­y­sis would treat as ir­ra­tional even when benefi­cial.

That doesn’t fol­low from any­thing said in the ar­ti­cle. Care to ex­plain fur­ther?

One bit of rele­vance is that “ra­tio­nal” has been wrongly con­flated with strate­gies akin to defect­ing in the Pri­soner’s Dilemma,

Defect­ing is the right thing to do in the Pri­soner’s Dilemma it­self; it is only when you mod­ify the con­di­tions in some way (im­plic­itly chang­ing the pay­offs, or hav­ing the other player’s de­ci­sion de­pend on yours) that the best de­ci­sion changes. In your ex­am­ple of the men­tal clone, a sim­ple ex­pected-util­ity max­i­miza­tion gives you the right an­swer, as­sum­ing you know that the other player will make the same move that you do.

• a sim­ple ex­pected-util­ity max­i­miza­tion gives you the right an­swer, as­sum­ing you know that the other player will make the same move that you do.

A sim­ple ex­pected util­ity max­i­miza­tion does. A CDT de­ci­sion doesn’t. For­mally spec­i­fy­ing a max­i­miza­tion al­gorithm that be­haves like CDT is, from what I un­der­stand, less sim­ple than mak­ing it fol­low UDT.

• If all we need to do is max­i­mize ex­pected util­ity, then where is the need for an “ad­vanced” de­ci­sion the­ory?

From Wikipe­dia: “Causal de­ci­sion the­ory is a school of thought within de­ci­sion the­ory which main­tains that the ex­pected util­ity of ac­tions should be eval­u­ated with re­spect to their po­ten­tial causal con­se­quences.”

It seems to me that the source of the prob­lem is in that phrase “causal con­se­quences”, and the con­fu­sion sur­round­ing the whole no­tion of causal­ity. The two prob­lems men­tioned in the ar­ti­cle are hard to fit within stan­dard no­tions of causal­ity.

It’s worth mentioning that you can turn Pearl’s causal nets into plain old Bayesian networks by explicitly modeling the notion of an intervention. (Pearl himself mentions this in his book.) You just have to add some additional variables and their effects; this allows you to incorporate the information contained in your causal intuitions. This suggests to me that causality really isn’t a fundamental concept, and that causality conundrums result from failing to include all the relevant information in your model.

[The term “model” here just refers to the joint prob­a­bil­ity dis­tri­bu­tion you use to rep­re­sent your state of in­for­ma­tion.]
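Here is a minimal sketch of that construction (illustrative numbers, and not Pearl’s own notation): add an explicit intervention variable as a parent of the variable it overrides, then use ordinary conditioning in the enlarged network.

```python
# Sketch: model an intervention as an explicit parent of the variable it
# overrides, then condition as usual. Network: Lesion -> Smoke,
# Lesion -> Cancer; "Do" overrides Smoke's mechanism. Numbers illustrative.
import itertools

P_LESION = 0.1
P_SMOKE = {True: 0.9, False: 0.2}    # P(smoke | lesion)
P_CANCER = {True: 0.8, False: 0.01}  # P(cancer | lesion)

def joint(lesion, smoke, cancer, do=None):
    p = P_LESION if lesion else 1 - P_LESION
    if do is None:                    # Smoke follows its usual mechanism
        p *= P_SMOKE[lesion] if smoke else 1 - P_SMOKE[lesion]
    else:                             # the intervention overrides the mechanism
        p *= 1.0 if smoke == do else 0.0
    p *= P_CANCER[lesion] if cancer else 1 - P_CANCER[lesion]
    return p

def p_cancer_given_smoke(do=None):
    num = den = 0.0
    for lesion, cancer in itertools.product([True, False], repeat=2):
        p = joint(lesion, True, cancer, do)
        den += p
        num += p if cancer else 0.0
    return num / den

# Observing smoking is evidence for the lesion; forcing smoking is not.
print(round(p_cancer_given_smoke(), 3))         # 0.273
print(round(p_cancer_given_smoke(do=True), 3))  # 0.089
```

Plain conditioning on the enlarged network reproduces the do-calculus answer, which is the comment’s point that the causal/evidential distinction can be dissolved into modeling.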

Where I’m get­ting to with all of this is that if you model your in­for­ma­tion cor­rectly, the differ­ence be­tween Causal De­ci­sion The­ory and Ev­i­den­tial De­ci­sion The­ory dis­solves, and New­comb’s Para­dox and the Cloned Pri­soner’s Dilemma are eas­ily re­solved.

I think I’m go­ing to have to write this up as an ar­ti­cle of my own to re­ally ex­plain my­self...

• See my com­ment here—though if this prob­lem keeps com­ing up then a post should be writ­ten by some­one I guess.

• Game/​decision theory is a mathematical discipline. It doesn’t get much more artificial than that. The fact that it is somewhat applicable to reality is an interesting side effect.

• If your goal is to figure out what to have for break­fast, not much rele­vance at all.
If your goal is to pro­gram an au­to­mated de­ci­sion-mak­ing sys­tem to figure out what break­fast sup­plies to make available to the pop­u­la­tion of the West Coast of the U.S., per­haps quite a lot.
If your goal is to pro­gram an au­to­mated de­ci­sion-mak­ing sys­tem to figure out how to op­ti­mize all available re­sources for the max­i­mum benefit of hu­man­ity, per­haps even more.

There are lots of groups rep­re­sented on LW, with differ­ent per­ceived needs. Some are pri­mar­ily in­ter­ested in self-help threads, oth­ers pri­mar­ily in­ter­ested in aca­demic de­ci­sion-the­ory threads, and many oth­ers. Easiest is to ig­nore threads that don’t in­ter­est you.

• If your goal is to pro­gram an au­to­mated de­ci­sion-mak­ing sys­tem to figure out what break­fast sup­plies to make available to the pop­u­la­tion of the West Coast of the U.S., per­haps quite a lot.

This ex­am­ple has noth­ing like the char­ac­ter of the one-box/​two-box prob­lem or the PD-with-men­tal-clone prob­lem de­scribed in the ar­ti­cle. Why should it re­quire an “ad­vanced” de­ci­sion the­ory? Be­cause peo­ple’s con­sump­tion will re­spond to the sup­plies made available? But stan­dard game the­ory can han­dle that.

There are lots of groups represented on LW, with different perceived needs. [...] Easiest is to ignore threads that don’t interest you.

It’s not that I’m not in­ter­ested; it’s that I’m puz­zled as to what pos­si­ble use these “ad­vanced” de­ci­sion the­o­ries can ever have to any­one.

• OK, ig­nore those ex­am­ples for a sec­ond, and ig­nore the word “ad­vanced.”

The OP is draw­ing a dis­tinc­tion be­tween CDT, which he claims fails in situ­a­tions where com­pet­ing agents can pre­dict one an­other’s be­hav­ior to vary­ing de­grees, and other de­ci­sion the­o­ries, which don’t fail. If he’s wrong in that claim, then ar­tic­u­lat­ing why would be helpful.

If, in­stead, he’s right in that claim, then I don’t see what’s use­less about the­o­ries that don’t fail in that situ­a­tion. At least, it cer­tainly seems to me that com­pet­ing agents pre­dict­ing one an­other’s be­hav­ior is some­thing that hap­pens all the time in the real world. Does it not seem that way to you?

• But the ba­sic as­sump­tion of stan­dard game the­ory, which I pre­sume he means to in­clude in CDT, is that the agents can pre­dict each other’s be­hav­ior—it is as­sumed that each will make the best move they pos­si­bly can.

I don’t think that pre­dict­ing be­hav­ior is the fun­da­men­tal dis­tinc­tion here. Game the­ory is all about deal­ing with in­tel­li­gent ac­tors who are try­ing to an­ti­ci­pate your own choices. That’s why the Nash equil­ibrium is gen­er­ally a prob­a­bil­is­tic strat­egy—to make your move un­pre­dictable.

• But the ba­sic as­sump­tion of stan­dard game the­ory, which I pre­sume he means to in­clude in CDT, is that the agents can pre­dict each other’s be­hav­ior—it is as­sumed that each will make the best move they pos­si­bly can.

Not quite. A unique Nash equilibrium is an un-exploitable strategy; you don’t need to predict what the other agents will do, because your worst-case expected utility is when they also pick the equilibrium. If they depart, you can often profit.

Non-unique Nash equil­ibria (like the co­or­di­na­tion game) are a clas­si­cal game the­ory prob­lem with­out a gen­eral solu­tion.

Clas­si­cal game the­ory uses the ax­iom of in­de­pen­dence to avoid hav­ing to pre­dict other agents in de­tail. The point of the ad­vanced de­ci­sion the­o­ries is that we can some­times do bet­ter than that out­come if in­de­pen­dence is in fact vi­o­lated.
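The unexploitability claim two comments up can be illustrated with matching pennies, whose unique equilibrium is mixed; a sketch with the standard ±1 payoffs:

```python
# Matching pennies (zero-sum), payoffs to the matching player: +1 on a
# match, -1 otherwise. The unique Nash equilibrium is both playing 50/50.
def matcher_payoff(p_heads_me, p_heads_them):
    """Expected payoff to the matcher given each side's P(heads)."""
    p_match = (p_heads_me * p_heads_them
               + (1 - p_heads_me) * (1 - p_heads_them))
    return p_match - (1 - p_match)

# At the equilibrium mix, no opponent strategy changes my expected payoff:
print(matcher_payoff(0.5, 0.0))  # 0.0
print(matcher_payoff(0.5, 1.0))  # 0.0

# But if the opponent departs from equilibrium, I can respond and profit:
print(matcher_payoff(1.0, 0.8))  # positive
```

Playing the equilibrium mix needs no prediction of the opponent at all, which is exactly the sense in which classical game theory sidesteps modeling the other agent in detail.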

• I’m not sure that equat­ing “CDT” with “stan­dard game the­ory” as you refer­ence it here is pre­serv­ing the OP’s point.

• This is nit pick­ing, but in the spe­cific case of New­comb’s prob­lem, it’s in­ten­tion­ally un­clear if your de­ci­sion af­fects Omega’s.

• There are mul­ti­ple for­mu­la­tions- I picked one that worked for my pur­poses, in which X has full knowl­edge of the setup, in­clud­ing the fact that the boxes are pre­pared be­fore X even en­ters the room. This rules out some ways of dodg­ing the ques­tion.

• I claim your vari­a­tion doesn’t ad­dress the hid­den as­sump­tion that P(pre­dict X | do X) is near unity, given that P(do X | pre­dict X) is near unity.
But there’s a stronger objection:
Why pick such a con­tentious ex­am­ple for a primer? Peo­ple have cached thoughts about the clas­sic ver­sion of the prob­lem, and that’s go­ing to in­terfere with the point you’re try­ing to make.

• Ac­cord­ing to Wikipe­dia there are mul­ti­ple clas­sic for­mu­la­tions, some in­volv­ing un­cer­tainty and oth­ers in­volv­ing cer­tainty.

• But there is one key prop­erty that dis­t­in­guishes CDT from the de­ci­sion the­o­ries we’ll talk about later: in its mod­el­ing of the world, X only al­lows event A to af­fect the prob­a­bil­ity of event B if A hap­pens be­fore B. (This is what causal means in Causal De­ci­sion The­ory.)

This leads to the as­sump­tion that X’s de­ci­sion is in­de­pen­dent from the si­mul­ta­neous de­ci­sions of the Ys- that is, X could de­cide one way or an­other and ev­ery­one else’s de­ci­sions would stay the same.

That doesn’t seem to fol­low.

It is sci­en­tifi­cally con­ven­tional to have the past caus­ing the fu­ture.

How­ever, de­ci­sions made by iden­ti­cal twins (and other sys­tems with shared in­ner work­ings) aren’t in­de­pen­dent. Not be­cause of some kind of spooky back­wards-in-time-cau­sa­tion, but be­cause both de­ci­sions de­pend on the ge­netic makeup of the twins—which was jointly de­ter­mined by the mother long ago.

So: this “in­de­pen­dence” prop­erty doesn’t seem to fol­low from the “past causal­ity” prop­erty.

So: where is the idea that CDT in­volves “in­de­pen­dent de­ci­sions” com­ing from?

• You know, you’re right. The in­de­pen­dence as­sump­tion doesn’t fol­low from time-causal­ity; it’s the main as­sump­tion it­self. (X’s pro­gram­mer writ­ing a CDT agent is a past cause of both the pre­dic­tion and the ac­tion.) I’ll fix the post.

• Thanks. I was in­ter­ested in where the “in­de­pen­dent de­ci­sions” idea comes from. This page on Causal De­ci­sion The­ory sug­gests that it prob­a­bly came from Robert Stal­naker in the 1970s—and was rol­led into CDT in:

• Gib­bard, Allan and William Harper. [1978] 1981. “Coun­ter­fac­tu­als and Two Kinds of Ex­pected Utility.”

• How­ever, de­ci­sions made by iden­ti­cal twins (and other sys­tems with shared in­ner work­ings) aren’t in­de­pen­dent. Not be­cause of some kind of spooky back­wards-in-time-cau­sa­tion, but be­cause both de­ci­sions de­pend on the ge­netic makeup of the twins—which was jointly de­ter­mined by the mother long ago.

Then again, in the chew­ing-gum var­i­ant of the smok­ing le­sion prob­lem, your de­ci­sion whether to chew gum and your ge­netic propen­sity to get throat ab­scesses aren’t in­de­pen­dent ei­ther. But ev­ery­body would agree that choos­ing to chew is still the right choice, wouldn’t they?

• I don’t think that af­fects my point (which was that con­sid­er­ing de­ci­sions made by differ­ent agents to be “in­de­pen­dent” of each other is not a con­se­quence of com­mon-sense sci­en­tific causal­ity). The idea seems to be com­ing from some­where else—but where?

• Great post!

“X should go to the zoo what­ever Y does”

Should be “when­ever.”

ETA: Oops. See Gust’s post.

• I guess he meant that “X should go to the zoo in­de­pen­dently of what Y does”.

• Heh. Oh yeah.

• What Gust said. Let me see if I can find a clearer phras­ing...

• Upvoted; very in­ter­est­ing.