# ACDT: a hack-y acausal decision theory

In­spired by my post on prob­lems with causal de­ci­sion the­ory (CDT), here is a hacked ver­sion of CDT that seems to be able to imi­tate time­less de­ci­sion the­ory (TDT) and func­tional de­ci­sion the­ory[1] (FDT), as well as up­date­less de­ci­sion the­ory (UDT) un­der cer­tain cir­cum­stances.

Call this ACDT, for (a)causal de­ci­sion the­ory. It is, es­sen­tially, CDT which can draw ex­tra, acausal ar­rows on the causal graphs, and which at­tempts to figure out which graph rep­re­sents the world it’s in. The draw­back is its lack of el­e­gance; the ad­van­tage, if it works, is that it’s sim­ple to spec­ify and fo­cuses at­ten­tion on the im­por­tant as­pects of de­duc­ing the graph.

# Defin­ing ACDT

## CDT and the New­comb problem

In the Newcomb problem, there is a predictor who leaves two boxes, and predicts whether you will take one (“one-box”) or both (“two-box”). If the predictor predicts you will one-box, it has put a large prize in the first box; otherwise that box is empty. There is always a small consolation prize in the second box.

In terms of causal graphs, we can rep­re­sent it this way:

The dark red node is the de­ci­sion node, which the agent can af­fect. The green node is a util­ity node, whose value the agent cares about.

The CDT agent uses the “do” operator from Pearl’s Causality. Essentially, all the incoming arrows to the decision node are cut (though the CDT agent keeps track of any information gained that way), then the CDT agent maximises its utility by choosing its action:

In this situation, the CDT agent will always two-box, since it treats the predictor’s decision as fixed, and in that case two-boxing dominates: you get whatever’s in the first box, plus the consolation prize.
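The dominance argument can be checked numerically. The following is a minimal sketch, not the post's own formalism: the prize values are illustrative assumptions (the post only says “large” and “small”), and `cdt_expected_utility` is a hypothetical helper that holds the prediction fixed regardless of the action.

```python
# Illustrative payoffs (assumptions; the post only says "large" and "small").
BIG_PRIZE = 1_000_000
SMALL_PRIZE = 1_000
ACTIONS = ["one-box", "two-box"]


def payoff(action: str, prediction: str) -> int:
    """Money received given the agent's action and the predictor's prediction."""
    first_box = BIG_PRIZE if prediction == "one-box" else 0
    second_box = SMALL_PRIZE if action == "two-box" else 0
    return first_box + second_box


def cdt_expected_utility(action: str, p_predicted_one_box: float) -> float:
    """Expected utility when the prediction is treated as independent of the action."""
    return (p_predicted_one_box * payoff(action, "one-box")
            + (1 - p_predicted_one_box) * payoff(action, "two-box"))


def cdt_choice(p_predicted_one_box: float) -> str:
    """The action a CDT agent picks for a given belief about the prediction."""
    return max(ACTIONS, key=lambda a: cdt_expected_utility(a, p_predicted_one_box))
```

Whatever belief `p_predicted_one_box` the agent holds, `cdt_choice` returns `"two-box"`, because the belief does not change with the action.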

## ACDT algorithm

The ACDT al­gorithm is similar, ex­cept that when it cuts the causal links to its de­ci­sion, it also adds po­ten­tial links from that de­ci­sion node to all the other nodes in the graph. Then it at­tempts to figure out which di­a­gram is cor­rect, and then max­imises its util­ity in the CDT way.

Note that ACDT doesn’t take a po­si­tion on what these ex­tra links are—whether they are point­ing back in time or are re­flect­ing some more com­pli­cated struc­ture (such as the ex­is­tence of pre­dic­tors). It just as­sumes the links could be there, and then works from that.

In a sense, ACDT can be seen as an­te­rior to CDT. How do we know that causal­ity ex­ists, and the rules it runs on? From our ex­pe­rience in the world. If we lived in a world where the New­comb prob­lem or the pre­dic­tors ex­ist prob­lem were com­mon­place, then we’d have a differ­ent view of causal­ity.

It might seem gra­tu­itous and wrong to draw ex­tra links com­ing out of your de­ci­sion node—but it was also gra­tu­itous and wrong to cut all the links that go into your de­ci­sion node. Draw­ing these ex­tra ar­rows un­does some of the dam­age, in a way that a CDT agent can un­der­stand (they don’t un­der­stand things that cause their ac­tions, but they do un­der­stand con­se­quences of their ac­tions).

## ACDT and the New­comb problem

As well as the standard CDT graph above, ACDT can also consider the following graph, with a link from its decision to the predictor’s prediction:

It now has to figure out which graph rep­re­sents the bet­ter struc­ture for the situ­a­tion it finds it­self in. If it’s en­coun­tered the New­comb prob­lem be­fore, and tried to one-box and two-box a few times, then it knows that the sec­ond graph gives more ac­cu­rate pre­dic­tions. And so it will one-box, just as well as the TDT fam­ily does.
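One way to picture the “which graph is better” step is as a likelihood comparison between two candidate models. This is only a sketch under stated assumptions: the graph names, the fixed 0.5 base rate for the unlinked graph, the 0.9 accuracy for the linked graph, and the prize values are all hypothetical choices, not taken from the post.

```python
import math

BIG_PRIZE, SMALL_PRIZE = 1_000_000, 1_000   # illustrative payoffs (assumption)
ACTIONS = ["one-box", "two-box"]
GRAPHS = ["independent", "linked"]          # listed first = default when tied
ASSUMED_ACCURACY = 0.9                      # assumed predictor accuracy in the linked graph


def payoff(action, prediction):
    return (BIG_PRIZE if prediction == "one-box" else 0) + \
           (SMALL_PRIZE if action == "two-box" else 0)


def p_prediction(graph, prediction, action):
    """P(prediction | do(action)) under each candidate graph."""
    if graph == "independent":
        return 0.5  # no arrow from decision to prediction
    return ASSUMED_ACCURACY if prediction == action else 1 - ASSUMED_ACCURACY


def log_likelihood(graph, history):
    """How well a graph explains past (action, prediction) pairs."""
    return sum(math.log(p_prediction(graph, pred, act)) for act, pred in history)


def best_graph(history):
    return max(GRAPHS, key=lambda g: log_likelihood(g, history))


def acdt_choice(history):
    """Pick the best-fitting graph, then maximise utility in the CDT way."""
    graph = best_graph(history)

    def expected_utility(action):
        return sum(p_prediction(graph, pred, action) * payoff(action, pred)
                   for pred in ACTIONS)

    return max(ACTIONS, key=expected_utility)


# A few trial rounds in which the prediction always matched the action.
history = [("one-box", "one-box"), ("two-box", "two-box")] * 2
```

With this `history`, the linked graph fits better and `acdt_choice(history)` one-boxes; with no experience at all, the sketch falls back to the unlinked graph and two-boxes like plain CDT.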

## Gen­er­al­is­ing from other agents

If the ACDT agent has not encountered the predictor themselves, but has seen it run the Newcomb problem on other agents, then the “figure out the true graph” step becomes more subtle. UDT and TDT are built from the assumption that equivalent algorithms/agents in equivalent situations will produce equivalent results.

But ACDT, built out of CDT and its solip­sis­tic cut­ting pro­cess, has no such as­sump­tions—at least, not ini­tially. It has to learn that the fate of other, similar agents, is ev­i­dence for its own graph. Once it learns that gen­er­al­i­sa­tion, then it can start to learn from the ex­pe­rience of oth­ers.

# ACDT on other de­ci­sion problems

## Pre­dic­tors exist

Each round of the predictors exist problem has a graph similar to the Newcomb problem, with the addition of a node to repeat the game:

After a few rounds, the ACDT agent will learn that the fol­low­ing graph best rep­re­sents its situ­a­tion:

And it will then swiftly choose to leave the game.
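The decision to leave can be illustrated with the payoffs given in the summary quoted in the comments below (3 for saying something different from the predictor, −1 for saying the same thing). The sketch assumes that leaving the game is worth 0, which is not stated in this post; the function names are hypothetical.

```python
WIN, LOSE = 3, -1   # payoffs from the summary quoted in the comments
LEAVE = 0           # assumption: walking away is worth 0


def eu_unlinked(say: int, p_predictor_says_one: float) -> float:
    """CDT-style graph: the predictor's output is independent of what the agent says."""
    p_match = p_predictor_says_one if say == 1 else 1 - p_predictor_says_one
    return p_match * LOSE + (1 - p_match) * WIN


def eu_linked(accuracy: float) -> float:
    """ACDT-style graph: the predictor matches the agent's output with this accuracy."""
    return accuracy * LOSE + (1 - accuracy) * WIN


def cdt_keeps_playing(p_predictor_says_one: float) -> bool:
    best = max(eu_unlinked(0, p_predictor_says_one), eu_unlinked(1, p_predictor_says_one))
    return best > LEAVE


def acdt_keeps_playing(learned_accuracy: float) -> bool:
    return eu_linked(learned_accuracy) > LEAVE
```

Under the unlinked graph the best action always looks worth at least 1, so the agent keeps playing; once the linked graph with a high learned accuracy is adopted, playing looks worse than leaving.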

## Pri­soner’s dilemma with iden­ti­cal copy of itself

If con­fronted by the pris­oner’s dilemma with an iden­ti­cal copy of it­self, the ACDT agent, though un­able to for­mal­ise “we are iden­ti­cal”, will re­al­ise that they always make the same de­ci­sion:

And it will then choose to co­op­er­ate.
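A small sketch of why the extra link changes the answer. The payoff numbers are the conventional textbook prisoner's dilemma values, used here as an assumption since the post gives none; `choice_unlinked` and `choice_linked` are hypothetical names.

```python
# My payoff for (my move, copy's move); conventional values, assumed for illustration.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ["C", "D"]


def choice_unlinked(p_copy_cooperates: float) -> str:
    """CDT graph: the copy's move is independent of mine, so defection dominates."""
    def expected_utility(move):
        return (p_copy_cooperates * PAYOFF[(move, "C")]
                + (1 - p_copy_cooperates) * PAYOFF[(move, "D")])
    return max(MOVES, key=expected_utility)


def choice_linked() -> str:
    """Graph with a link from my decision to the copy's: both moves are always equal."""
    return max(MOVES, key=lambda move: PAYOFF[(move, move)])
```

Here `choice_unlinked` returns `"D"` for any belief about the copy, while `choice_linked` returns `"C"`, since only the diagonal outcomes are reachable once the link is learned.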

## Parfit’s hitchhiker

The Parfit’s hitch­hiker prob­lem is as fol­lows:

A greedy predictor with a strong reputation for honesty learns whether or not it’s true, and drafts a letter:

I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.

The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?

The CDT agent will have the following graph:

And the CDT agent will make the simple and correct decision not to pay.

ACDT can eventually reach the same conclusion, but may require more evidence. It also has to consider graphs of the following sort:

The error of evidential decision theory (EDT) is, in effect, to act as if the light green arrow existed: that they can affect the existence of the termites through their decision.

ACDT, if confronted with similar problems often enough, will eventually learn that the light green arrow has no effect, while the dark green one does have an effect (more correctly: the model with the dark green arrow is more accurate, while the light green arrow doesn’t add accuracy). It will then refuse to pay, just like the CDT agent does.

Note that we might define ACDT as only creating links with its own parent nodes: putting back the links it cut, but in the other direction. In that case it would only consider links with “Your decision algorithm” and “Letter sent”, not with “Termites in house?”, and would never pay.

But note that “Your decision algorithm” is a logical node that might not exist in physical reality; that’s why I designed ACDT to allow links to arbitrary nodes, not just the ones that are its ancestors, so it can capture more models about how the world works.

## Not UDT: counterfactual mugging

The ACDT agent described above differs from UDT in that it doesn’t pay the counterfactual mugger:

Omega appears and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But Omega also tells you that if the coin came up heads instead of tails, it’d give you $10,000, but only if you’d agree to give it $100 if the coin came up tails. Do you give the $100?

Non-co­in­ci­den­tally, this prob­lem is difficult to rep­re­sent in a causal graph. One way of see­ing it could be this way:

Here the behaviour of the agent in the tails world determines Omega’s behaviour in the heads world. It would be tempting to try and extend ACDT, by drawing an arrow from that decision node to Omega’s node in the heads world.

But that doesn’t work, be­cause that de­ci­sion only hap­pens in the tails world—in the heads world, the agent has no de­ci­sion to make, so ACDT will do noth­ing. And in the tails world, the heads world is only coun­ter­fac­tu­ally rele­vant.

Now ACDT, like EDT, can learn, in some cir­cum­stances, to pay the coun­ter­fac­tual mug­ger. If this sce­nario hap­pens a lot, then it can note that agents that pay in the tails world get re­warded in the heads world, thus get­ting some­thing like this:

But that’s a bit too much of a hack, even for a hack-y method like this. More nat­u­ral and proper would be to have the ACDT agent not use its de­ci­sion as the node to cut-and-add-links from, but its policy (or, as in this post, its code). In that case, the coun­ter­fac­tual mug­ging can be rep­re­sented as a graph by the ACDT agent:
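The effect of moving from the decision node to the policy node can be checked with the numbers from the counterfactual mugging quoted above ($100 cost, $10,000 reward, fair coin). This is a sketch only; the function names are hypothetical, and it assumes the predictor reads the policy perfectly.

```python
P_HEADS = 0.5            # fair coin, as in the problem statement
REWARD, COST = 10_000, 100


def policy_value(pays_in_tails: bool) -> float:
    """Value of a policy, evaluated before the coin is tossed."""
    heads_outcome = REWARD if pays_in_tails else 0   # rewarded only if the policy would pay
    tails_outcome = -COST if pays_in_tails else 0
    return P_HEADS * heads_outcome + (1 - P_HEADS) * tails_outcome


def tails_decision_value(pays: bool) -> float:
    """Value of the decision alone, evaluated inside the tails world."""
    return -COST if pays else 0
```

Evaluated at the policy node, paying is worth 4,950 against 0 for refusing; evaluated at the decision node inside the tails world, paying simply loses $100, which is why the decision-level ACDT agent refuses.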

## Fully acausal trade

The ACDT agent might have is­sues with fully acausal trade (though, de­pend­ing on your view, this might be a fea­ture not a bug).

The reason is that, since the ACDT agent never gets to experience acausal trade, it never gets to check whether there is a link between it and hypothetical other agents. Imagine a Newcomb problem where you never get to see the money (which may be going to a charity you support, though that charity may not exist either), nor whether the predictor exists.

If an ACDT agent ever discovered acausal trade, it would have to do so in an incremental fashion. It would first have to become comfortable enough with prediction problems that drawing links to predictors is a natural thing for it to do. It would then have to become comfortable enough with hypothetical arguments being correct that it could generalise to situations where it cannot ever get any empirical evidence.

So whether an ACDT agent ever en­gages in fully acausal trade, de­pends on how it gen­er­al­ises from ex­am­ples.

## Neu­ral nets learn­ing to be ACDT

It would be interesting to program a neural net ACDT agent, based on these examples. If anyone is interested in doing so, let me know and go ahead.

# Learn­ing graphs and pri­ors over graphs

The ACDT agent is some­what slow and clunky at learn­ing, need­ing quite a few ex­am­ples be­fore it can ac­cept un­con­ven­tional se­tups.

If we want it to go faster, we can choose to mod­ify its pri­ors. For ex­am­ple, we can look at what ev­i­dence would con­vince us that an ac­cu­rate pre­dic­tor ex­isted, and put a prior that would have a cer­tain graph, con­di­tional on see­ing that ev­i­dence.
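As a rough illustration of how the prior controls learning speed, here is a two-graph Bayesian update. Everything numerical is an assumption made for the sketch: the linked graph predicts a match with probability 0.9, the unlinked graph with probability 0.5, and the agent adopts the linked graph once its posterior passes 0.95.

```python
def posterior_linked(prior_linked: float, n_matches: int,
                     accuracy: float = 0.9, base_rate: float = 0.5) -> float:
    """Posterior of the 'linked' graph after n rounds where prediction matched action."""
    linked = prior_linked * accuracy ** n_matches
    unlinked = (1 - prior_linked) * base_rate ** n_matches
    return linked / (linked + unlinked)


def observations_needed(prior_linked: float, threshold: float = 0.95) -> int:
    """Number of matching rounds before the linked graph passes the threshold."""
    if not 0 < prior_linked < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    n = 0
    while posterior_linked(prior_linked, n) < threshold:
        n += 1
    return n
```

Under these assumptions, an agent with a sceptical prior such as `observations_needed(0.01)` needs noticeably more matching rounds than one with an even prior, `observations_needed(0.5)`, which is the sense in which modifying the prior makes the agent “go faster”.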

Or if we want to be closer to UDT, we could for­mal­ise state­ments about al­gorithms, and about their fea­tures and similar­i­ties (or for­mal­ise math­e­mat­i­cal re­sults about proofs, and about how to gen­er­al­ise from known math­e­mat­i­cal re­sults). Ad­ding that to the ACDT agent gives an agent much closer to UDT.

So it seems that ACDT + “the correct priors” is close to various different acausal agent designs.

1. Since FDT is still somewhat undefined, I’m viewing it as TDT-like rather than UDT-like for the moment. ↩︎

• Nice post! I found the di­a­grams par­tic­u­larly read­able, it makes a lot of sense to me to have them in such a prob­lem.

I’m not very well-read on this sort of work, so feel free to ig­nore any of the fol­low­ing.

The key ques­tion I have is the cor­rect­ness of the sec­tion:

In a sense, ACDT can be seen as an­te­rior to CDT. How do we know that causal­ity ex­ists, and the rules it runs on? From our ex­pe­rience in the world. If we lived in a world where the New­comb prob­lem or the pre­dic­tors ex­ist prob­lem were com­mon­place, then we’d have a differ­ent view of causal­ity.

It might seem gra­tu­itous and wrong to draw ex­tra links com­ing out of your de­ci­sion node—but it was also gra­tu­itous and wrong to cut all the links that go into your de­ci­sion node. Draw­ing these ex­tra ar­rows un­does some of the dam­age, in a way that a CDT agent can un­der­stand (they don’t un­der­stand things that cause their ac­tions, but they do un­der­stand con­se­quences of their ac­tions).

I don’t quite see why the causal­ity is this flex­ible and ar­bi­trary. I haven’t read Causal­ity, but think I get the gist.

It’s definitely convenient here to be uncertain about causality. But it would be similarly convenient to have uncertainty about the correct decision theory. A similar formulation could involve a meta-decision-algorithm that tries different decision algorithms until one produces favorable outcomes. Personally I think I’d be more easily convinced that acausal decision theory is correct than that a different causal structure is correct.

Semi-related, one aspect of Newcomb’s problem that has really confused me is the potential for Omega to have scenarios that favor incorrect beliefs. It would be arbitrary to imagine that Newcomb would offer $1,000 only if it could tell that one believes that “19 + 2 = 20”. One could solve that by imagining that the participant should have uncertainty about what “19 + 2” is, trying out multiple options, and seeing which would produce the most favorable outcome.

Separately,

If it’s en­coun­tered the New­comb prob­lem be­fore, and tried to one-box and two-box a few times, then it knows that the sec­ond graph gives more ac­cu­rate predictions

To be clear, I’d as­sume that the agent would be smart enough to simu­late this be­fore ac­tu­ally hav­ing it done? The out­come seems de­cently ap­par­ent to me.

• I don’t quite see why the causal­ity is this flex­ible and ar­bi­trary.

In sto­ries and movies, peo­ple of­ten find that the key tool/​skill/​knowl­edge they need to solve the prob­lem, is some­thing minor they picked up some time be­fore.

The world could work like this, so that ev­ery minor thing you spent any time on would have a pay­off at some point in the fu­ture. Call this a tele­olog­i­cal world.

This world would have a differ­ent “causal” struc­ture to our own, and we’d prob­a­bly not con­ceive tra­di­tional CDT agents as likely in this world.

• But it would be similarly con­ve­nient to have un­cer­tainty about the cor­rect de­ci­sion the­ory.

Yes, this is really interesting for me. For example, if I have the Newcomb-like problem, but am uncertain about the decision theory, I should one-box, as in that case my expected payoff is higher (if I give equal probability to both outcomes of the Newcomb experiment).

• Planned sum­mary for the pre­vi­ous post for the Align­ment Newslet­ter:

Con­sider a set­ting in which an agent can play a game against a pre­dic­tor. The agent can choose to say zero or one. It gets 3 util­ity if it says some­thing differ­ent from the pre­dic­tor, and −1 util­ity if it says the same thing. If the pre­dic­tor is near-perfect, but the agent mod­els it­self as hav­ing ac­cess to un­pre­dictable ran­dom­ness, then the agent will con­tinu­ally try to ran­dom­ize (which it calcu­lates has ex­pected util­ity 1), and will con­tinu­ally lose.

Planned sum­mary for this post:

The prob­lem with the pre­vi­ous agent is that it never learns that it has the wrong causal model. If the agent is able to learn a bet­ter causal model from ex­pe­rience, then it can learn that it is not ac­tu­ally able to use un­pre­dictable ran­dom­ness, and so it will no longer ex­pect a 50% chance of win­ning, and it will stop play­ing the game.
• If the pre­dic­tor is near-perfect, but the agent mod­els it­self as hav­ing ac­cess to un­pre­dictable ran­dom­ness, then the agent will con­tinu­ally try to ran­dom­ize (which it calcu­lates has ex­pected util­ity 1), and will con­tinu­ally lose.

It’s ac­tu­ally worse than that for CDT; the agent is not ac­tu­ally try­ing to ran­domise, it is com­pel­led to model the pre­dic­tor as a pro­cess that is com­pletely dis­con­nected from its own ac­tions, so it can freely pick the ac­tion that the pre­dic­tor is least likely to pick—ac­cord­ing to the CDT’s mod­el­ling of it. Or pick zero in the case of a tie. So the CDT agent is ac­tu­ally de­ter­minis­tic, and even if you gave it a source of ran­dom­ness, it wouldn’t see any need to use it.

The prob­lem with the pre­vi­ous agent is that it never learns that it has the wrong causal model. If the agent is able to learn a bet­ter causal model from ex­pe­rience, then it can learn that it is not ac­tu­ally able to use un­pre­dictable ran­dom­ness, and so it will no longer ex­pect a 50% chance of win­ning, and it will stop play­ing the game.

[...] then it can learn that the pre­dic­tor can ac­tu­ally pre­dict the agent suc­cess­fully, and so will no longer ex­pect a 50% [...]

• Thanks! I changed it to:

If the pre­dic­tor is near-perfect, but the agent mod­els its ac­tions as in­de­pen­dent of the pre­dic­tor (since the pre­dic­tion was made in the past), then the agent will have some be­lief about the pre­dic­tion and will choose the less likely ac­tion for ex­pected util­ity at least 1, and will con­tinu­ally lose.

The prob­lem with the pre­vi­ous agent is that it never learns that it has the wrong causal model. If the agent is able to learn a bet­ter causal model from ex­pe­rience, then it can learn that the pre­dic­tor can ac­tu­ally pre­dict the agent suc­cess­fully, and so will no longer ex­pect a 50% chance of win­ning, and it will stop play­ing the game.
• What you wrote is good, and not worth chang­ing. But I wanted to men­tion that CDT is even more bonkers than that: the pre­dic­tion can be made in the fu­ture, just as long as there is no causal path to how the pre­dic­tor is pre­dict­ing. In some cases, the pre­dic­tor can even know the ac­tion taken, and still pre­dict in a way that CDT thinks is causally dis­con­nected.

• You can also model the agent as failing to learn that its “un­pre­dictable ran­dom­ness” isn’t. It’s still the case that the sim­ple anal­y­sis of “agents which can’t learn a true fact will fail in cases where that some­thing mat­ters” is good enough.

• I don’t strong-up­vote of­ten. This is very cool.

One thing that I sus­pect will be­come nec­es­sary in the ad­ver­sar­ial cases (those where two agents’ de­ci­sions are de­pen­dent on (mod­els of) each other) is some kind of re­cur­sion in calcu­la­tion of out­come. Most of the prob­le­matic cases come down to whether the agent un­der ob­ser­va­tion or Omega can model the com­bi­na­tion of both agents bet­ter than the other, in­clud­ing how well A mod­els B mod­el­ing A mod­el­ing B …

In these cases, the bet­ter mod­eler wins, and at some point, a good DT will rec­og­nize that pick­ing the joint win (where the agent gets some­thing and Omega fulfills their con­tract) is bet­ter than an unattain­able big­ger win (where the agent gets more, but Omega is fooled, but we run out of re­cur­sion space be­fore find­ing the out­come, when we model omega as calcu­lat­ing last (more pow­er­ful) in our ex­e­cu­tion).
