ACDT: a hack-y acausal decision theory

Inspired by my post on problems with causal decision theory (CDT), here is a hacked version of CDT that seems able to imitate timeless decision theory (TDT) and functional decision theory[1] (FDT), as well as updateless decision theory (UDT) under certain circumstances.

Call this ACDT, for (a)causal decision theory. It is, essentially, CDT which can draw extra, acausal arrows on the causal graphs, and which attempts to figure out which graph represents the world it’s in. The drawback is its lack of elegance; the advantage, if it works, is that it’s simple to specify and focuses attention on the important aspects of deducing the graph.

Defining ACDT

CDT and the Newcomb problem

In the Newcomb problem, there is a predictor who leaves two boxes, and predicts whether you will take just one (“one-box”) or both (“two-box”). If the predictor predicts you will one-box, it has put a large prize in that first box; otherwise that box is empty. There is always a small consolation prize in the second box.

In terms of causal graphs, we can represent it this way:

The dark red node is the decision node, which the agent can affect. The green node is a utility node, whose value the agent cares about.

The CDT agent uses the “do()” operator from Pearl’s Causality. Essentially, all the incoming arrows to the decision node are cut (though the CDT agent keeps track of any information gained that way), and then the CDT agent maximises its utility by choosing its action:

In this situation, the CDT agent will always two-box, since it treats the predictor’s decision as fixed, and in that case two-boxing dominates: you get whatever’s in the first box, plus the consolation prize.
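This dominance argument can be made concrete with a small sketch. The payoff values below are hypothetical, chosen only for illustration:

```python
# Hypothetical Newcomb payoffs: a large prize the predictor may place in
# box one, and a small consolation prize always present in box two.
BIG_PRIZE, SMALL_PRIZE = 1_000_000, 1_000

def payoff(action, prediction):
    """Agent's payoff, with the prediction treated as already fixed."""
    box_one = BIG_PRIZE if prediction == "one-box" else 0
    box_two = SMALL_PRIZE if action == "two-box" else 0
    return box_one + box_two

# With the incoming arrow cut, CDT compares actions at each fixed prediction:
for prediction in ("one-box", "two-box"):
    assert payoff("two-box", prediction) > payoff("one-box", prediction)
```

Whatever the fixed prediction, two-boxing adds the consolation prize, which is why the CDT agent two-boxes.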

ACDT algorithm

The ACDT algorithm is similar, except that when it cuts the causal links to its decision, it also adds potential links from that decision node to all the other nodes in the graph. Then it attempts to figure out which diagram is correct, and then maximises its utility in the CDT way.

Note that ACDT doesn’t take a position on what these extra links are, whether they point back in time or reflect some more complicated structure (such as the existence of predictors). It just assumes the links could be there, and then works from that.
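As a minimal sketch, the ACDT loop is “score each candidate graph against experience, then maximise as CDT would under the best-scoring one”. The function names (`score`, `acdt_choose`) and the reduction of a graph to a function from actions to predicted outcomes are my own illustrative choices, not a formal specification:

```python
def score(graph, history):
    """Fraction of past (action, outcome) rounds the graph predicted correctly."""
    return sum(graph(action) == outcome for action, outcome in history) / len(history)

def acdt_choose(graphs, history, actions, utility):
    """Pick the graph that best fits experience, then maximise utility under it,
    treating everything upstream of the decision as cut (the CDT step)."""
    best_graph = max(graphs, key=lambda g: score(g, history))
    return max(actions, key=lambda a: utility(a, best_graph(a)))
```

The learning step (`score`) is where all the acausal work happens; the final step is ordinary CDT maximisation.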

In a sense, ACDT can be seen as anterior to CDT. How do we know that causality exists, and the rules it runs on? From our experience in the world. If we lived in a world where the Newcomb problem or the “predictors exist” problem were commonplace, then we’d have a different view of causality.

It might seem gratuitous and wrong to draw extra links coming out of your decision node, but it was also gratuitous and wrong to cut all the links that go into your decision node. Drawing these extra arrows undoes some of the damage, in a way that a CDT agent can understand (CDT agents don’t understand things that cause their actions, but they do understand consequences of their actions).

ACDT and the Newcomb problem

As well as the standard CDT graph above, ACDT can also consider the following graph, with a link from its decision to the predictor’s prediction:

It now has to figure out which graph better represents the structure of the situation it finds itself in. If it has encountered the Newcomb problem before, and tried one-boxing and two-boxing a few times, then it knows that the second graph gives more accurate predictions. And so it will one-box, just as the TDT family does.
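A sketch of this learning step, with hypothetical payoffs and the two candidate graphs reduced to functions from the agent’s action to the predicted prediction:

```python
BIG_PRIZE, SMALL_PRIZE = 1_000_000, 1_000

def utility(action, prediction):
    box_one = BIG_PRIZE if prediction == "one-box" else 0
    box_two = SMALL_PRIZE if action == "two-box" else 0
    return box_one + box_two

# Candidate structures: the plain CDT graph (the prediction is independent of
# our action; its fixed value here is arbitrary) and the extra-arrow graph
# (the prediction tracks our action).
graphs = {
    "causal": lambda action: "two-box",
    "acausal": lambda action: action,
}

# Experience from past rounds against an accurate predictor:
history = [("one-box", "one-box"), ("two-box", "two-box"), ("one-box", "one-box")]

def accuracy(graph):
    return sum(graph(a) == p for a, p in history) / len(history)

best = max(graphs, key=lambda name: accuracy(graphs[name]))
choice = max(("one-box", "two-box"), key=lambda a: utility(a, graphs[best](a)))
assert best == "acausal" and choice == "one-box"
```

The extra-arrow graph fits every round of experience, the plain graph does not, so the maximisation step runs on the extra-arrow graph and one-boxes.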

Generalising from other agents

If the ACDT agent has not encountered the predictor itself, but has seen it run the Newcomb problem on other agents, then “figure out the true graph” becomes more subtle. UDT and TDT are built on the assumption that equivalent algorithms/agents in equivalent situations will produce equivalent results.

But ACDT, built out of CDT and its solipsistic cutting process, has no such assumption, at least not initially. It has to learn that the fate of other, similar agents is evidence about its own graph. Once it learns that generalisation, it can start to learn from the experience of others.

ACDT on other decision problems

Predictors exist

Each round of the “predictors exist” problem has a graph similar to the Newcomb problem, with the addition of a node for repeating the game:

After a few rounds, the ACDT agent will learn that the following graph best represents its situation:

And it will then swiftly choose to leave the game.

Prisoner’s dilemma with an identical copy of itself

If confronted by the prisoner’s dilemma against an identical copy of itself, the ACDT agent, though unable to formalise “we are identical”, will realise that the two of them always make the same decision:

And it will then choose to cooperate.
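A sketch with a standard (hypothetical) prisoner’s dilemma payoff matrix: once the learned graph says the copy’s move tracks the agent’s own, the only reachable outcomes are mutual cooperation and mutual defection:

```python
# (my move, copy's move) -> my payoff; standard prisoner's dilemma values.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

# The learned extra arrow: the identical copy's decision mirrors mine.
copy_move = lambda my_move: my_move

choice = max("CD", key=lambda move: PAYOFF[(move, copy_move(move))])
assert choice == "C"  # mutual cooperation (3) beats mutual defection (1)
```

Under the plain CDT graph, by contrast, the copy’s move would be held fixed and defection would dominate.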

Parfit’s hitchhiker

The Parfit’s hitchhiker problem is as follows:

Suppose you’re out in the desert, running out of water, and soon to die—when someone in a motor vehicle drives up next to you. Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what’s more, the driver is Paul Ekman, who’s really, really good at reading facial microexpressions. The driver says, “Well, I’ll convey you to town if it’s in my interest to do so—so will you give me $100 from an ATM when we reach town?”

Now of course you wish you could answer “Yes”, but as an ideal game theorist yourself, you realize that, once you actually reach town, you’ll have no further motive to pay off the driver. “Yes,” you say. “You’re lying,” says the driver, and drives off leaving you to die.

The ACDT agent, by contrast, will learn the following graph:

And it will indeed pay the driver.

XOR blackmail

XOR blackmail is one of my favourite decision problems.

An agent has been alerted to a rumor that her house has a terrible termite infestation that would cost her $1,000,000 in damages. She doesn’t know whether this rumor is true.

A greedy predictor with a strong reputation for honesty learns whether or not it’s true, and drafts a letter: I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.

The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?

The CDT agent will have the following graph:

And the CDT agent will make the simple and correct decision not to pay.

ACDT can eventually reach the same conclusion, but may require more evidence. It also has to consider graphs of the following sort:

The error of evidential decision theory (EDT) is, in effect, to act as if the light green arrow existed: as if the agent could affect the existence of the termites through its decision.

ACDT, if confronted with similar problems often enough, will eventually learn that the light green arrow has no effect, while the dark green one does (more correctly: the model with the dark green arrow is more accurate, while the light green arrow doesn’t add accuracy). It will then refuse to pay, just like the CDT agent does.
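A small simulation illustrates what the agent can learn here. The termite probability and policy mix are hypothetical; the letter rule follows the problem statement:

```python
import random

random.seed(0)
episodes = []
for _ in range(10_000):
    termites = random.random() < 0.01              # termites ignore our choices
    policy = random.choice(["pay", "refuse"])      # the agent's predicted policy
    # Letter sent iff exactly one of (i) or (ii) holds:
    letter = (not termites and policy == "pay") or (termites and policy == "refuse")
    episodes.append((policy, termites, letter))

def termite_rate(policy):
    rounds = [t for p, t, _ in episodes if p == policy]
    return sum(rounds) / len(rounds)

def letter_rate(policy, termites):
    rounds = [l for p, t, l in episodes if p == policy and t == termites]
    return sum(rounds) / len(rounds)

# Dark green arrow: the policy does predict whether the letter arrives.
assert letter_rate("pay", False) == 1.0 and letter_rate("refuse", False) == 0.0
# Light green arrow: the policy does not shift the termite rate.
assert abs(termite_rate("pay") - termite_rate("refuse")) < 0.02
```

Conditional on having received the letter, payers never have termites; that correlation is what EDT mistakes for influence, while unconditionally the termite rate is the same for both policies.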

Note that we might define ACDT as only creating links with its own parent nodes: putting back the links it cut, but in the other direction. In that case it would only consider links with “Your decision algorithm” and “Letter sent”, not with “Termites in house?”, and would never pay. But note that “Your decision algorithm” is a logical node that might not exist in physical reality; that’s why I designed ACDT to allow links to arbitrary nodes, not just its ancestors, so it can capture more models of how the world works.

Not UDT: counterfactual mugging

The ACDT agent described above differs from UDT in that it doesn’t pay the counterfactual mugger:

The predictor appears and says that it has just tossed a fair coin, and given that the coin came up tails, it has decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But the predictor also tells you that if the coin had come up heads instead of tails, it’d give you $10,000, but only if you’d agree to give it $100 if the coin came up tails. Do you give the $100?

Non-coincidentally, this problem is difficult to represent in a causal graph. One way of seeing it is this:

Here the behaviour of the agent in the tails world determines the predictor’s behaviour in the heads world. It would be tempting to try to extend ACDT by drawing an arrow from that decision node to the predictor’s node in the heads world.

But that doesn’t work, because that decision only happens in the tails world; in the heads world, the agent has no decision to make, so ACDT will do nothing. And in the tails world, the heads world is only counterfactually relevant.

Now ACDT, like EDT, can learn, in some circumstances, to pay the counterfactual mugger. If this scenario happens a lot, then it can note that agents that pay in the tails world get rewarded in the heads world, thus getting something like this:

But that’s a bit too much of a hack, even for a hack-y method like this. More natural and proper would be to have the ACDT agent use not its decision as the node to cut-and-add-links from, but its policy (or, as in this post, its code). In that case, the counterfactual mugging can be represented as a graph by the ACDT agent:
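A sketch of the policy-level version, with the stakes taken from the problem statement: cutting at the policy node means each candidate policy is scored across both coin outcomes, before conditioning on tails:

```python
def expected_value(policy):
    """Score a policy over both branches of the fair coin flip."""
    heads = 10_000 if policy == "pay" else 0   # predictor rewards committed payers
    tails = -100 if policy == "pay" else 0     # paying costs $100 on tails
    return 0.5 * heads + 0.5 * tails

best_policy = max(("pay", "refuse"), key=expected_value)
assert best_policy == "pay"  # expected value 4,950 versus 0
```

The decision-level agent, evaluating only inside the tails world, sees just the −$100 and refuses; the policy-level agent sees both branches and pays.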

Fully acausal trade

The ACDT agent might have issues with fully acausal trade (though, depending on your view, this might be a feature rather than a bug).

The reason is that the ACDT agent never gets to experience acausal trade, so it never gets to check whether there is a link between it and hypothetical other agents. Imagine a Newcomb problem where you never get to see the money (which may be going to a charity you support, but that charity may not exist either), nor find out whether the predictor exists.

If an ACDT agent ever discovered acausal trade, it would have to do so in an incremental fashion. It would first have to become comfortable enough with prediction problems that drawing links to predictors is a natural thing for it to do. It would then have to become comfortable enough with hypothetical arguments being correct that it could generalise to situations where it cannot ever get any empirical evidence.

So whether an ACDT agent ever engages in fully acausal trade depends on how it generalises from examples.

Neural nets learning to be ACDT

It would be interesting to program a neural net ACDT agent, based on these examples. If anyone is interested in doing so, let me know and go ahead.

Learning graphs and priors over graphs

The ACDT agent is somewhat slow and clunky at learning, needing quite a few examples before it can accept unconventional setups.

If we want it to go faster, we can modify its priors. For example, we can look at what evidence would convince us that an accurate predictor existed, and put a prior on the corresponding graph, conditional on seeing that evidence.

Or, if we want to be closer to UDT, we could formalise statements about algorithms, and about their features and similarities (or formalise mathematical results about proofs, and about how to generalise from known mathematical results). Adding that to the ACDT agent gives an agent much closer to UDT.

So it seems that ACDT + “the correct priors” is close to various different acausal agent designs.

  1. Since FDT is still somewhat undefined, I’m viewing it as TDT-like rather than UDT-like for the moment. ↩︎