FDT defects in a realistic Twin Prisoners’ Dilemma

SMK, 15 Sep 2022 8:55 UTC

42 points

Newcomb's Problem World Modeling Psychological Twin Prisoner's Dilemma Updateless Decision Theory Functional Decision Theory Prisoner's Dilemma Decision theory

Thanks to Caspar Oesterheld for discussion and helpful comments, as well as Tristan Cook, James Faville, Daniel Kokotajlo and Lukas Finnveden.

Summary

Updateless decision theory (UDT)/functional decision theory (FDT) can be formulated with “logical conditional probability” as opposed to “logi-causalist counterfactuals”. I argue in favour of the former, on the grounds that this variant of UDT/FDT ensures robust mutual cooperation in the Twin Prisoner’s Dilemma between two realistic UDT/FDT agents, whereas the causalist variant does not. This falls out of thinking about how agents approximate decision theories and how they intervene on the outputs of different, yet similar, decision algorithms.

Introduction

Updateless decision theory does not necessarily have to be formulated with logical counterfactuals: you could also use conditional probability. This is true of functional decision theory and timeless decision theory as well, mutatis mutandis. Specifically, we can define ‘updateless (logically) causal decision theory’ (UCDT), which is the standard formulation, and ‘updateless evidential decision theory’ (UEDT) in the UDT1.1 framework^[1] respectively as:

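$$\mathrm{UCDT}_{1.1}(s) = \arg\max_{\pi \in \Pi} \sum_{o \in O} U(o)\, P\left(\ulcorner \mathrm{UCDT}_{1.1}(s) = \pi \urcorner \;\square\!\!\to\; o\right),$$

$$\mathrm{UEDT}_{1.1}(s) = \arg\max_{\pi \in \Pi} \sum_{o \in O} U(o)\, P\left(o \mid \ulcorner \mathrm{UEDT}_{1.1}(s) = \pi \urcorner\right).$$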
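Both definitions share the same argmax over policies and differ only in the probability term. Here is a minimal sketch of that shared structure. It assumes finite sets Π and O and treats the probability as a black-box function. `expected_utility`, `argmax_policy` and the toy numbers are hypothetical, and the sketch says nothing about how logical counterfactuals or logical conditionals would actually be computed:

```python
from typing import Callable, Hashable, Iterable, Mapping, Sequence

# Generic policy selection shared by both definitions (a sketch, not a model of
# logical uncertainty). `prob(o, pi)` stands in for either
#   UCDT: P(<DT(s) = pi> []-> o)   (logi-causal counterfactual), or
#   UEDT: P(o | <DT(s) = pi>)      (logical conditional probability).

def expected_utility(pi: Hashable,
                     outcomes: Sequence[Hashable],
                     utility: Mapping[Hashable, float],
                     prob: Callable[[Hashable, Hashable], float]) -> float:
    """Sum over o in O of U(o) * prob(o, pi)."""
    return sum(utility[o] * prob(o, pi) for o in outcomes)

def argmax_policy(policies: Iterable[Hashable],
                  outcomes: Iterable[Hashable],
                  utility: Mapping[Hashable, float],
                  prob: Callable[[Hashable, Hashable], float]) -> Hashable:
    """Return the policy in Pi with maximal expected utility (first one on ties)."""
    outcomes = list(outcomes)
    return max(policies, key=lambda pi: expected_utility(pi, outcomes, utility, prob))

if __name__ == "__main__":
    # Hypothetical toy model: two policies, two outcomes.
    utility = {"good": 1.0, "bad": 0.0}
    table = {("good", "a"): 0.8, ("bad", "a"): 0.2,
             ("good", "b"): 0.3, ("bad", "b"): 0.7}
    print(argmax_policy(["a", "b"], utility.keys(), utility,
                        lambda o, pi: table[(o, pi)]))  # prints a
```

Swapping UCDT for UEDT in this sketch means passing a different `prob`. Nothing else changes, which is the sense in which the two theories are the same apart from the probability term.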

(It seems Wei Dai initially thought of UDT as something evidential; see this comment, for example.^[2])

As I understand it, Troll Bridge (a logical version of the Smoking Lesion^[3]) is a decision problem where UCDT and UEDT come apart, and UCDT is taken to give the correct answer. I personally think it is unclear what the takeaway from Troll Bridge should be, and I think there are problems in which UEDT is clearly preferable^[4] to UCDT. This post is about one such problem.

Mutual UDT cooperation

The causal variants of logical decision theories like UDT, FDT and TDT are understood to ensure mutual cooperation in the Twin Prisoner’s Dilemma. The reasoning goes as follows: “in virtue of us being twins, we are implementing the same decision algorithm, and now the question is about what this algorithm will output. Suppose it outputs ‘defect’; then both of us will defect. Suppose it outputs ‘cooperate’; then both of us will cooperate. And since $u(C, C) > u(D, D)$, I will cooperate.”

In terms of causal graphs, this is how I generally understand symmetric games between two agents that are both implementing UDT (the red area representing ‘logic’):

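[Figure: causal graph of the symmetric game, with the actions of both players downstream of a single UDT node]^[5]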

So when you intervene on the output of your decision theory you will have a logically causal effect on the action the other player takes, since both of your actions are downstream of what UDT outputs in the given situation (or alternatively, the output is both of your actions). As said, you will then take the same action, meaning you should e.g. cooperate in the Prisoner’s Dilemma. All good so far (modulo concerns about “logical causality”, of course).
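A minimal sketch of this reasoning, assuming illustrative Prisoner’s Dilemma payoffs (T, R, P, S) = (5, 3, 1, 0) that are not from the post. Because both players’ actions are the single output of one UDT node, intervening on that output can only produce (C, C) or (D, D). The function names are hypothetical:

```python
# Illustrative Prisoner's Dilemma payoffs for the row player (assumed, not from the post).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def shared_node_outcome(udt_output: str) -> tuple:
    """Both twins' actions are downstream of (or identical to) the one UDT output."""
    return (udt_output, udt_output)

def ideal_twin_choice() -> str:
    # Intervening on the shared node changes both actions at once,
    # so only the diagonal outcomes (C, C) and (D, D) are reachable.
    return max(["C", "D"], key=lambda a: PAYOFF[shared_node_outcome(a)])

print(ideal_twin_choice())  # prints C
```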

(In the context of causal graphs, I think of the decision theory, UDT, as a function that provides a [perhaps infinitely] large table of state-action pairs, i.e. a [perhaps infinite] number of logical statements of the form “UDT(state) = action”, and the conjunction of these is the logical statement “UDT”: the node in the graph.)

Approximating UDT: when logical counterfactuals are brittle

But real-world agents do not implement the “ideal” version of UDT^[6]: rather—insofar as they think UDT actually is the “ideal”—they imperfectly approximate UDT in some way, due to boundedness and/or other practical constraints. In practice, this could, for example, look like doing Monte Carlo approximations for the expected utility, using approximations for $π$ and $e$ , using heuristics for intractable planning problems, et cetera—i.e. trying to implement UDT but doing something (ex ante) suboptimal with respect to the formalism. Also, these agents are probably going to approximate UDT differently, such that you do not get the neat situation above since these different approximations would then refer to wholly distinct logical statements (the statements, as before, being the conjunction of all of the action recommendations of the respective UDT approximations). That is, these approximations should be represented as two different nodes in our graph.

The way this plausibly works is that for a given decision problem, both players respectively do the thing that ideal UDT would have recommended with some probabilities corresponding to e.g. how much compute they have. Moreover, these approximations are conditionally independent given ‘ideal UDT’.

(Perhaps this is different to what is normally meant by an “approximation of a decision theory”. Here I am specifically thinking of it as something very ‘top-down’ where you look at the ideal version of the theory and then make concessions in the face of boundedness. You could also think of it as something more ‘bottom-up’ where you start out with some minimal set of very basic principles of choice that [in the limit] evolve into the ideal theory as the agent grows more sophisticated and less bounded. This might be the more plausible perspective when thinking in terms of building AI systems. More on this later.)

Furthermore, these top-down approximations are arguably downstream of UDT itself (i.e. there is a path from UDT to the approximations in the causal graph). I do not have a great argument for this, but it seems intuitive to represent the process of top-down approximations as functions of the type $f : DT_{\text{ideal}} \to DT_{\text{approx}}$, in which case I think it is natural to say that UDT logi-causes the approximations of UDT. (For example, we might think of the approximation function $f$ as adding Gaussian noise to the “ideal” distribution over actions for some decision problem.)
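One way to make $f$ concrete, loosely following the Gaussian-noise suggestion: encode ideal UDT’s recommendation for a problem as a score, and let each agent’s approximation add its own independent noise before choosing. In this sketch the noise scale stands in for how little compute the agent has. The model, the function `approximate` and the parameter values are assumptions for illustration. The simulation shows that the two approximations agree far more often than chance overall, yet are independent once the ideal output is fixed. In other words, they are correlated through a common cause rather than one logi-causing the other:

```python
import random

def approximate(ideal_action: str, noise_scale: float, rng: random.Random) -> str:
    """Hypothetical top-down approximation f: perturb ideal UDT's recommendation.

    The ideal action is encoded as a score (+1 for C, -1 for D); the approximator
    adds its own Gaussian noise and picks C if the noisy score is positive.
    """
    score = 1.0 if ideal_action == "C" else -1.0
    return "C" if score + rng.gauss(0.0, noise_scale) > 0 else "D"

def simulate(n: int = 100_000, sigma1: float = 1.0, sigma2: float = 1.0, seed: int = 0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ideal = rng.choice(["C", "D"])        # what ideal UDT recommends in this problem
        a1 = approximate(ideal, sigma1, rng)  # player 1's approximation, a'
        a2 = approximate(ideal, sigma2, rng)  # player 2's approximation, a''
        rows.append((ideal, a1, a2))
    return rows

def summarize(rows):
    agree = sum(a1 == a2 for _, a1, a2 in rows) / len(rows)
    given_c = [(a1, a2) for ideal, a1, a2 in rows if ideal == "C"]
    p1 = sum(a1 == "C" for a1, _ in given_c) / len(given_c)
    p2 = sum(a2 == "C" for _, a2 in given_c) / len(given_c)
    joint = sum(a1 == "C" and a2 == "C" for a1, a2 in given_c) / len(given_c)
    return agree, joint, p1 * p2

agree, joint, product = summarize(simulate())
# Agreement is well above the 0.5 expected under independence (roughly 0.73 in expectation).
print(f"P(a' == a'') = {agree:.3f}")
# Conditional on ideal UDT outputting C, the joint is (approximately) the product.
print(f"P(both C | ideal C) = {joint:.3f} vs product {product:.3f}")
```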

With these assumptions, an interesting situation arises where it seems to matter what specific formulation of UDT we are using. Consider the causal graph of the symmetric game again:

So in this case, when you intervene on the output of your decision theory, you do not have a logi-causal effect on the action of the other agent (since you are doing different approximations). However, I think we can say that there is some ‘logical correlation’ between the outputs of the two approximations; i.e. as long as there is no screening off, there is some (logi-)evidential dependence between $a^{'}$ and $a^{''}$ . This similarity should arguably be taken into account, and it is (not surprisingly) taken into account when we use conditionals for the expected utility probabilities.

That is, given sufficiently similar approximations, UEDT tells you to cooperate in the Prisoner’s Dilemma (under the assumption that both agents are doing some approximation of UEDT, of course).

On the other hand, without any further conceptual engineering, and given how we think of decision-theoretic approximations here, UCDT trivially recommends defecting, since it only thinks about logi-causal effects, and defecting dominates.
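A numeric sketch of this contrast, assuming illustrative payoffs (T, R, P, S) = (5, 3, 1, 0) and a symmetric correlation model. In that model the other approximator’s action matches yours with probability `p_match`, a hypothetical stand-in for the logical correlation between $a'$ and $a''$. The UEDT-style agent conditions on its own output. The UCDT-style agent holds the other player’s action distribution fixed, since there is no logi-causal path, and compares both actions against the same `q_other_c`:

```python
# Illustrative Prisoner's Dilemma payoffs (assumed): T > R > P > S.
T, R, P, S = 5, 3, 1, 0

def uedt_choice(p_match: float) -> str:
    """UEDT-approximator: condition on its own output. p_match is the (assumed,
    symmetric) probability that the other approximation outputs the same action."""
    eu_c = p_match * R + (1 - p_match) * S  # E[U | my approximation outputs C]
    eu_d = p_match * P + (1 - p_match) * T  # E[U | my approximation outputs D]
    return "C" if eu_c > eu_d else "D"

def ucdt_choice(q_other_c: float) -> str:
    """UCDT-approximator: intervening on its own output leaves the other's
    action distribution (cooperate with probability q_other_c) unchanged."""
    eu_c = q_other_c * R + (1 - q_other_c) * S
    eu_d = q_other_c * T + (1 - q_other_c) * P
    return "C" if eu_c > eu_d else "D"

def uedt_threshold() -> float:
    """p_match above which UEDT cooperates: (T - S) / ((R - P) + (T - S))."""
    return (T - S) / ((R - P) + (T - S))

print(round(uedt_threshold(), 3))                  # 0.714
print([uedt_choice(p) for p in (0.6, 0.75, 0.9)])  # ['D', 'C', 'C']
print([ucdt_choice(q) for q in (0.0, 0.5, 1.0)])   # ['D', 'D', 'D']
```

Whatever `q_other_c` is, defecting gains T − R or P − S, so the UCDT-style agent always defects. The UEDT-style agent cooperates once the correlation clears a payoff-dependent threshold.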

You could of course say something like “sure, but with some probability we are actually doing the same approximation, such that my action actually logi-causally determines the action of the other player in some world that is (hopefully) not too improbable”. In principle, I think this works. But there are so many different ways of approximating a given decision theory (probably infinitely many), and the slightest difference between two approximations is enough for cooperation to break down, so that probability would be far too low to warrant cooperation. This is important to keep in mind.
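To put rough numbers on this, suppose a UCDT-approximator assigns probability m to the other player running exactly the same approximation, so that intervening on its own output also sets the other’s. Otherwise it treats the other’s action as fixed, with the other cooperating with probability c. Both m and c, as well as the payoffs (T, R, P, S) = (5, 3, 1, 0), are illustrative assumptions. Cooperating is then favoured only if m > G / ((R − P) + G), where G = c(T − R) + (1 − c)(P − S) is the dominance gain from defecting in the worlds where the approximations differ. The function names are hypothetical:

```python
# Illustrative Prisoner's Dilemma payoffs (assumed): T > R > P > S.
T, R, P, S = 5, 3, 1, 0

def identical_approx_threshold(c: float) -> float:
    """Minimum probability m of running the *same* approximation needed for a
    UCDT-approximator to prefer cooperating, given the other player cooperates
    with probability c in the worlds where the approximations differ."""
    gain_from_defecting = c * (T - R) + (1 - c) * (P - S)
    return gain_from_defecting / ((R - P) + gain_from_defecting)

def ucdt_with_identity_prob(m: float, c: float) -> str:
    # With prob. m the actions are logi-causally tied (diagonal outcomes only);
    # with prob. 1 - m the other's action is independent of my intervention.
    eu_c = m * R + (1 - m) * (c * R + (1 - c) * S)
    eu_d = m * P + (1 - m) * (c * T + (1 - c) * P)
    return "C" if eu_c > eu_d else "D"

for c in (0.0, 0.5, 1.0):
    print(c, round(identical_approx_threshold(c), 3))
# 0.0 0.333
# 0.5 0.429
# 1.0 0.5
```

Under these assumptions, cooperation requires a probability of exact identity of roughly a third or more. Given how many ways there are to approximate a decision theory, that is hard to justify.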

(Further notes on the graphs above:

I do not have a firm grip on what ‘causality’ and ‘correlation’ are precisely supposed to mean in the logical context.
- Nevertheless, as an intuition, I take “statement A logically causes statement B” to very roughly mean something in the vicinity of “in proving stuff, you encounter A before B, and use A to prove B”.
- And then we could say that “B and C are logically correlated if learning B gives some evidence about the truth value of C”; e.g. because they have a common cause.
- So, in the graph, learning that my approximation outputs a given action gives me some evidence that the approximation of the other player also outputs that action.
Perhaps there are cases in which one approximation of UDT is based on another approximation, and you get arrows going from one to another. (This is not really what we are modelling, though.) It is however unclear whether UCDT even gets it right in this scenario since we are really just interested in potential arrows between the decision nodes. For example, we could draw the graph in the following way where we do not get cooperation:
- But this counterargument is not watertight since it depends on what kind of approximation we have in mind here: you could for example say that the outputs of the second approximation are partly determined by the individual outputs of the first approximation, such that you (in some way) have a causal arrow going from the bottom left node to the bottom right, in which case it is arguably possible for two UCDT-approximators to achieve mutual cooperation. One specific example (where the approximation of the second agent is elicited from the action recommendations of the approximation of the first agent plus some noise, say):
  - So when the first player intervenes on a given decision node, they have a logi-causal effect on the output of the second player’s approximation (since it is downstream).
Although two UEDT-approximators may be highly correlated, it is not obvious that this will lead to mutual cooperation. For example, it could just be that, due to low intelligence or too little compute, the agents are very bad at approximating UEDT and will not take this correlation into account properly or make some other big mistake.
- (One perspective I have found pretty interesting lately is thinking of CDT-ish decision-making as an approximation of EDT-ish decision-making.^[7] Therefore, for some situations at least, it might make sense to think about a low-intelligence, low-compute ‘UEDT-approximator’ as something akin to a UCDT agent, in which case we do not necessarily get mutual cooperation.)
We have only considered symmetric games thus far. But most games are asymmetric—even Prisoner’s Dilemmas (since we often derive at least slightly different utilities from the different outcomes). Does this create analogous difficulties even for ideal UCDT? Perhaps not if you can simulate the situation of the other player, but if you are just argmaxing over available actions in your own situation then you are not intervening on the output of the other player’s decision theory (because you are in different situations) and you could think that we get something that looks similar to previous situations where there is correlation but not (logi-)causation.
- But you can perhaps solve this by being updateless with respect to what side of the game you are playing (à la Rawls and Harsanyi).^[8] Specifically, the following policy achieves mutual cooperation between two ideal UCDTers: “if I am player 1, cooperate; if I am player 2, cooperate”. And this is achieved because the meta-game is symmetric, and you will determine the policy of the other player.
But I do nevertheless think that asymmetries create extra difficulties for getting mutual cooperation when the agents are doing approximations (and not the ideal)—but this holds for both UEDT and UCDT, I suppose.)

What about Newcomb’s?

As we know, the Twin Prisoner’s Dilemma is a Newcomb’s problem. That raises a question: does UDT/FDT with counterfactuals actually two-box under reasonable assumptions about Omega (just as I have argued that UCDT defects against another agent implementing UCDT under reasonable assumptions about the agents)?

I think this is a bit unclear and depends on what we think Omega is doing exactly (i.e. what are “reasonable assumptions”?): is she just a very good psychologist, or is she basing her prediction on a perfect simulation of you? In the former case, it seems we have the same exactness issues as before, and UCDT might two-box^[9]; and the latter case merely corresponds to the case where the twin in the Prisoner’s Dilemma is your exact copy, and thus you one-box.

Perhaps Omega is not directly approximating UCDT in her simulation, though, but rather approximating you. That is, approximating your approximation of UCDT. In that case, it seems like there is a good argument for saying that UCDT would one-box since Omega’s approximation is downstream of your approximation.

I don’t find this super interesting to discuss, and since the arguments in this post are based on thinking about realistic agents in the real world, I will set Newcomb’s aside and keep focusing on the Prisoner’s Dilemma. (Moreover, Prisoner’s Dilemma-like situations are more relevant for e.g. ECL.)

Why this is not surprising

In the MIRI/OP decision theory discussion, Scott Garrabrant suggests that we view different decision theories as locations in the following 2x2x2 grid:

Conditional probability vs. causalist counterfactuals (or ‘EDT vs. CDT’).
Updatefulness vs. updatelessness (or ‘from the perspective of what doxastic state am I making the decision, the prior or posterior?’).
Physicalist vs. algorithmic/logical agent ontology (or ‘an agent is just a particular configuration of matter doing physical things’ vs. ‘an agent is just an algorithm; relating inputs and outputs’).

This results in eight different decision theories, where we can think of UCDT/FDT as updateless CDT in the algorithmic ontology, as opposed to the physicalist.

To give an analogy for the problem I have attempted to explain in this post, consider two perfect updateful CDT copies in the Prisoner’s Dilemma. It is normally said that they will not cooperate because of dominance, no causal effects et cetera, but under one particular physicalist conception of who you are, this might not hold: setting aside issues around spatiotemporal locations^[10], we could say that an actual perfect copy of you is you, such that if you cooperate, your “copy” will also cooperate. (The ontology I have in mind here is one that says that “you” are just an equivalence class, where the relation is identity with respect to all physical [macro-]properties [modulo location-related issues], i.e. something like the ‘identity of indiscernibles’ principle—restricted to ‘agents’.)

Even if we accept this ontology, I would not say that this is a point in favour of (this version of) CDT since this decision problem is utterly unrealistic: even the slightest asymmetry (e.g. a slight difference in the colour of the rooms) would break the identity and thus break the cooperation.

The problem with mutual UCDT/FDT cooperation I have attempted to describe here is arguably completely analogous to the “problem” of how CDT agents do not achieve mutual cooperation in any realistic Prisoner’s Dilemma under this particular physicalist ontology. (The idea is that the algorithmic agent ontology is analogous to the equivalence class-type physicalist agent ontology in that from both perspectives “identity implies control”.)

(Some things are of course different. For example, to my understanding, the usual algorithmic conception of an ‘agent’ is arguably more “minimalistic” than the physicalist: the former does not include specific information about the configuration of matter et cetera; rather, all else equal, the algorithm in and of itself is me, independent of the substrate and its underlying structure: as long as it is implemented, it is me. This makes mutual [logi-]causalist cooperation somewhat more likely.)

Objections

Doing the ideal, partially

As said, the UDT node in the previous graphs is just the conjunction of state-to-action/policy mappings (statements of the form “$\mathrm{UDT}(s) = a$”), for all possible states, $S = \{s_1, \ldots, s_k\}$. But suppose we partition the state space into $S'$ and $S''$, i.e. $S' \cap S'' = \emptyset$ and $S' \cup S'' = S$, such that we get “UDT for $S'$” and “UDT for $S''$” (both arguably downstream of “UDT for $S$”). (We could of course make it even more fine-grained, perhaps even corresponding to the trivial partition $\{\{s_1\}, \ldots, \{s_k\}\}$.) Now, it might be the case that both players are actually implementing the ideal version of “UDT for $S'$”, e.g. for $S' = \{\text{problems that require less than } x \text{ compute to solve}\}$, where the standard Prisoner’s Dilemma perhaps is included, but that they approximate “UDT for $S''$”.^[11] We then get the following graph:

When you now intervene on the leftmost decision node, e.g. in a Prisoner’s Dilemma, you then have a logi-causal effect on the action of the other player, and you can ensure mutual cooperation in a UCDT vs. UCDT situation.

On the face of it, this is arguably a somewhat realistic picture: for many decision problems (especially the very idealised ones), it is not that difficult to act according to the ideal theory.

But this seems brittle in ways similar to before. A couple of points:

This depends on the players partitioning $S$ in the exact same way, since only then do both players refer to the exact same logical statement. This is highly unlikely. (Recall that we are thinking of the decision theory nodes here as conjunctions of all of the statements corresponding to their action recommendations; i.e. very large tables. For example, this means that if the partitions differ by even a single state, you do not get mutual cooperation.)
- (Furthermore, they would each have to have a sufficiently high credence that they are both using the same partition.)
- Counter: perhaps the Schelling point is the trivial partition of the state space, and the agents will be able to cooperate and coordinate for some set of situations. But it is (i) not clear that there is a straightforward trivial partition (not at all if the state space is continuous); and (ii) agents (especially those who are approximating the decision theory in the first place) do not necessarily have access to the complete state space in all of its detail.
And why would it be the case, in the real world, that the agents would even be close to using the same partition? We should probably expect some kinds of asymmetries, e.g. in terms of compute, which would plausibly let one agent follow ideal UDT for a larger fraction of the situations.
We are not really interested in the idealised situations: reality is much messier, meaning most agents will never actually “do the ideal” with respect to almost any situation with anything close to probability one (at least the important ones).
- Counter: perhaps there are ways of factoring or partitioning complicated decision situations into smaller and simpler sub-situations. But again this arguably requires both agents to partition or factor the space in the same way, which seems unlikely.
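The brittleness in the first point can be sketched by treating each decision-theory node as its full table of “UDT(s) = a” statements, here a frozenset of (state, action) pairs, and identifying two nodes only when the tables are identical. The toy state space, the recommendations and the helper `node` are hypothetical. Two agents that follow the ideal recommendation on almost identical sets $S'$ still end up with distinct nodes, so neither logi-causes the other’s action, even though their recommendations coincide wherever both are defined:

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Table = FrozenSet[Tuple[str, str]]

def node(recommendations: Dict[str, str], states: Iterable[str]) -> Table:
    """The logical statement 'UDT for S'' as the conjunction of UDT(s) = a for s in S'."""
    return frozenset((s, recommendations[s]) for s in states)

# Hypothetical ideal recommendations over a toy state space S of 100 states.
ideal = {f"s{i}": ("C" if i % 2 == 0 else "D") for i in range(1, 101)}
states = sorted(ideal)

# Player 1 "does the ideal" on 60 states; player 2 on the same set minus one state.
n1 = node(ideal, states[:60])
n2 = node(ideal, states[:59])

print(n1 == n2)      # False: different logical statements, so no shared node
print(len(n1 & n2))  # 59: the recommendations agree on the overlap
```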

Approximation Schelling points

We said before that the agents will approximate the decision theory differently (and this is the justification for drawing the graphs in the way we have). But perhaps there are Schelling points or general guidelines for how agents should approximate decision theories, such that there is some basis to saying that two agents will, in fact, approximate the theory in the ‘same’ way.

This line of thought seems plausible, but it only further supports the point that UEDT is preferable to UCDT: (1) if there are these Schelling points, we should expect the correlation to be higher (and thus a higher probability of mutual cooperation in the UEDT vs. UEDT case); and (2) we still have the (seemingly insurmountable) ‘exactness issues’ in the case of a UCDT-approximator vs. a UCDT-approximator, where the Schelling point in question would have to be extremely, and implausibly, precise. Moreover, I think this implicitly assumes that the agents in question are equals in terms of resources and power, which of course is not realistic. For example, why would two agents with differing levels of compute approximate in the same way? The agent with more compute would then have to forgo some expected accuracy in terms of coming closer to what the ideal theory would recommend.^[12]

Other notions of ‘approximation’

As briefly touched upon, I have been relying on a certain conception of how agents approximate a decision theory—something very ‘top-down’. This is not necessarily the most natural or realistic way agents approximate a decision theory.

Bottom-up approximations

As said, we could think that the agents start out with some plausible basic principles that could evolve into something more like the ideal theory as they become more intelligent and acquire more resources and information. (See A theory of bounded inductive rationality by Oesterheld et al. for something similar.) Specifically, this might look like two developers deciding on what the initial principles (which of course could be inspired by what they regard as the ideal theory) should be when building their respective AIs, and then letting them out into the world where they proceed to grow more powerful and perhaps can self-modify into something closer to optimal. And then at some later point(s) in time they play some game(s) against each other. When you have two agents approximating UDT in this way, it is prima facie unclear whether you get the same results as before.

In particular, the correlation might in this case be much lower; for example, due to (slightly) different choices of initial principles on the part of the developers, or just mere contingencies in evolution. This means you do not necessarily get mutual cooperation even when what they are approximating is UEDT.

This seems plausible to me, but note that the argument here is merely that “it might be difficult to get mutual cooperation when you have two UEDT-approximators playing against each other as well”, i.e. this is not in and of itself a point in favour of UCDT. Au contraire: now it is even less clear whether there are any logi-causal effects, given that the initial principles might differ and might not be derived from the same ideal theory. Furthermore, the correlation might still be sufficient; perhaps we should expect some high degree of convergence (not divergence) in the aforementioned evolution; and I suppose my point in this post partially amounts to saying that “we want the convergence in question to be UEDT”.

A hacky approach

Although agents do not know many of the action recommendations of the ideal theory for certain (this is why they are approximating the decision theory in the first place), there are structural features of the ideal theory that are known: for one, they know that top-down approximations are downstream of the ideal theory. Prima facie, this implies that insofar as agents are approximating UCDT by “trying to do what ideal UCDT would have recommended”, including the aforementioned structural feature, they will act as if they have a partial logi-causal effect (corresponding to the accuracy of the other agent’s approximation) on the other agent’s choice of action since the approximation of the other agent is downstream of the ideal theory. As such, if both agents reason in this “structural” way, they will both cooperate.^[13]

However, although this seems plausible on the face of it, it is unclear whether this is workable or principled, whether the argument is sound to begin with, and how it should be formalised. A couple of concrete issues and question marks:

Is there an actual logi-causal effect on the choice of the other player when the structural UCDT-approximator makes its decision in this way? I am not sure, but perhaps this would become clear with a proper formalisation of this type of approximation. If there is not an actual logi-causal effect, on the other hand, then the agents are really just pretending there is one, and it is unclear why they are not then defecting.
But insofar as we grant that there could be logi-causal effects going on here, we can split up the discussion corresponding to the following two games: (i) a structural-UCDT approximator playing the Prisoner’s Dilemma against a structure-less UCDT-approximator; and (ii) a structural UCDT-approximator playing against another structural UCDT-approximator.
- In the first case, the structural approximator imagines itself implementing ideal UCDT, and plays against an approximator who does not imagine itself doing so. Let us entertain this imaginary game. The situation is prima facie quite strange: on one hand, you would think that the ideal UCDTer, being upstream of the choice of the approximator, could ensure mutual cooperation with high probability by cooperating itself. On the other hand, since the choice of the agent implementing ideal UCDT is not downstream of the approximator’s decision, the approximator will defect.
  - The issue here seems to be that since they are implementing different versions of the decision theory, we cannot say that they are both facing the same decision situation. That is, the game is asymmetric, meaning there is not a logi-causal effect of the choice of the ideal UCDTer on the approximator. Illustration below (the ideal UCDTer intervenes on the choice node to the right, and there is no logi-causal effect on the choice node of the other player).^[14]
- Moving on to the second case, where two structural UCDT-approximators are playing against each other (with the knowledge that the other one is indeed approximating the theory in this way): the agents imagine themselves implementing ideal UCDT and playing against someone who approximates structurally. The agents know that such an approximator will also imagine themselves implementing ideal UCDT. This means that the ideal UCDTer will cooperate against the structural approximator if and only if this would make ideal UCDT cooperate against ideal UCDT. But this does not seem to be the case: an ideal UCDTer will just defect against structural approximators because the approximator (foolishly) imagines themselves implementing ideal UCDT and thus they find themselves in a game where both players are implementing ideal UCDT, in which case they will cooperate. That is, when playing against another structural approximator, the agents think that the other player will cooperate, and they will sneakily defect and think they got away with it. But this, of course, cuts both ways, and we will end up with mutual defection.^[15]
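The second case can be sketched as a one-step best response. The simplifying assumption here is ours, not the post’s: each structural approximator predicts that the other will cooperate, because the other imagines a game between two ideal UCDTers. It then best-responds to that prediction while holding the other’s action fixed, since its own output does not logi-cause the other’s. The payoffs (T, R, P, S) = (5, 3, 1, 0) and the function names are illustrative:

```python
# Illustrative Prisoner's Dilemma payoffs (assumed): T > R > P > S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def predicted_action_of_other() -> str:
    # The other structural approximator imagines a game between two ideal
    # UCDTers and therefore (on this model) is predicted to cooperate.
    return "C"

def structural_approximator_action() -> str:
    other = predicted_action_of_other()
    # My intervention does not reach the other's node, so `other` is held fixed
    # and I pick the best response to it.
    return max(["C", "D"], key=lambda a: PAYOFF[(a, other)])

# The game is symmetric, so both players run the same reasoning.
actions = (structural_approximator_action(), structural_approximator_action())
print(actions)  # ('D', 'D'): each expected the other to cooperate
```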

In sum, this hacky way of approximating logi-causalist decision theories does not ensure mutual cooperation in the Prisoner’s Dilemma.

Conclusions

An algorithm or decision theory corresponds to a logical statement that is very precise. This means that any slight difference in how agents implement an algorithm creates difficulties for having any subjunctive effect on the actions of other agents. In particular, when two agents are respectively approximating a decision theory, they are going to do this differently, and thus implement different algorithms, which in turn means that the agents do not determine the output of the other agent’s algorithm. In other words, logical causation is brittle, which means that mutual logi-causalist cooperation is brittle.

EDT-style correlational reasoning is not brittle in this way: for good or bad, you just need a sufficient degree of similarity between the agents (in terms of decision theory) as well as a sufficient level of game-theoretic symmetry (and no screening off), and that is it.

In light of this, the following passage from the FDT paper discussing the Twin Prisoner’s Dilemma (wherein FDT is also formulated with the counterfactual cashed out as conditioning on $\mathrm{do}(\mathrm{FDT}(P, G) = a)$) seems misleading (bold emphasis mine):

…an FDT agent would cooperate, reasoning as follows: “My twin and I follow the same course of reasoning—this one. The question is how this very course of reasoning should conclude. If it concludes that cooperation is better, then we both cooperate and I get $1,000,000. If it concludes that defection is better, then we both defect and I get $1,000. I would be personally better off in the former case, so this course of reasoning hereby concludes cooperate.”

If “twin” is supposed to mean “perfect psychological copy controlling for chaotic effects et cetera” (which I think is the claim in the paper), then this seems true because you truly intervene on the output of both of your algorithms. But for the most part in the real world, they are not going to implement the exact same algorithm and use the exact same decision theory; and, as such, you do not get mutual cooperation.

And the following from Cheating Death in Damascus (also about the Twin Prisoner’s Dilemma, and also FDT with counterfactuals), seems to be incorrect (bold emphasis mine):

The FDT agent reasons as follows: If my decision algorithm were to output defect, then the other agent would output defect too because she’s running (a close approximation of) my algorithm. If my algorithm were to output cooperate, then the other agent would choose that as well. The latter results in a better outcome for me, so I cooperate.

As far as I can tell, no explanation for why “a close approximation” would suffice is given.^[16]

As a final note, I think it is important not to delude ourselves with terms like “success”, “failure”, “wins” and “preferable” (which I have used in this post) in relation to decision theories; UEDT and UCDT are both the “correct” decision theory by their own lights (as is every decision theory): the former maximises expected utility with conditionals from the earlier logical vantage point, and the latter does the same but with logical counterfactuals, and that is that. There is no objective performance metric. See The lack of performance metrics for CDT versus EDT, etc. by Caspar Oesterheld for more on this. Personally, I just want to take it as a primitive that a reasonable decision theory should recommend cooperation in the Prisoner’s Dilemma against a similar (but not necessarily identical) opponent.

So, in this post I made nothing but the following claim: insofar as you want your updateless decision theory to robustly cooperate in the Prisoner’s Dilemma against similar opponents, UEDT is all else equal preferable to UCDT.

^[1]
I.e. we do policy selection instead of action or program selection as in the cases of UDT1.0 and UDT2.0. I think everything in this post should generalise to those theories as well though, mutatis mutandis.
^[2]
Also, the following from MIRI/OP exchange about decision theory: “Wei Dai does not endorse FDT’s focus on causal-graph-style counterpossible reasoning; IIRC he’s holding out for an approach to counterpossible reasoning that falls out of evidential-style conditioning on a logically uncertain distribution.”.
^[3]
Personally, I also think that the Smoking Lesion is unpersuasive: in cases where the premises of the Tickle Defence are satisfied, you should smoke; and when they are not satisfied, there is not really a choice, per se: either you smoke or you do not. See e.g. Ahmed (2014, ch.4) and Understanding the Tickle Defense in Decision Theory by Caspar Oesterheld for more on this.
^[4]
I will return to what I take this to (not) mean in the last section.
^[5]
Note that we are using FDT-style causal graphs here, and cashing out the counterfactual of UCDT as a do-operator, despite the following from the MIRI/OP exchange about decision theory: “[O]ne reason we [MIRI decision theorists/NS & EY] do not call [FDT] UDT (or cite Wei Dai much) is that Wei Dai does not endorse FDT’s focus on causal-graph-style counterpossible reasoning; IIRC he’s holding out for an approach to counterpossible reasoning that falls out of evidential-style conditioning on a logically uncertain distribution.”.
^[6]
I am merely saying here that there is something that the agents strive towards; i.e. not making the stronger claim that “the ideal” necessarily has to exist, or even that this notion is meaningful.
^[7]
The thought here is that it is costly to compute all of the correlations, and in most cases just focusing on the causal effects (the ‘perfect correlations’, in one direction) will come close to the recommendations of EDT. Moreover, there is often a Tickle Defence lurking whenever you have some Newcomblike decision problem, meaning EDT and CDT will often recommend the same action. See e.g. Ahmed (2014) for more on this.
^[8]
For more on this see section 2.9 of Caspar Oesterheld’s Multiverse-wide Cooperation via Correlated Decision Making, as well as footnote 14 in this post.
^[9]
Perhaps a more confrontative title of this post would have been ‘UDT/FDT two-boxes’.
^[10]
The issue here is that even if we copy-and-paste a clump of matter, it will not share the exact same location afterwards. And ‘location’ is clearly a property of a clump of matter, meaning the two clumps of matter will not be the same, strictly speaking.
^[11]
h/t Caspar Oesterheld for this suggestion.
^[12]
Perhaps this is solvable if you are updateless with respect to who has the most compute, i.e. we take the perspective of the original position. But (i) this relies on agents having a prior belief that there is a sufficient probability that in another nearby branch you are actually playing the other side of the game (which is not obvious); (ii) this seems like an instance of ‘mixed-upside updatelessness’, which arguably partially reduces to preferences in the way Paul Christiano describes here; and (iii) this means that you are not going to use all of your compute for your decisions, or will not use your best approximation for $π$, and it is unclear if this is something you want.
^[13]
h/t Lukas Finnveden for this suggestion.
^[14]
As touched upon before, you can arguably solve symmetry-issues by being updateless with respect to what side of the game you are playing, which in this case translates to being updateless with respect to whether you are the approximator or the ideal UCDTer. So, at the very best, it seems that you also (in imagining to implement the ideal theory) need to be updateless with respect to the sophistication of the very decision theory you are implementing. However, it is very unclear if this is something you want to be updateless about since this plausibly will influence the quality of your subsequent decisions. For example, perhaps you would need to forgo some amount of your compute for the expected utility calculations, or simply not use your best approximation for π. Additionally, and more importantly perhaps, this only works if both players—the approximator and the ideal UCDTer—are updateless about the sophistication of their decision theory, and it is of course questionable if this is a justified assumption. (Also, as usual, this depends on the priors in question.)
^[15]
h/t Lukas Finnveden again.
^[16]
I would appreciate it if anyone could fill in the blanks here. It could be that the authors think that two approximators will have the same decision theory with some sufficiently high probability, and thus that the agents will be sufficiently certain that they have a logi-causal effect on the action of the other player to warrant cooperation. But this is highly improbable as previously argued.
