## EDT solves 5 and 10 with conditional oracles

## Introduction and motivation

The starting point for this post is a comment I wrote on Paul’s post on EDT vs CDT:

One argument for CDT over EDT that you didn’t mention in this post: Suppose you live in a deterministic universe and know your own source code. Suppose you are deciding between taking a 5 dollar bill and a 10 dollar bill. Suppose your world model says you take the 5 dollar bill with 100% probability. Now conditioning on taking the 10 dollar bill gives you complete garbage, since you are conditioning on a probability 0 event. If you use EDT, then depending on other details of your decision procedure, this could lead to you always taking the 5 dollar bill. So then your world model would be accurate. (This is the “5 and 10” problem often discussed at MIRI; I don’t know if it has been written up anywhere)

CDT never generates undefined expected utility estimates like EDT does. It takes the 10 dollar bill in this problem. However, if it always takes the 10 dollar bill, then its counterfactual for taking the 5 dollar bill is strange because it is one in which a physical law is violated. The violation of a physical law could have important consequences other than which action the agent takes.

Both decision theories have trouble with this problem, but at least CDT always produces a defined answer.
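The asymmetry can be made concrete in a few lines of code (a toy sketch I am adding for illustration, not from the original comment; the world model and utilities are hypothetical):

```python
# Toy 5-and-10 world model. The agent believes it takes the $5 bill surely.
world_model = {"five": 1.0, "ten": 0.0}  # P(action)
utility = {"five": 5, "ten": 10}         # utility is determined by the action

def edt_value(action):
    """EDT: expected utility conditional on taking `action`."""
    if world_model[action] == 0.0:
        return None  # conditioning on a probability-0 event: undefined/garbage
    return utility[action]

def cdt_value(action):
    """CDT: utility under a counterfactual intervention forcing `action`."""
    return utility[action]  # defined regardless of the world model

print(edt_value("ten"))  # undefined for EDT
print(cdt_value("ten"))  # defined for CDT
```

Here EDT's estimate for the 10 dollar bill is undefined while CDT's is always defined, matching the discussion above.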

Here’s another way of thinking about this problem. A fully Bayesian version of EDT must construct all possible worlds and then condition on taking a certain action. But each of these possible worlds contains a running copy of the EDT algorithm. So, absent some defined method for taking a fixed point, this leads to an infinite loop, and you can’t actually have a fully Bayesian version of EDT.

(What if you use reflective oracles to allow EDT to select some fixed point? We could specify that the reflective oracle returns arbitrary results when asked to condition on a probability 0 event (I think this is what the most natural way to emulate conditional queries on a reflective oracle results in, but I haven’t checked). Now there are multiple possible reflective oracles (i.e. fixed points); it’s possible to always take the 10 dollar bill and think bad things will happen conditional on taking the 5 dollar bill, and it’s also possible to always take the 5 dollar bill and think bad things will happen conditional on taking the 10 dollar bill.)

A fully Bayesian version of CDT must construct all possible counterfactuals. Each of these counterfactuals contains a running copy of CDT, so one might think the same problem applies. But in each of these counterfactuals, the output of the CDT algorithm is “thrown away”, since the agent’s action is controlled by a magic counterfactual intervention rather than its algorithm. So, if the CDT algorithm is sandboxed, the CDT’s world model can simply ignore the running CDT algorithm, as it has no effect. Thus, at least in single-agent problems (with no predictors etc), a fully Bayesian version of CDT is possible in principle, though obviously not in practice.

Some time during or after writing this comment, I noticed something: the equilibrium where the EDT agent thinks it always takes the 5 dollar bill, and therefore gets garbage (possibly low) estimates when considering taking the 10 dollar bill, and therefore never takes the 10 dollar bill, is extremely unstable. As soon as the agent assigns any probability at all to taking the 10 dollar bill, their conditional expected utility estimates are perfect. Can we use this fact to design a variant of EDT that always takes the 10 dollar bill?

Yes, yes we can.

## Definitions and main theorem statements

Reflective conditional oracle distributions will be defined similarly to previous work on reflective oracles and reflective oracle distributions. I recommend understanding reflective oracles before reading this post (understanding reflective oracle distributions is helpful but unnecessary).

Let M be some finite subset of the set of probabilistic Turing machines that may query a conditional oracle (defined in the next paragraph). They must each always halt and return either 0 or 1.

Definition 1: A conditional oracle is a function O:M4→{0,1}.

Intuitively, O(m1,n1,m2,n2) asks whether P(mO′1()|nO′1()) is greater or less than P(mO′2()|nO′2()), where O′ is taken from some distribution over conditional oracles.

We make the assumption that machines in M only ever query the oracle with an element of M4. All further definitions and lemmas/theorems are parameterized on some M satisfying this assumption.

Definition 2: A conditional oracle distribution (COD) is a distribution over conditional oracles, of type Δ(M4→{0,1}).

Definition 3: A COD is fully-mixed if it assigns nonzero probability to each possible conditional oracle.

Since the number of conditional oracles is finite, there are fully-mixed CODs.
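For intuition about this finiteness, with a hypothetical two-machine M the oracle space can be enumerated outright (a sketch; the flat-table representation of oracles is mine, not the post's):

```python
from itertools import product

M = ["m1", "m2"]  # hypothetical two-machine set
queries = list(product(M, repeat=4))                   # all elements of M^4
oracles = list(product([0, 1], repeat=len(queries)))   # all functions M^4 -> {0,1}

# The uniform distribution over oracles is a fully-mixed COD: it assigns
# nonzero probability to each of the finitely many conditional oracles.
uniform_cod = {oracle: 1 / len(oracles) for oracle in oracles}

print(len(queries))  # |M|^4 = 16
print(len(oracles))  # 2^16 = 65536
```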

Definition 4: A COD Q weakly leads to a COD Q′ iff, for all m1,n1,m2,n2∈M:

PQ(mO1()=1∧nO1()=1)⋅PQ(nO2()=1)>PQ(mO2()=1∧nO2()=1)⋅PQ(nO1()=1) → PQ′(O(m1,n1,m2,n2)=1)=1

PQ(mO1()=1∧nO1()=1)⋅PQ(nO2()=1)<PQ(mO2()=1∧nO2()=1)⋅PQ(nO1()=1) → PQ′(O(m1,n1,m2,n2)=0)=1

(Here, the notation PQ(E) refers to the probability of event E when O∼Q.)

Intuitively, Q weakly leads to Q′ iff Q′ accurately answers conditional queries that are about Q. If nO1()=1 for some O and also nO2()=1 for some (possibly different) O and Q is fully-mixed, these conditions are equivalent to:

PQ(mO1()=1∣nO1()=1)>PQ(mO2()=1∣nO2()=1) → PQ′(O(m1,n1,m2,n2)=1)=1

PQ(mO1()=1∣nO1()=1)<PQ(mO2()=1∣nO2()=1) → PQ′(O(m1,n1,m2,n2)=0)=1

As notation, let StepWeak:Δ(M4→{0,1})→PowerSet(Δ(M4→{0,1})) map a COD to the set of CODs it weakly leads to. This and related functions will be interpreted as set-valued when referring to their graphs.

Unfortunately, these conditional queries are not sensible when PQ(nO1()=1)=0 or PQ(nO2()=1)=0; the oracle is allowed to return any answer. We will define a stricter notion of “leading to” that will require these answers to be sensible.

Let StepClosure:Δ(M4→{0,1})→PowerSet(Δ(M4→{0,1})) be a set-valued function whose graph is the closure of (the graph of StepWeak restricted to (Q,Q′) where Q is fully-mixed). It will turn out that StepClosure(Q)=StepWeak(Q) for fully-mixed Q.

Equivalently, Q′∈StepClosure(Q) iff there are infinite COD sequences Q1,Q2,… and Q′1,Q′2,… such that (a) each Qi is fully mixed, (b) each Qi weakly leads to Q′i, (c) {Qi}i limits to Q, and (d) {Q′i}i limits to Q′.

Intuitively, StepClosure is equivalent to StepWeak for fully-mixed Q and generalizes to non-fully-mixed Q through taking limits that involve only fully-mixed Q values.

Define StepHull(Q):=ConvexHull(StepClosure(Q)). It will turn out that StepHull(Q)=StepClosure(Q)=StepWeak(Q) for fully-mixed Q.

(Why take the convex hull? This is to make StepHull a Kakutani map. It might be the case that StepClosure(Q) is always convex but I haven’t proven this.)

Definition 5: A COD Q leads to a COD Q′ iff Q′∈StepHull(Q).

Due to Theorem 1, it will turn out that a fully-mixed COD Q leads to Q′ iff it weakly leads to Q′. In general, leading to is a stronger condition than weakly leading to, as will be proven.

At this point we are ready to define a reflection condition:

Definition 6: A COD is reflective iff it leads to itself.

Intuitively, a COD is reflective iff it accurately answers queries that are about itself.

This post’s main results are the following (in addition to the proof that EDT beats 5 and 10):

Theorem 1: For any fully-mixed COD Q, StepWeak(Q)=StepClosure(Q)=StepHull(Q).

Theorem 2: If Q leads to Q′, then it weakly leads to Q′.

Theorem 3: There is a reflective COD.

These are proven at the end of the post.

## Defining EDT

EDT can be defined using a reflective COD; this decision theory will be called COEDT. Let the decision problem be described by a Turing machine with an embedded agent, where this Turing machine including its agent may randomize and call a conditional oracle, and which returns either 0 or 1 to represent the agent’s utility (intermediate utilities can be emulated by randomizing). For example, for the 5 and 10 problem, we may define:

FiveTenO(⌈A⌉):=AO()

where ⌈A⌉ is the source code of the agent A.

COEDT is a function from the universe program (which already contains an embedded COEDT agent) to action. The following COEDT variant handles cases where there are only 2 actions:

COEDTO(⌈U⌉):=O(⌈UO()⌉,⌈COEDTO(⌈U⌉)⌉,⌈UO()⌉,⌈1−COEDTO(⌈U⌉)⌉)

It uses the conditional oracle to determine if it has a higher chance of winning conditional on taking action 1 or action 0, and takes the action that it is more likely to win conditional on taking.

What does COEDT do on FiveTen? We can take a fixed point:

UO():=FiveTenO(⌈COEDTO(⌈U⌉)⌉)

Let the set of machines considered be

M:={⌈UO()⌉,⌈COEDTO(⌈U⌉)⌉,⌈1−COEDTO(⌈U⌉)⌉}

Theorem 4: For any reflective COD Q on M, PQ(COEDTO(⌈U⌉)=1)=1.

Proof:

Informally, this is true because the agent must always take action 1 when queries are about any fully-mixed COD.

Q∈StepHull(Q), so it is a convex combination of CODs in StepClosure(Q). Let that convex combination be ∑kj=1λjQ′j.

Each Q′j∈StepClosure(Q), so we have COD sequences {Qji}i limiting to Q (with each being fully mixed) and {Q′ji}i limiting to Q′j such that each Q′ji∈StepWeak(Qji).

For each i,j, since COEDTO(⌈U⌉) and 1−COEDTO(⌈U⌉) both return 1 for some conditional oracles, and because Qji weakly leads to Q′ji, we have:

PQji(UO()=1∣COEDTO(⌈U⌉)=1)>PQji(UO()=1∣COEDTO(⌈U⌉)=0) → PQ′ji(COEDTO(⌈U⌉)=1)=1

Obviously, the first conditional probability is 1 (since it is defined) and the second is 0. Therefore, each PQ′ji(COEDTO(⌈U⌉)=1)=1. This must then also be true for the limit of {Q′ji}i, i.e. PQ′j(COEDTO(⌈U⌉)=1)=1. This must also be true for the convex combination, i.e. PQ(COEDTO(⌈U⌉)=1)=1.

□
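The informal argument can be miniaturized numerically (my own simplification, not the post's construction: the agent's action is a single bit with P(action=1)=q, utility equals action, and the comparison is the cross-multiplied form from Definition 4):

```python
def weak_leads_to_answer(q):
    """For a fully-mixed q (0 < q < 1), the answer forced on Q' for the
    query 'is P(U=1 | action=1) > P(U=1 | action=0)?' in 5-and-10."""
    p_a1, p_a0 = q, 1 - q
    p_u1_and_a1 = q    # utility 1 exactly when the $10 bill (action 1) is taken
    p_u1_and_a0 = 0.0
    lhs = p_u1_and_a1 * p_a0   # cross-multiplied conditional comparison
    rhs = p_u1_and_a0 * p_a1
    return 1 if lhs > rhs else 0

# Along any sequence of fully-mixed CODs, the forced answer is constantly 1,
# so it is also 1 in the limit: the reflective COD takes the $10 bill surely.
assert all(weak_leads_to_answer(q) == 1 for q in [0.5, 1e-3, 1e-9, 1e-15])
```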

## Multiple actions

(This section can be skipped.)

The COEDT defined above only handles problems that have 2 actions. What if there are more than 2 actions? Then we can split the agent into multiple 2-action COEDT agents: the first chooses between taking the first action and passing control to the second agent, the second agent chooses between taking the second action and passing control to the third agent, and so on. For example, here is a 3-action construction of COEDT:

COEDT2O(⌈U⌉,k):=O(⌈UO()⌉,⌈COEDT2O(⌈U⌉,k)⌉,⌈UO()⌉,⌈1−COEDT2O(⌈U⌉,k)⌉)

COEDT3O(⌈U⌉):=if COEDT2O(⌈U⌉,0)=0 then 0 else 1+COEDT2O(⌈U⌉,1)

(Why does COEDT2 have a second argument? This is so the two different COEDT2 agents know which they are and can take different actions.)
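The chaining itself is straightforward (a sketch; `decide` is a hypothetical stand-in for a 2-action COEDT call, taking the agent index k as its argument):

```python
def three_action_agent(decide):
    """Compose two 2-action agents into one 3-action agent, as in COEDT3."""
    if decide(0) == 0:
        return 0              # first agent takes action 0 itself
    return 1 + decide(1)      # otherwise it defers; second agent picks 1 or 2

# The three possible joint behaviors:
assert three_action_agent(lambda k: 0) == 0
assert three_action_agent(lambda k: [1, 0][k]) == 1
assert three_action_agent(lambda k: 1) == 2
```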

One might think this has problems when the first agent expects the second agent to always take a bad action, therefore never defers control to the second agent, and therefore the second agent has no incentive to take a good action (this happens in Nash equilibria in sequential games). However, since we consider fully-mixed oracles in the construction of COEDT, this is not a problem (EDIT: it is sometimes, see comment). To demonstrate this, consider the following 5 and 10 and 15 problem:

UO():=if COEDT3O(⌈U⌉)=0 then Flip(1/2) else COEDT3O(⌈U⌉)−1

where Flip(p) flips a coin and returns 1 with probability p and 0 with probability 1−p.

Let the set of machines considered be

M:={⌈UO()⌉,⌈COEDT2O(⌈U⌉,0)⌉,⌈1−COEDT2O(⌈U⌉,0)⌉,⌈COEDT2O(⌈U⌉,1)⌉,⌈1−COEDT2O(⌈U⌉,1)⌉}

Theorem 5: For any reflective COD Q on M, PQ(COEDT3O(⌈U⌉)=2)=1.

Proof:

Q∈StepHull(Q), so it is a convex combination of CODs in StepClosure(Q). Let that convex combination be ∑kj=1λjQ′j.

Each Q′j∈StepClosure(Q), so we have COD sequences {Qji}i limiting to Q (with each being fully mixed) and {Q′ji}i limiting to Q′j such that each Q′ji∈StepWeak(Qji).

By the same logic as in Theorem 4, each PQ′ji(COEDT2O(⌈U⌉,1)=1)=1.

For each i,j, since COEDT2O(⌈U⌉,0) and 1−COEDT2O(⌈U⌉,0) both return 1 for some conditional oracles, and because Qji weakly leads to Q′ji, we have:

PQji(UO()=1∣COEDT2O(⌈U⌉,0)=1)>PQji(UO()=1∣COEDT2O(⌈U⌉,0)=0) → PQ′ji(COEDT2O(⌈U⌉,0)=1)=1

Obviously, the second conditional probability is 1/2. The first is 1, since PQ′ji(COEDT2O(⌈U⌉,1)=1)=1. Therefore, each PQ′ji(COEDT2O(⌈U⌉,0)=1)=1.

As in Theorem 4, these conditions must then be true for the limits Q′j and the convex combination Q.

□

## Conclusion and future research

I consider COEDT to be major progress in decision theory. Before COEDT, there were (as far as I know) 3 different ways to solve 5 and 10, all based on counterfactuals:

Causal counterfactuals (as in CDT), where counterfactuals are worlds where physical magic happens to force the agent’s action to be something specific.

Model-theoretic counterfactuals (as in modal UDT), where counterfactuals are models in which false statements are true, e.g. where PA is inconsistent.

Probabilistic conditionals (as in reinforcement learning and logical inductor based decision theories such as LIEDT/LICDT and asymptotic decision theory), where counterfactuals are possible worlds assigned a small but nonzero probability by the agent in which the agent takes a different action through “exploration”; note that ADT-style optimism is a type of exploration.

COEDT is a new way to solve 5 and 10. My best intuitive understanding is that, whereas ordinary EDT (using ordinary reflective oracles) seeks any equilibrium between beliefs and policy, COEDT specifically seeks a not-extremely-unstable equilibrium (though not necessarily one that is stable in the sense of dynamical systems), where the equilibrium is “justified” by the fact that there are arbitrarily close almost-equilibria. This is similar to trembling hand perfect equilibrium. To the extent that COEDT has counterfactuals, they are these worlds where the oracle distribution is not actually reflective but is very close to the actual oracle distribution, and in which the agent takes a suboptimal action with very small probability.

My sense is that the results in this post open up a wide new territory of open questions and further research. Here are some of them:

What kind of optimality result(s) does COEDT have for single-player problems?

Do infinite reflective CODs exist, as with reflective oracles?

There is a lot of low-hanging fruit here, and I am posting this now, rather than immediately picking that fruit, in the hope that discussion will be helpful.

Proofs of theorems 1-3 follow.

## Proving Theorem 1 and Theorem 2

First we will show that StepWeak is a Kakutani map; this will also help with Theorem 2 (as Theorem 2 will be proven by applying Kakutani’s fixed point theorem to StepHull).

Lemma 1: For each Q, StepWeak(Q) is nonempty and convex.

Proof:

Informally, this is true because for a fixed Q, the constraints on Q′ are convex and consistent.

Let m1,n1,m2,n2∈M. There are 3 cases:

If PQ(mO1()=1∧nO1()=1)⋅PQ(nO2()=1)>PQ(mO2()=1∧nO2()=1)⋅PQ(nO1()=1), then the constraint corresponding to (m1,n1,m2,n2) is PQ′(O(m1,n1,m2,n2)=1)=1.

If PQ(mO1()=1∧nO1()=1)⋅PQ(nO2()=1)<PQ(mO2()=1∧nO2()=1)⋅PQ(nO1()=1), then the constraint corresponding to (m1,n1,m2,n2) is PQ′(O(m1,n1,m2,n2)=1)=0.

If PQ(mO1()=1∧nO1()=1)⋅PQ(nO2()=1)=PQ(mO2()=1∧nO2()=1)⋅PQ(nO1()=1), then there is no constraint corresponding to (m1,n1,m2,n2).

Q′∈StepWeak(Q) iff Q′ satisfies these constraints for all (m1,n1,m2,n2).

Clearly, each of these constraints is convex, since it picks out some hyperplane. Their intersection must then also be convex. So StepWeak(Q) is convex.

To show that StepWeak(Q) is nonempty, we need only consider Q′ that are fully factorized (i.e. the distinct random variables of the form O(m1,n1,m2,n2) are independent). For each (m1,n1,m2,n2), if there is no constraint then set PQ′(O(m1,n1,m2,n2)=1)=1/2. If there is a constraint, set PQ′(O(m1,n1,m2,n2)=1) to the value determined by this constraint. This yields a Q′∈StepWeak(Q).
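The fully factorized construction can be checked in a miniature model (entirely my own toy, not the post's setup: "machines" here are deterministic functions of the oracle table rather than probabilistic Turing machines):

```python
from itertools import product

M = ["a", "b"]  # hypothetical two-machine set
queries = list(product(M, repeat=4))
index = {q: i for i, q in enumerate(queries)}

def run(machine, oracle):
    """Toy semantics: 'a' echoes one fixed oracle answer, 'b' always returns 1."""
    return oracle[index[("a", "a", "b", "b")]] if machine == "a" else 1

oracles = list(product([0, 1], repeat=len(queries)))
w = 1 / len(oracles)  # uniform, hence fully-mixed, COD Q

# One pass over the oracle space to get P_Q(m()=1) and P_Q(m()=1 and n()=1):
single = {m: 0.0 for m in M}
joint = {(m, n): 0.0 for m in M for n in M}
for o in oracles:
    out = {m: run(m, o) for m in M}
    for m in M:
        single[m] += w * out[m]
        for n in M:
            joint[(m, n)] += w * out[m] * out[n]

# Build a fully factorized Q' as in the proof: each oracle answer's marginal
# is set to the constrained value, or to 1/2 when the query is unconstrained.
marginals = {}
for (m1, n1, m2, n2) in queries:
    lhs = joint[(m1, n1)] * single[n2]
    rhs = joint[(m2, n2)] * single[n1]
    marginals[(m1, n1, m2, n2)] = 1.0 if lhs > rhs else (0.0 if lhs < rhs else 0.5)

# P(b=1 | a=1) = 1 exceeds P(a=1 | b=1) = 1/2, so these answers are forced:
assert marginals[("b", "a", "a", "b")] == 1.0
assert marginals[("a", "b", "b", "a")] == 0.0
```

Any Q′ with these marginals and independent answers then satisfies every weak-leads-to constraint for this toy Q.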

□

Lemma 2: The graph of StepWeak is closed.

Proof:

Informally, this is true because each constraint on (Q,Q′) is closed.

Let m1,n1,m2,n2∈M. Consider the set of (Q,Q′) (not necessarily fully-mixed) satisfying the two constraints:

PQ(mO1()=1∧nO1()=1)⋅PQ(nO2()=1)>PQ(mO2()=1∧nO2()=1)⋅PQ(nO1()=1) → PQ′(O(m1,n1,m2,n2)=1)=1

PQ(mO1()=1∧nO1()=1)⋅PQ(nO2()=1)<PQ(mO2()=1∧nO2()=1)⋅PQ(nO1()=1) → PQ′(O(m1,n1,m2,n2)=0)=1

It is simple to see that the set of (Q,Q′) satisfying the first constraint is closed, because (a) the set of (Q,Q′) satisfying the antecedent of the implication is open, (b) the set of (Q,Q′) satisfying the consequent of the implication is closed, and (c) the union of two closed sets is closed. By similar logic, so is the set of (Q,Q′) satisfying the second constraint. If we take the intersection for all m1,n1,m2,n2, the result is still closed, and this is the graph of StepWeak.

□

Theorem 1: For any fully-mixed COD Q, StepWeak(Q)=StepClosure(Q)=StepHull(Q).

Proof:

Let Q be fully-mixed. First we will show StepWeak(Q)=StepClosure(Q). StepClosure’s graph is the closure of StepWeak’s graph restricted to fully-mixed first arguments. By Lemma 2, StepWeak’s graph is closed, so StepClosure’s graph is contained in it; hence StepClosure(Q)⊆StepWeak(Q). Conversely, the restricted graph already contains {Q}×StepWeak(Q) (since Q is fully-mixed), so StepWeak(Q)⊆StepClosure(Q).

Now we will show StepClosure(Q)=StepHull(Q). By Lemma 1 and the equality just shown, StepClosure(Q)=StepWeak(Q) is convex, so it equals its convex hull StepHull(Q).

□

Theorem 2: If a COD Q leads to Q′, then it weakly leads to Q′.

Proof:

Since StepWeak’s graph is closed, and StepClosure’s graph is the closure of a subset of StepWeak’s graph, StepClosure’s graph is a subset of StepWeak’s graph.

Since StepWeak(Q) is a convex superset of StepClosure(Q) for all Q, it must also be a superset of the convex hull StepHull(Q).

□

## Proving Theorem 3

Now it is time to show that StepHull is a Kakutani map.

Lemma 3: StepHull has a closed graph.

Proof:

Informally, this is true because a limit point of StepHull’s graph is the limit of a convergent sequence of convex combinations of points in StepClosure’s graph, which is itself equal to a convex combination of limits of sequences of points in StepClosure’s graph, and StepClosure’s graph is closed.

Trivially, StepClosure has a closed graph, since its graph is the closure of a set.

Consider a limit point (Q,Q′) of StepHull’s graph. That means we have sequences {Qi}i limiting to Q and {Q′i} limiting to Q′ such that each Q′i∈StepHull(Qi).

Let k=2^(|M|^4) be the number of conditional oracles; we consider CODs as elements of Rk. Since each Q′i∈ConvexHull(StepClosure(Qi)), it must by definition be a finite convex combination of CODs in StepClosure(Qi). We can assume without loss of generality that this is a combination of exactly k+1 CODs, by Carathéodory’s theorem.

We will now name these convex combinations. For each i we have:

A list Q′1i,…,Q′k+1i, each in StepClosure(Qi), and

weights λ1i,…,λk+1i, each in [0,1] and which sum to 1,

such that each Q′i=∑k+1j=1λjiQ′ji.

We may now consider Xi:=(Q′1i,…,Q′k+1i,λ1i,…,λk+1i) as a vector in Rk(k+1)+(k+1). The sequence of these vectors has a convergent subsequence by the Bolzano-Weierstrass theorem.

Define Q′1,…,Q′k+1,λ1,…,λk+1 to be the limit of the convergent subsequence, and also call this full vector X. Let the convergent subsequence be {Xi}i. Since StepClosure’s graph is closed, we must have each Q′j∈StepClosure(Q). Therefore, their convex combination Q′∗:=∑k+1j=1λjQ′j is in StepHull(Q). We will now show Q′=Q′∗.

Consider the function α(q1,…,qk+1,λ1,…,λk+1):=∑k+1j=1λjqj. Each Q′i=α(Xi), so clearly α(Xi) limits to Q′. Additionally, α is continuous, so α(Xi) limits to α(X)=Q′∗. Therefore Q′=Q′∗.

We have at this point demonstrated Q′∈StepHull(Q), since we already had Q′∗∈StepHull(Q).

□

Lemma 4: For each Q, StepHull(Q) is nonempty and convex.

Proof:

StepHull(Q) is convex since it is the convex hull of a set, so we need only show that it is nonempty. To do this we will show StepClosure(Q) is nonempty; this is sufficient since StepClosure(Q)⊂StepHull(Q).

Let Q0 be any fully-mixed COD. For t∈[0,1), define β(t):=t⋅Q+(1−t)⋅Q0. β is a curve proceeding from Q0 to Q, and every COD in its image is fully-mixed.

Define Qi:=β(1−1/i) for natural i≥1. Define Q′i to be an arbitrary COD satisfying Q′i∈StepWeak(Qi); the sequence {Q′i}i can be defined using the axiom of choice. By the Bolzano-Weierstrass theorem, {Q′i}i has a convergent subsequence; let Q′ be the limit of this subsequence. Clearly, Q′∈StepClosure(Q), since by construction (Q,Q′) is a limit point of the graph of StepWeak. This proves that StepClosure(Q) is nonempty.

□
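The curve construction in Lemma 4 is easy to sketch concretely (my illustration; CODs here are plain probability dicts over a hypothetical three-element oracle set):

```python
def beta(t, Q, Q0):
    """The mixture t*Q + (1-t)*Q0 from the proof of Lemma 4."""
    return {o: t * Q.get(o, 0.0) + (1 - t) * Q0[o] for o in Q0}

Q0 = {"o1": 0.25, "o2": 0.25, "o3": 0.5}  # fully-mixed base point
Q = {"o1": 1.0}                            # target COD, not fully-mixed

# Every point on the curve with t < 1 is fully-mixed, even though the
# endpoint Q is not: Q0's positive mass never fully washes out.
for i in range(1, 8):
    mix = beta(1 - 1 / i, Q, Q0)
    assert all(p > 0 for p in mix.values())
    assert abs(sum(mix.values()) - 1) < 1e-9
```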

The proof of Theorem 3 is now trivial:

Theorem 3: There is a reflective COD.

Proof:

StepHull is a Kakutani map by Lemma 3 and Lemma 4. Therefore by Kakutani’s fixed point theorem, it has a fixed point, i.e. a COD Q∈StepHull(Q). By definition Q is reflective.

□

I realized there is a problem when you have a 3-action problem, where the first 2-action agent chooses between 0 utility and passing control to the second 2-action agent, and the second 2-action agent chooses between 1/2 and 1 expected utility.

The problem is that there’s a stable equilibrium where the first agent passes off control and the second agent always chooses 1/2 expected utility. The second agent makes this choice because they think that, if they choose 1, then the first agent will choose 0. The probability that the first agent chooses 0 is 0 but, in some sequence of CODs leading up to the actual COD, the first agent is more likely to choose 0 if the second agent chooses 1.

Basically, irrational threats can be relevant to COEDT even though they happen with probability 0.

We could fix this with a direct construction of queries that can return more than 2 results (as in reflective oracle distributions), but in any case this is a serious problem for sequential decision problems and multi-player common-payoff games.

I’m having trouble following this step of the proof of Theorem 4: “Obviously, the first conditional probability is 1”. Since the COD isn’t necessarily reflective, couldn’t the conditional be anything?

So the oracle is a black box which is always right, but that’s not enough—it’s also a limit of black boxes that are slightly wrong. Good work! (Modulo my usual skepticism about parameterizing decision theory on some black box, which is my personal hangup and shouldn’t stop you from exploring this direction.)

Suppose your world model says you take the 5 dollar bill with 100% probability. Now conditioning on taking the 10 dollar bill gives you complete garbage, since you are conditioning on a probability 0 event.

You can make the agent take the 10 dollar bill in that case, in effect blackmailing the world model: “if you want to stay sound, don’t prove that action A has probability 0”. I’d love to know if that covers all cases. In other words, if there’s an N symbol proof of a “garbage” conditional, must there be an M<N symbol proof of the form “action A has probability 0”?

The agents in this post aren’t proof-based. Proof-based agents have some issues with weird counterfactuals: perhaps the only worlds where you take some specific action are ones where PA is inconsistent. (COEDT also has issues, since the nearby oracles are not reflective, but it’s a different set of issues.)

In general queries to reflective-oracle-like world models that have forms like “is the probability of this exactly 0?” are problematic, since they are vulnerable to liar’s paradoxes. What if you take action A iff the probability of taking action A is exactly 0? So to the extent that this works for proof-based agents, it’s because they’re not complete in some sense.

I am not sure that if there is a proof of a garbage conditional then there will be a similarly-short proof that the conditional is garbage. Say “proving a garbage conditional P(A|B)” means proving P(A∧B)=cP(B) for some c such that in fact P(B)=0. I could imagine a case where it is easy to prove the event A equivalent to the event B, but hard to prove P(B)=0. Then you could prove P(A∧B)=P(B) easily but not P(B)=0. (This case might not be problematic from a decision theory perspective, which is interesting, but it’s still an invalid conditional.)

Yeah, my comment was more about proof-based agents rather than oracle-based, sorry about going off topic. In a proof-based setting, conditionals with P(B)=0 aren’t necessarily garbage, some of them are intended and we want the agent to find them. Your example might be one of those. The hard part is defining which is which.

It seems to me like the conditional oracle’s definition could be made more elegant by taking only m and n as parameters, both of which take an action as a parameter. The oracle would then implement argmaxa P(mO′(a)|nO′(a)).


Some time during or after writing this comment, I noticed something: the equilibrium where the EDT agent thinks it always takes the 5 dollar bill, and therefore gets garbage (possibly low) estimates when considering taking the 10 dollar bill, and therefore never takes the 10 dollar bill, is

extremelyunstable. As soon as the agent assigns any probability at all to taking the 10 dollar bill, their conditional expected utility estimates are perfect. Can we use this fact to design a variant of EDT that always takes the 10 dollar bill?Yes, yes we can.

## Definitions and main theorem statements

Reflective conditional oracle distributions will be defined similar to in previous work such as reflective oracles and reflective oracle distributions. I recommend understanding reflective oracles before reading this post (understanding reflective oracle distributions is helpful but unnecessary).

Let M be some finite subset of the set of probabilistic Turing machines that may query a conditional oracle (defined in the next paragraph). They must each always halt and return either 0 or 1.

Definition 1:Aconditional oracleis a function O:M4→{0,1}.Intuitively, O(m1,n1,m2,n2) asks whether P(mO′1()|nO′1()) is greater or less than P(mO′2()|nO′2()), where O′ is taken from some distribution over conditional oracles.

We make the assumption that machines in M only ever query the oracle with an element of M4. All further definitions and lemmas/theorems are parameterized on some M satisfying this assumption.

Definition 2:Aconditional oracle distribution(COD) is a distribution over conditional oracles, of type Δ(M4→{0,1}).Definition 3:A COD isfully-mixedif it assigns nonzero probability to each possible conditional oracle.Since the number of conditional oracles is finite, there are fully-mixed CODs.

Definition 4:A COD Qweakly leads toa COD Q′ iff, for all m1,n1,m2,n2∈M,p∈P:PQ(mO1()=1∧nO1()=1)PQ(nO2()=1)>PQ(mO2()=1∧nO2()=1)PQ(nO()=1)→PQ′(O(m1,n1,m2,n2)=1)=1

PQ(mO1()=1∧nO1()=1)PQ(nO2()=1)<PQ(mO2()=1∧nO2()=1)PQ(nO()=1)→PQ′(O(m1,n1,m2,n2)=0)=1

(Here, the notation PQ(E) refers to the probability of event E when O∼Q.)

Intuitively, Q weakly leads to Q′ iff Q′ accurately answers conditional queries that are about Q. If nO1()=1 for some O and also nO2()=1 for some (possibly different) O and Q is fully-mixed, these conditions are equivalent to:

PQ(mO1()=1∣nO1()=1)>PQ(m−oO()=1∣nO2()=1)→PQ′(O(m1,n1,m2,n2)=1)=1

PQ(mO1()=1∣nO1()=1)<PQ(m−oO()=1∣nO2()=1)→PQ′(O(m1,n1,m2,n2)=0)=1

As notation, let StepWeak:Δ(M4→{0,1})→PowerSet(Δ(M4→{0,1})) map a COD to the set of CODs it weakly leads to. This and related functions will be interpreted as set-valued when referring to their graphs.

Unfortunately, these conditional queries are not sensible when PQ(nO1()=1)=0 or PQ(nO2()=1)=0; the oracle is allowed to return any answer. We will define a stricter notion of “leading to” that will require these answers to be sensible.

Let StepClosure:Δ(M4→{0,1})→PowerSet(Δ(M4→{0,1})) be a set-valued function whose graph is the closure of (the graph of StepWeak restricted to (Q,Q′) where Q is fully-mixed). It will turn out that StepClosure(Q)=StepWeak(Q) for fully-mixed Q.

Equivalently, Q′∈StepClosure(Q) iff there are infinite COD sequences Q1,Q2,… and Q′1,Q′2,… such that (a) each Qi is fully mixed, (b) each Qi weakly leads to Q′i, (c) {Qi}i limits to Q, and (d) {Q′i}i limits to Q′.

Intuitively, StepClosure is equivalent to StepWeak for fully-mixed Q and generalizes to non-fully-mixed Q through taking limits that involve only fully-mixed Q values.

Define StepHull(Q):=ConvexHull(StepClosure(Q)). It will turn out that StepHull(Q)=StepClosure(Q)=StepWeak(Q) for fully-mixed Q.

(Why take the convex hull? This is to make StepHull a Kakutani map. It might be the case that StepClosure(Q) is always convex but I haven’t proven this.)

Definition 5:A COD Qleads toa COD Q′ iff Q′∈StepHull(Q).Due to Theorem 1, it will turn out that a fully-mixed COD Q leads to Q′ iff it strictly leads to Q′. Leading to is a stronger condition than weak leading to, as will be proven.

At this point we are ready to define a reflection condition:

Definition 6:A COD is reflective iff it leads to itself.Intuitively, a COD is reflective iff it accurately answers queries that are about itself.

This post’s main results are the following (in addition to the proof that EDT beats 5 and 10):

Theorem 1:For any fully-mixed COD Q, StepWeak(Q)=StepClosure(Q)=StepHull(Q).Theorem 2:If Q leads to Q′, then it weakly leads to Q′.Theorem 3:There is a reflective COD.These are proven at the end of the post.

## Defining EDT

EDT can be defined using a reflective COD; this decision theory will be called COEDT. Let the decision problem be described by a Turing machine with an embedded agent, where this Turing machine including its agent may randomize and call a conditional oracle, and which returns either 0 or 1 to represent the agent’s utility (intermediate utilities can be emulated by randomizing). For example, for the 5 and 10 problem, we may define:

FiveTenO(⌈A⌉):=AO()

where ⌈A⌉ is the source code of the agent A.

COEDT is a function from the universe program (which already contains an embedded COEDT agent) to action. The following COEDT variant handles cases where there are only 2 actions:

COEDTO(⌈U⌉):=O(⌈UO()⌉,⌈EDTO(⌈U⌉)⌉,⌈UO()⌉,⌈1−EDTO(⌈U⌉)⌉)

It uses the conditional oracle to determine if it has a higher chance of winning conditional on taking action 1 or action 0, and takes the action that it is more likely to win conditional on taking.

What does COEDT do on FiveTen? We can take a fixed point:

UO():=FiveTenO(⌈COEDTO(⌈U⌉)⌉)

Let the set of machines considered be

M:={⌈UO()⌉,⌈COEDTO(⌈U⌉)⌉,⌈1−EDTO(⌈U⌉)}

Theorem 4:For any reflective COD Q on M, PQ(COEDTO(⌈U⌉)=1)=1.Proof:Informally, this is true because the agent must always take action 1 when queries are about any fully-mixed COD.

Q∈StepHull(Q), so it is a convex combination of CODs in StepClosure(Q). Let that convex combination be ∑kj=1λjQ′j.

Each Q′j∈StepClosure(Q), so we have COD sequences {Qji}i limiting to Q (with each being fully mixed) and {Q′ji}i limiting to Q′j such that each Q′ji∈StepWeak(Qji).

For each i,j, since COEDTO(⌈U⌉) and 1−COEDTO(⌈U⌉) both return 1 for some conditional oracles, and because Qji weakly leads to Q′ji, we have:

PQji(UO()=1∣COEDTO(⌈U⌉)=1)>PQji(UO()=1∣COEDTO(⌈U⌉)=0)→PQ′ji(COEDTO()=1)=1

Obviously, the first conditional probability is 1 (since it is defined) and the second is 0. Therefore, each PQ′ji(COEDTO(⌈U⌉)=1)=1. This must then also be true for the limit of {Q′ji}i, i.e. PQ′j(COEDTO(⌈U⌉)=1)=1. This must also be true for the convex combination, i.e. PQ(COEDTO(⌈U⌉)=1)=1.

□

## Multiple actions

(This section can be skipped.)

The COEDT defined above only handles problems that have 2 actions. What if there are more than 2 actions? Then we can split the agent into multiple 2-action COEDT agents: the first chooses between taking the first action and passing control to the second agent, the second agent chooses between taking the second action and passing control to the third agent, and so on. For example, here is a 3-action construction of COEDT:

COEDT2O(⌈U⌉,k):=O(⌈UO()⌉,⌈COEDTO(⌈U⌉,k)=1⌉,⌈UO()⌉,⌈COEDTO(⌈U⌉,k)=1⌉)

COEDT3O(⌈U⌉):=if COEDTO(⌈U⌉,0)=0 then 0 else 1+COEDTO(⌈U⌉,1)

(Why does COEDT2 have a second argument? This is so the two different COEDT2 agents know which they are and can take different actions.)

One might think this has problems when the first agent expects the second agent to always take a bad action, therefore never defers control to the second agent, and therefore the second agent has no incentive to take a good action (this happens in Nash equilibria in sequential games). However, since we consider fully-mixed oracles in the construction of COEDT, this is not a problem (EDIT: it is sometimes, see comment). To demonstrate this, consider the following 5 and 10 and 15 problem:

UO():=if COEDT3O(⌈U⌉)=0 then Flip(1/2) else COEDT3O(⌈U⌉)−1

where Flip(p) flips a coin and returns 1 with probability p and 0 with probability 1−p.

Let the set of machines considered be

M:={⌈UO()⌉,⌈COEDT2O(⌈U⌉,0)⌉,⌈1−COEDT2O(⌈U⌉,0),⌈COEDT2O(⌈U⌉,1)⌉,⌈1−COEDT2O(⌈U⌉,1)}

Theorem 5:For any reflective COD Q on M, PQ(COEDT3O(⌈U⌉)=2)=1.Proof:Q∈StepHull(Q), so it is a convex combination of CODs in StepClosure(Q). Let that convex combination be ∑kj=1λjQ′j.

Each Q′_j ∈ StepClosure(Q), so we have COD sequences {Q_{ji}}_i limiting to Q (with each Q_{ji} fully mixed) and {Q′_{ji}}_i limiting to Q′_j such that each Q′_{ji} ∈ StepWeak(Q_{ji}).

By the same logic as in Theorem 4, each P_{Q′_{ji}}(COEDT2^O(⌈U⌉,1)=1)=1.

For each i,j, since COEDT2^O(⌈U⌉,0) and 1−COEDT2^O(⌈U⌉,0) both return 1 for some conditional oracles, and because Q_{ji} weakly leads to Q′_{ji}, we have:

P_{Q_{ji}}(U^O()=1 ∣ COEDT2^O(⌈U⌉,0)=1) > P_{Q_{ji}}(U^O()=1 ∣ COEDT2^O(⌈U⌉,0)=0) → P_{Q′_{ji}}(COEDT2^O(⌈U⌉,0)=1)=1

Obviously, the second conditional probability is 1/2. The first is 1, since P_{Q′_{ji}}(COEDT2^O(⌈U⌉,1)=1)=1. Therefore, each P_{Q′_{ji}}(COEDT2^O(⌈U⌉,0)=1)=1.

As in Theorem 4, these conditions must then be true for the limits Q′_j and the convex combination Q.

□

## Conclusion and future research

I consider COEDT to be major progress in decision theory. Before COEDT, there were (as far as I know) 3 different ways to solve 5 and 10, all based on counterfactuals:

Causal counterfactuals (as in CDT), where counterfactuals are worlds where physical magic happens to force the agent’s action to be something specific.

Model-theoretic counterfactuals (as in modal UDT), where counterfactuals are models in which false statements are true, e.g. where PA is inconsistent.

Probabilistic conditionals (as in reinforcement learning and logical inductor based decision theories such as LIEDT/LICDT and asymptotic decision theory), where counterfactuals are possible worlds assigned a small but nonzero probability by the agent in which the agent takes a different action through “exploration”; note that ADT-style optimism is a type of exploration.

COEDT is a new way to solve 5 and 10. My best intuitive understanding is that, whereas ordinary EDT (using ordinary reflective oracles) seeks any equilibrium between beliefs and policy, COEDT specifically seeks a not-extremely-unstable equilibrium (though not necessarily one that is stable in the sense of dynamical systems), where the equilibrium is “justified” by the fact that there are arbitrarily close almost-equilibria. This is similar to trembling hand perfect equilibrium. To the extent that COEDT has counterfactuals, they are these worlds where the oracle distribution is not actually reflective but is very close to the actual oracle distribution, and in which the agent takes a suboptimal action with very small probability.

My sense is that the results in this post open up a wide new territory of open questions and further research. Here are some of them:

What kind of optimality result(s) does COEDT have for single-player problems?

Do infinite reflective CODs exist, as with reflective oracles?

Is the set of reflective CODs convex (as the set of reflective oracle distributions is)?

Can this approach be integrated with logical uncertainty (e.g. logical inductors)?

What happens in games with more than one COEDT? What is the equilibrium concept?

Are there optimality results for common-payoff games, or Pareto-optimality results for non-common-payoff games?

Can COEDT be attacked with a “troll bridge” problem similar to the one for LIEDT/LICDT?

There is a lot of low-hanging fruit here, and I am posting this now before immediately picking the low-hanging fruit in the hope that discussion will be helpful.

Proofs of theorems 1-3 follow.

## Proving Theorem 1 and Theorem 2

First we will show that StepWeak is a Kakutani map; this will also help with Theorem 3 (as Theorem 3 will be proven by applying Kakutani's fixed point theorem to StepHull).

Lemma 1: For each Q, StepWeak(Q) is nonempty and convex.

Proof: Informally, this is true because, for a fixed Q, the constraints on Q′ are convex and consistent.

Let m_1, n_1, m_2, n_2 ∈ M. There are 3 cases:

If P_Q(m_1^O()=1 ∧ n_1^O()=1) P_Q(n_2^O()=1) > P_Q(m_2^O()=1 ∧ n_2^O()=1) P_Q(n_1^O()=1), then the constraint corresponding to (m_1,n_1,m_2,n_2) is P_{Q′}(O(m_1,n_1,m_2,n_2)=1)=1.

If P_Q(m_1^O()=1 ∧ n_1^O()=1) P_Q(n_2^O()=1) < P_Q(m_2^O()=1 ∧ n_2^O()=1) P_Q(n_1^O()=1), then the constraint corresponding to (m_1,n_1,m_2,n_2) is P_{Q′}(O(m_1,n_1,m_2,n_2)=1)=0.

If P_Q(m_1^O()=1 ∧ n_1^O()=1) P_Q(n_2^O()=1) = P_Q(m_2^O()=1 ∧ n_2^O()=1) P_Q(n_1^O()=1), then there is no constraint corresponding to (m_1,n_1,m_2,n_2).

Q′ ∈ StepWeak(Q) iff Q′ satisfies these constraints for all (m_1,n_1,m_2,n_2).

Clearly, each of these constraints is convex, since it picks out some hyperplane. Their intersection must then also be convex. So StepWeak(Q) is convex.

To show that StepWeak(Q) is nonempty, we need only consider Q′ that are fully factorized (i.e. the distinct random variables of the form O(m_1,n_1,m_2,n_2) are independent). For each (m_1,n_1,m_2,n_2), if there is no constraint, then set P_{Q′}(O(m_1,n_1,m_2,n_2)=1)=1/2. If there is a constraint, set P_{Q′}(O(m_1,n_1,m_2,n_2)=1) to the value determined by this constraint. This yields a Q′ ∈ StepWeak(Q).

□
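The case analysis in Lemma 1's nonemptiness argument can be sketched as code. Everything here is my own illustrative modeling, not part of the formal construction: `joint(m, n)` stands in for P_Q(m^O()=1 ∧ n^O()=1), `marginal(n)` for P_Q(n^O()=1), and a fully factorized Q′ is represented as a dict of independent Bernoulli parameters, one per oracle query.

```python
from itertools import product

def step_weak_point(machines, joint, marginal):
    """Return one fully factorized Q' satisfying the StepWeak(Q) constraints."""
    q_prime = {}
    for m1, n1, m2, n2 in product(machines, repeat=4):
        lhs = joint(m1, n1) * marginal(n2)
        rhs = joint(m2, n2) * marginal(n1)
        if lhs > rhs:
            q_prime[(m1, n1, m2, n2)] = 1.0  # constrained to answer 1
        elif lhs < rhs:
            q_prime[(m1, n1, m2, n2)] = 0.0  # constrained to answer 0
        else:
            q_prime[(m1, n1, m2, n2)] = 0.5  # unconstrained: pick 1/2
    return q_prime

# Toy Q: machines 'a' and 'b' are independent with P(a)=0.9, P(b)=0.5.
p = {'a': 0.9, 'b': 0.5}
q_prime = step_weak_point(['a', 'b'],
                          joint=lambda m, n: p[m] * p[n],
                          marginal=lambda n: p[n])
assert q_prime[('a', 'b', 'b', 'a')] == 1.0  # P(a|b) > P(b|a) here
assert q_prime[('a', 'a', 'a', 'a')] == 0.5  # tie: unconstrained
```

Since each query's parameter is set independently, the resulting Q′ is fully factorized by construction.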

Lemma 2: The graph of StepWeak is closed.

Proof: Informally, this is true because each constraint on (Q,Q′) is closed.

Let m_1, n_1, m_2, n_2 ∈ M. Consider the set of (Q,Q′) (not necessarily fully-mixed) satisfying the two constraints:

P_Q(m_1^O()=1 ∧ n_1^O()=1) P_Q(n_2^O()=1) > P_Q(m_2^O()=1 ∧ n_2^O()=1) P_Q(n_1^O()=1) → P_{Q′}(O(m_1,n_1,m_2,n_2)=1)=1

P_Q(m_1^O()=1 ∧ n_1^O()=1) P_Q(n_2^O()=1) < P_Q(m_2^O()=1 ∧ n_2^O()=1) P_Q(n_1^O()=1) → P_{Q′}(O(m_1,n_1,m_2,n_2)=0)=1

It is simple to see that the set of (Q,Q′) satisfying the first constraint is closed, because (a) the set of (Q,Q′) satisfying the antecedent of the implication is open, so its complement is closed, (b) the set of (Q,Q′) satisfying the consequent of the implication is closed, and (c) the union of these two closed sets, which is the set satisfying the implication, is closed. By similar logic, so is the set of (Q,Q′) satisfying the second constraint. If we take the intersection over all (m_1,n_1,m_2,n_2), the result is still closed, and this is the graph of StepWeak.

□

Theorem 1: For any fully-mixed COD Q, StepWeak(Q) = StepClosure(Q) = StepHull(Q).

Proof: Let Q be fully-mixed. First we will show StepWeak(Q) = StepClosure(Q). By Lemma 2, StepWeak's graph equals its closure intersected with the set of (Q,Q′) for which Q is fully-mixed. Since StepClosure's graph is simply the closure of StepWeak's graph, they coincide on Q.

Now we will show StepClosure(Q) = StepHull(Q). By Lemma 1 and the previous paragraph, StepClosure(Q) = StepWeak(Q) is convex, so it equals its convex hull StepHull(Q).

□

Theorem 2: If a COD Q leads to Q′, then it weakly leads to Q′.

Proof: Since StepWeak's graph is closed, and StepClosure's graph is the closure of a subset of StepWeak's graph, StepClosure's graph is a subset of StepWeak's graph.

Since StepWeak(Q) is a convex superset of StepClosure(Q) for all Q, it must also be a superset of the convex hull StepHull(Q).

□

## Proving Theorem 3

Now it is time to show that StepHull is a Kakutani map.

Lemma 3: StepHull has a closed graph.

Proof: Informally, this is true because a limit point of StepHull's graph is the limit of a convergent sequence of convex combinations of points in StepClosure's graph (which is closed), which is itself equal to a convex combination of limits of sequences of points in StepClosure's graph, and StepClosure's graph is closed.

Trivially, StepClosure has a closed graph, since its graph is the closure of a set.

Consider a limit point (Q,Q′) of StepHull's graph. That means we have sequences {Q_i}_i limiting to Q and {Q′_i}_i limiting to Q′ such that each Q′_i ∈ StepHull(Q_i).

Let k = |2^{M^4}| = 2^{|M|^4}. We consider CODs as elements of R^k. Since each Q′_i ∈ ConvexHull(StepClosure(Q_i)), it must by definition be a finite convex combination of CODs in StepClosure(Q_i). We can assume without loss of generality that this is a combination of exactly k+1 CODs, by Carathéodory's theorem.

We will now name these convex combinations. For each i we have:

A list Q′_{1i}, …, Q′_{(k+1)i}, each in StepClosure(Q_i), and

weights λ_{1i}, …, λ_{(k+1)i}, each in [0,1] and which sum to 1,

such that each Q′_i = ∑_{j=1}^{k+1} λ_{ji} Q′_{ji}.

We may now consider X_i := (Q′_{1i}, …, Q′_{(k+1)i}, λ_{1i}, …, λ_{(k+1)i}) as a vector in R^{(k+1)k + k + 1}. The sequence of these vectors has a convergent subsequence by the Bolzano–Weierstrass theorem.

Define Q′_1, …, Q′_{k+1}, λ_1, …, λ_{k+1} to be the limit of the convergent subsequence, and also call this full vector X. Let the convergent subsequence be {X_i}_i. Since StepClosure's graph is closed, we must have each Q′_j ∈ StepClosure(Q). Therefore, their convex combination Q′∗ := ∑_{j=1}^{k+1} λ_j Q′_j is in StepHull(Q). We will now show Q′ = Q′∗.

Consider the function α(q_1, …, q_{k+1}, λ_1, …, λ_{k+1}) := ∑_{j=1}^{k+1} λ_j q_j. Each Q′_i = α(X_i), so clearly α(X_i) limits to Q′. Additionally, α is continuous, so α(X_i) limits to α(X) = Q′∗. Therefore Q′ = Q′∗.

We have at this point demonstrated Q′ ∈ StepHull(Q), since we already had Q′∗ ∈ StepHull(Q).

□
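The key step above (a limit of convex combinations is the convex combination of the limits, by continuity of α) can be illustrated numerically. This is only a toy sketch with CODs modeled as points in R^2; all names are mine.

```python
import numpy as np

# The map alpha takes k+1 candidate points and k+1 weights and returns
# their convex combination. Continuity of alpha is what lets the limit
# pass inside the combination in the proof of Lemma 3.

def alpha(points, weights):
    """Convex combination sum_j weights[j] * points[j]."""
    return np.tensordot(weights, points, axes=1)

# A sequence X_i -> X: alpha(X_i) converges to alpha(X).
points_limit = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
weights_limit = np.array([0.2, 0.3, 0.5])
for i in [10, 100, 1000]:
    points_i = points_limit + 1.0 / i  # perturbed approximants
    gap = np.linalg.norm(alpha(points_i, weights_limit)
                         - alpha(points_limit, weights_limit))
    assert gap <= 2.0 / i  # the gap shrinks with the perturbation
```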

Lemma 4: For each Q, StepHull(Q) is nonempty and convex.

Proof: StepHull(Q) is convex since it is the convex hull of a set, so we need only show that it is nonempty. To do this we will show StepClosure(Q) is nonempty; this is sufficient since StepClosure(Q) ⊆ StepHull(Q).

Let Q_0 be any fully-mixed COD. For t ∈ [0,1), define β(t) := t·Q + (1−t)·Q_0. β is a curve proceeding from Q_0 to Q, and every COD in its image is fully-mixed.

Define Q_i := β(1−1/i) for natural i ≥ 1. Define Q′_i to be an arbitrary COD satisfying Q′_i ∈ StepWeak(Q_i); the sequence {Q′_i}_i can be defined using the axiom of choice. By the Bolzano–Weierstrass theorem, {Q′_i}_i has a convergent subsequence; let Q′ be the limit of this subsequence. Clearly, Q′ ∈ StepClosure(Q), since by construction (Q,Q′) is a limit point of the graph of StepWeak. This proves that StepClosure(Q) is nonempty.

□
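The curve β used above has a simple numeric illustration: mixing any COD with a fully-mixed one keeps every outcome's probability strictly positive for t < 1, so each Q_i = β(1−1/i) is fully mixed. This toy sketch (my own modeling) represents CODs as probability vectors.

```python
import numpy as np

def beta(t, q, q0):
    """The mixing curve beta(t) = t*Q + (1-t)*Q0 from Lemma 4's proof."""
    return t * np.asarray(q) + (1 - t) * np.asarray(q0)

q = np.array([1.0, 0.0, 0.0])    # degenerate COD: not fully mixed
q0 = np.array([1/3, 1/3, 1/3])   # fully mixed COD
for i in [1, 10, 1000]:
    mix = beta(1 - 1/i, q, q0)
    assert np.all(mix > 0)       # fully mixed for every finite i
```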

The proof of Theorem 3 is now trivial:

Theorem 3: There is a reflective COD.

Proof: StepHull is a Kakutani map by Lemma 3 and Lemma 4. Therefore, by Kakutani's fixed point theorem, it has a fixed point, i.e. a COD Q ∈ StepHull(Q). By definition Q is reflective.

□

I realized there is a problem when you have a 3-action problem, where the first 2-action agent chooses between 0 utility and passing control to the second 2-action agent, and the second 2-action agent chooses between 1/2 and 1 expected utility.

The problem is that there's a stable equilibrium where the first agent passes off control and the second agent always chooses 1/2 expected utility. The second agent makes this choice because they think that, if they choose 1, then the first agent will choose 0. The probability that the first agent chooses 0 is 0, but, in some sequence of CODs leading up to the actual COD, the first agent is more likely to choose 0 if the second agent chooses 1.

Basically, irrational threats can be relevant to COEDT even though they happen with probability 0.

We could fix this with a direct construction of queries that can return more than 2 results (as in reflective oracle distributions), but in any case this is a serious problem for sequential decision problems and multi-player common-payoff games.

I’m having trouble following this step of the proof of Theorem 4: “Obviously, the first conditional probability is 1”. Since the COD isn’t necessarily reflective, couldn’t the conditional be anything?

By definition, U^O() = FiveTen^O(⌈COEDT^O(⌈U⌉)⌉) = COEDT^O(⌈U⌉), regardless of O. (The subscript Q_{ji} on P only affects the distribution of O.)

EDIT: clarified notation in the post

So the oracle is a black box which is always right, but that’s not enough—it’s also a limit of black boxes that are slightly wrong. Good work! (Modulo my usual skepticism about parameterizing decision theory on some black box, which is my personal hangup and shouldn’t stop you from exploring this direction.)

You can make the agent take the 10 dollar bill in that case, in effect blackmailing the world model: “if you want to stay sound, don’t prove that action A has probability 0”. I’d love to know if that covers all cases. In other words, if there’s an N symbol proof of a “garbage” conditional, must there be an M<N symbol proof of the form “action A has probability 0”?

The agents in this post aren't proof-based. Proof-based agents have some issues with weird counterfactuals. Perhaps the only worlds where you take some specific action are ones where PA is inconsistent. (COEDT also has issues, since the nearby oracles are not reflective, but it's a different set of issues.)

In general queries to reflective-oracle-like world models that have forms like “is the probability of this exactly 0?” are problematic, since they are vulnerable to liar’s paradoxes. What if you take action A iff the probability of taking action A is exactly 0? So to the extent that this works for proof-based agents, it’s because they’re not complete in some sense.

I am not sure that if there is a proof of a garbage conditional, then there will be a similarly-short proof that the conditional is garbage. Say "proving a garbage conditional P(A|B)" means proving P(A∧B)=cP(B) for some c such that in fact P(B)=0. I could imagine a case where it is easy to prove the event A equivalent to the event B, but hard to prove P(B)=0. Then you could prove P(A∧B)=P(B) easily but not P(B)=0. (This case might not be problematic from a decision theory perspective, which is interesting, but it's still an invalid conditional.)

Yeah, my comment was more about proof-based agents rather than oracle-based, sorry about going off topic. In a proof-based setting, conditionals with P(B)=0 aren’t necessarily garbage, some of them are intended and we want the agent to find them. Your example might be one of those. The hard part is defining which is which.

It seems to me like the conditional oracle's definition could be made more elegant by taking only m and n as parameters, both of which take an action as a parameter. The oracle would then implement argmax_a P(m^{O′}(a) | n^{O′}(a)).