Summary: Reflective oracles correspond to Nash equilibria. A correlated version of reflective oracles exists and corresponds to correlated equilibria. The set of these objects is convex, which is useful.

(thanks to Sam for a lot of help with coming up with constructions)

Motivation

Reflective oracles assign a probability to each query. The set of reflective oracles is not convex. For example, consider a machine MO():=O(M,0.5) (i.e. a machine that asks whether the probability that it returns 1 is at least 0.5). There are reflective oracles that assign probabilities 0, 0.5, and 1 to the query O(M,0.5), and there aren’t reflective oracles that assign other probabilities. So the set of possible reflective oracles isn’t convex. This is for the same reason that the set of Nash equilibria isn’t always convex.

Related to the non-convexity, there isn’t in general a continuous way of mapping the numerical parameters of some Turing machines to a reflective oracle for those machines (or doing so as a Kakutani map). (Like there isn’t a way of Kakutani-mapping the parameters of a game to an ϵ-approximate Nash equilibrium for that game).

This makes decision problems involving reflective oracles harder to analyze: the function mapping agents’ policies to the resulting reflective oracle will be discontinuous.

So to analyze these decision problems, it might be useful to construct a convex set analogous to the non-convex set of reflective oracles. Luckily, the set of correlated equilibria in a game is convex. This post presents an analogue of them in reflective oracle land.

Setup

An oracle (mapping from queries to answers) will be selected at random from some distribution. A query can ask about the distribution over oracles, conditional on the answer to that query.

Definitions: machines, oracles, queries

Let M be some finite set of Turing machines that ZFC-provably halt on every input. (We could in principle deal with both infinite sets of Turing machines and possibly-non-halting Turing machines, as ordinary reflective oracles do, but this complicates things somewhat).

An oracle O maps each machine in M to a natural number. The set of oracles is O:=M→N.

An oracle distribution D is a probability distribution over oracles. The set of oracle distributions is D:=ΔO.

A query to an oracle is a way of asking about an oracle distribution in an “argmax” fashion. A query q is represented as a list of functions q1,q2,...,ql(q):O→R. Write q(D):=argmaxi∈{1,2,...,kq}ED[qi(O)]; in words, the query asks for some i such that qi has maximum expectation on the oracle distribution. Note that:

q(D) is non-empty for each D

for each i∈{1,2,...,kq}, {D|i∈q(D)} is convex

Let Q be the set of queries. Roughly, the set of allowed queries are those that partition the set of oracle distributions into a bunch of convex polygons; this allows for arbitrarily fine-precision queries about the distribution.

Let us interpret the output of each Turing machine (on the empty input) as a query. Some encoding scheme is necessary; I don’t think it matters much. So assume we have a map Eval:M→Q. (The main reason for representing queries using Turing machines is to allow quining).

We will impose one additional restriction: for any machine M, the query q it outputs must not depend on the distribution over O(M) (i.e. the query is actually only a function of the joint distribution over each O(M′) value other than O(M) itself). This is to avoid liar’s paradoxes.

Reflectivity

An oracle distribution D is reflective if, for each M∈M, and each a such that D(O(M)=a)>0:

a∈Eval(M)(Condition(D,M,a))

where Condition(D,M,a) is an oracle distribution formed by conditioning the oracle distribution D on the event O(M)=a.

In words, an oracle distribution is reflective if each possible answer to each query is correct thing to say about the distribution over oracles conditional on the answer to that query.

Theorem 1: A reflective joint oracle exists.

Proof:

This oracle distribution will independently roll a die for each query to produce O. Specifically, it is formed by a mapping ~O:M→ΔN. Let the function d map ~O to an oracle distribution through independent sampling.

Define f(~O):=×M∈MΔ(Eval(M)(d(~O)), where × indicates a Cartesian product, and ΔS is the set of probability distributions over the set S (which is finite in this case). Note that f(~O) is always non-empty and that f is upper-hemicontinuous (by Berge’s Maximum Theorem). So by Kakutani’s Fixed Point Theorem, there is some mapping ~O such that ~O∈f(~O).

Consider the oracle distribution d(~O). For any M∈M, and O∈Support(d(~O)), we have

where the second equality follows from (a) the fact that Eval(M) does not depend on the distribution over O(M) and (b) the fact that all O(M) values are independent under d(~O). So d(O∗) is reflective.

□

Theorem 2: The set of reflective oracle distributions is convex.

Proof:

Let D0,D1 be reflective oracle distributions. Let θ∈(0,1). Define Dθ:=θD0+(1−θ)D1. This proof will show that Dθ is reflective.

Let 1≤i≤n. Let M∈M. Let a be such that Dθ(O(M)=a)>0. Consider 3 cases:

D0(O(M)=a)>0,D1(O(M)=a)=0. Then a∈Eval(M)(Condition(D0,M,a)). But since D1(O(M)=a)=0, we have Condition(Dθ,M,a)=Condition(D0,M,a). So a∈Eval(M)(Condition(Dθ,M,a)) as desired.

D0(O(M)=a)=0,D1(O(M)=a)>0. This is analogous to case 1.

D0(O(M)=a)>0,D1(O(M)=a)>0. We have a∈Eval(M)(Condition(D0,M,a)) and a∈Eval(M)(Condition(D1,M,a)). Note that Condition(Dθ,M,a) is a convex combination of Condition(D0,M,a) and Condition(D1,M,a) (this is a basic property of mixture distributions: the conditioning of a mixture of components is a mixture of the conditionings of the components). The set {D|a∈Eval(M)(D)} is convex (from Eval(M) being a query); therefore a∈Eval(M)(Condition(Dθ,M,a)), as desired.

□

Correspondence with correlated equilibria

Reflective oracle distributions can be used to find correlated equilibria. Say we have a normal-form game with n players, where each player i selects an action Ai∈Ai, and receives utility Ui(A1,...,An). Define queries as follows:

where the machine Mi computes a representation of the query qi (the recursion works through mutual quining). It’s easy to show that reflective oracle distributions (for the set of machines {M1,...,Mn}) correspond exactly with correlated equilibria in the game.

Thus, as ordinary reflective oracles naturally yield Nash equilibria when causal decision theorists using them play games with each other, so do reflective oracle distributions naturally yield correlated equilibria when causal decision theorists using them play games with each other.

ϵ-approximate correlated equilibria can be computed in polynomial time (see lectures 17+18 here). I conjecture that ϵ-approximate reflective oracle distributions over a finite set of queries can also be found in polynomial time, perhaps by reducing the problem of finding ϵ-approximate reflective oracle distributions to the problem of finding ϵ-approximate correlated equilibria.

## A correlated analogue of reflective oracles

Summary: Reflective oracles correspond to Nash equilibria. A correlated version of reflective oracles exists and corresponds to correlated equilibria. The set of these objects is convex, which is useful.

(thanks to Sam for a lot of help with coming up with constructions)

## Motivation

Reflective oracles assign a probability to each query. The set of reflective oracles is not convex. For example, consider a machine MO():=O(M,0.5) (i.e. a machine that asks whether the probability that it returns 1 is at least 0.5). There are reflective oracles that assign probabilities 0, 0.5, and 1 to the query O(M,0.5), and there aren’t reflective oracles that assign other probabilities. So the set of possible reflective oracles isn’t convex. This is for the same reason that the set of Nash equilibria isn’t always convex.

Related to the non-convexity, there isn’t in general a continuous way of mapping the numerical parameters of some Turing machines to a reflective oracle for those machines (or doing so as a Kakutani map). (Like there isn’t a way of Kakutani-mapping the parameters of a game to an ϵ-approximate Nash equilibrium for that game).

This makes decision problems involving reflective oracles harder to analyze: the function mapping agents’ policies to the resulting reflective oracle will be discontinuous.

So to analyze these decision problems, it might be useful to construct a convex set analogous to the non-convex set of reflective oracles. Luckily, the set of correlated equilibria in a game is convex. This post presents an analogue of them in reflective oracle land.

## Setup

An oracle (mapping from queries to answers) will be selected at random from some distribution. A query can ask about the distribution over oracles, conditional on the answer to that query.

## Definitions: machines, oracles, queries

Let M be some finite set of Turing machines that ZFC-provably halt on every input. (We could in principle deal with both infinite sets of Turing machines and possibly-non-halting Turing machines, as ordinary reflective oracles do, but this complicates things somewhat).

An oracle O maps each machine in M to a natural number. The set of oracles is O:=M→N.

An oracle distribution D is a probability distribution over oracles. The set of oracle distributions is D:=ΔO.

A query to an oracle is a way of asking about an oracle distribution in an “argmax” fashion. A query q is represented as a list of functions q1,q2,...,ql(q):O→R. Write q(D):=argmaxi∈{1,2,...,kq}ED[qi(O)]; in words, the query asks for some i such that qi has maximum expectation on the oracle distribution. Note that:

q(D) is non-empty for each D

for each i∈{1,2,...,kq}, {D|i∈q(D)} is convex

Let Q be the set of queries. Roughly, the set of allowed queries are those that partition the set of oracle distributions into a bunch of convex polygons; this allows for arbitrarily fine-precision queries about the distribution.

Let us interpret the output of each Turing machine (on the empty input) as a query. Some encoding scheme is necessary; I don’t think it matters much. So assume we have a map Eval:M→Q. (The main reason for representing queries using Turing machines is to allow quining).

We will impose one additional restriction: for any machine M, the query q it outputs must not depend on the distribution over O(M) (i.e. the query is actually only a function of the joint distribution over each O(M′) value other than O(M) itself). This is to avoid liar’s paradoxes.

## Reflectivity

An oracle distribution D is reflective if, for each M∈M, and each a such that D(O(M)=a)>0:

a∈Eval(M)(Condition(D,M,a))

where Condition(D,M,a) is an oracle distribution formed by conditioning the oracle distribution D on the event O(M)=a.

In words, an oracle distribution is reflective if each possible answer to each query is correct thing to say about the distribution over oracles conditional on the answer to that query.

Theorem 1:A reflective joint oracle exists.Proof:This oracle distribution will independently roll a die for each query to produce O. Specifically, it is formed by a mapping ~O:M→ΔN. Let the function d map ~O to an oracle distribution through independent sampling.

Define f(~O):=×M∈MΔ(Eval(M)(d(~O)), where × indicates a Cartesian product, and ΔS is the set of probability distributions over the set S (which is finite in this case). Note that f(~O) is always non-empty and that f is upper-hemicontinuous (by Berge’s Maximum Theorem). So by Kakutani’s Fixed Point Theorem, there is some mapping ~O such that ~O∈f(~O).

Consider the oracle distribution d(~O). For any M∈M, and O∈Support(d(~O)), we have

O(M)∈Eval(M)(d(~O))=Eval(M)(Condition(d(~O),M,O(M)))

where the second equality follows from (a) the fact that Eval(M) does not depend on the distribution over O(M) and (b) the fact that all O(M) values are independent under d(~O). So d(O∗) is reflective.

□

Theorem 2:The set of reflective oracle distributions is convex.Proof:Let D0,D1 be reflective oracle distributions. Let θ∈(0,1). Define Dθ:=θD0+(1−θ)D1. This proof will show that Dθ is reflective.

Let 1≤i≤n. Let M∈M. Let a be such that Dθ(O(M)=a)>0. Consider 3 cases:

D0(O(M)=a)>0,D1(O(M)=a)=0. Then a∈Eval(M)(Condition(D0,M,a)). But since D1(O(M)=a)=0, we have Condition(Dθ,M,a)=Condition(D0,M,a). So a∈Eval(M)(Condition(Dθ,M,a)) as desired.

D0(O(M)=a)=0,D1(O(M)=a)>0. This is analogous to case 1.

D0(O(M)=a)>0,D1(O(M)=a)>0. We have a∈Eval(M)(Condition(D0,M,a)) and a∈Eval(M)(Condition(D1,M,a)). Note that Condition(Dθ,M,a) is a convex combination of Condition(D0,M,a) and Condition(D1,M,a) (this is a basic property of mixture distributions: the conditioning of a mixture of components is a mixture of the conditionings of the components). The set {D|a∈Eval(M)(D)} is convex (from Eval(M) being a query); therefore a∈Eval(M)(Condition(Dθ,M,a)), as desired.

□

## Correspondence with correlated equilibria

Reflective oracle distributions can be used to find correlated equilibria. Say we have a normal-form game with n players, where each player i selects an action Ai∈Ai, and receives utility Ui(A1,...,An). Define queries as follows:

qi(D):=argmaxa∈AiED[Ui(O(M1),...,O(Mi−1),a,O(Mi+1),...,O(Mn))]

where the machine Mi computes a representation of the query qi (the recursion works through mutual quining). It’s easy to show that reflective oracle distributions (for the set of machines {M1,...,Mn}) correspond exactly with correlated equilibria in the game.

Thus, as ordinary reflective oracles naturally yield Nash equilibria when causal decision theorists using them play games with each other, so do reflective oracle distributions naturally yield correlated equilibria when causal decision theorists using them play games with each other.

ϵ-approximate correlated equilibria can be computed in polynomial time (see lectures 17+18 here). I conjecture that ϵ-approximate reflective oracle distributions over a finite set of queries can also be found in polynomial time, perhaps by reducing the problem of finding ϵ-approximate reflective oracle distributions to the problem of finding ϵ-approximate correlated equilibria.