MIRIx Stanford report

This is a report from the second MIRIx at Stanford, which happened on Saturday May 9, 2015. The participants were Jessica Taylor, Alex Richard, Nate Thomas, James Cook, Helen Jiang, and Greg Nisbet.

We started by going over the reflective oracles/game theory paper. Then we tried to find useful ways to apply game theory concepts to reflective oracles. Specifically, we looked at various game-theoretic equilibria other than Nash equilibria and tried to see what they correspond to in the setting of reflective oracles.

Trembling hand perfect equilibrium

Trembling hand perfect equilibrium is a variant of Nash equilibrium that forbids certain equilibria. Roughly, one could imagine that each player has a tiny chance of taking a random action, and require players’ actions to be rational even in this hypothetical. More formally, a Nash equilibrium is a trembling hand perfect equilibrium if there is a sequence of perturbed games (variants of the original game where each strategy must be played with at least some small positive probability) converging to the original game whose Nash equilibria converge to the original Nash equilibrium. It is very straightforward to define the same notion for reflective oracles: we can define a sequence of “perturbed programs” that add a small amount of noise to oracle calls, with reflective oracles for each perturbed program converging to a trembling hand reflective oracle.
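As a rough illustration of the game-theoretic side of this definition (not of the oracle construction), the sketch below uses an assumed 2x2 game in which (D, R) is a Nash equilibrium but D is weakly dominated by U. The payoff table and the function names are hypothetical choices for this example, not something from the report. It only checks the row player's best responses when the column player trembles toward L with a small probability.

```python
from fractions import Fraction

# Row player's payoffs in an assumed 2x2 game (illustrative, not from the report).
# Against R, both U and D pay 2; against L, U pays 1 and D pays 0.
ROW_PAYOFF = {("U", "L"): 1, ("U", "R"): 2, ("D", "L"): 0, ("D", "R"): 2}


def row_utility(action, prob_left):
    """Row player's expected payoff when the column player plays L with prob_left."""
    return (prob_left * ROW_PAYOFF[(action, "L")]
            + (1 - prob_left) * ROW_PAYOFF[(action, "R")])


def row_best_responses(prob_left):
    """Set of row actions that maximize expected payoff against prob_left."""
    utils = {a: row_utility(a, prob_left) for a in ("U", "D")}
    best = max(utils.values())
    return {a for a, u in utils.items() if u == best}


# Without trembles, D is a best response to R, so (D, R) can be a Nash equilibrium.
print(row_best_responses(Fraction(0)))

# Once the column player trembles to L with any positive probability,
# only U remains a best response.
for eps in (Fraction(1, 10), Fraction(1, 1000)):
    print(eps, row_best_responses(eps))
```

Because D stops being a best response as soon as L has any positive probability, equilibria of the perturbed games put only the minimum allowed weight on D, so they cannot converge to an equilibrium in which the row player plays D for sure.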

Evidential game theory

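It is possible to define evidential game theory as a natural extension of evidential decision theory to game theory. Given $n$ players and a distribution $P$ over pure strategy profiles, let $A \sim P$ and define the utility for player $i$ taking action $a_i$ as $U_i(a_i) = \mathbb{E}[u_i(A) \mid A_i = a_i]$, where $u_i$ is player $i$'s utility function.

Let $P$ be considered an $\epsilon$-evidential equilibrium when, for each player $i$, every action has positive probability under $P$ and $\mathbb{E}[U_i(A_i)] \geq \max_{a_i} U_i(a_i) - \epsilon$.

We can say that $P$ is an evidential equilibrium if there is a sequence of $\epsilon$-evidential equilibria converging to $P$ as $\epsilon$ converges to 0. The use of $\epsilon$ is necessary to avoid division by zero in conditional probabilities.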

Mutual cooperation in the prisoner’s dilemma is actually a valid evidential equilibrium. To see why this is true, let $P_k$ be a distribution that assigns probability $1/k$ to mutual defection and probability $1 - 1/k$ to mutual cooperation. Under $P_k$, each player’s action perfectly predicts the other’s, so the expected utility given that you cooperate (the mutual cooperation payoff) is higher than the expected utility given that you defect (the mutual defection payoff), and cooperating is the optimal action. We assign higher probability to both players cooperating as $k$ increases, so we can accordingly make $\epsilon$ go to 0 in the limit, making this sequence converge to a distribution where the players always mutually cooperate.
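The prisoner's dilemma argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the payoff numbers (3, 0, 5, 1) are a conventional choice rather than anything from the report, the index `k` (probability 1/k on mutual defection) and all function names are hypothetical, and distance from equilibrium is measured as the gap between the best conditional expected utility and the player's overall expected utility, which is one possible reading of the $\epsilon$ condition.

```python
from fractions import Fraction

# Assumed prisoner's dilemma payoffs (not from the report): (player 0, player 1).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def conditional_utility(dist, player, action):
    """E[u_player(A) | A_player = action] under the joint distribution dist."""
    mass = sum(p for profile, p in dist.items() if profile[player] == action)
    if mass == 0:
        raise ZeroDivisionError("action has probability zero; conditional undefined")
    total = sum(p * PAYOFFS[profile][player]
                for profile, p in dist.items() if profile[player] == action)
    return total / mass


def evidential_gap(dist, player):
    """Best conditional utility minus the player's overall expected utility."""
    best = max(conditional_utility(dist, player, a) for a in ACTIONS)
    expected = sum(p * PAYOFFS[profile][player] for profile, p in dist.items())
    return best - expected


def mostly_cooperate(k):
    """Probability 1/k on mutual defection and 1 - 1/k on mutual cooperation."""
    return {("D", "D"): Fraction(1, k), ("C", "C"): 1 - Fraction(1, k)}


for k in (2, 10, 100):
    dist = mostly_cooperate(k)
    print(k,
          conditional_utility(dist, 0, "C"),
          conditional_utility(dist, 0, "D"),
          evidential_gap(dist, 0))
```

With these assumed payoffs, conditioning on cooperation always gives 3 and conditioning on defection always gives 1, while the gap works out to 2/k and so shrinks to 0 as `k` grows. Exact fractions are used so that the comparison is not blurred by floating-point rounding.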

Also note that every Nash equilibrium is an evidential equilibrium. This is because players’ actions are independent in a Nash equilibrium, so causal counterfactuals are the same as evidential counterfactuals.
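The independence claim can be illustrated with a small sketch. It assumes matching pennies as the example game and uses hypothetical function names; it shows that when two players randomize independently, conditioning on your own action tells you nothing about the other player, so the evidential and causal expected utilities agree.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("H", "T")


def payoff_player0(a0, a1):
    """Matching pennies payoff for player 0 (an assumed example game)."""
    return 1 if a0 == a1 else -1


def product_distribution(mix0, mix1):
    """Joint distribution in which the two players randomize independently."""
    return {(a0, a1): mix0[a0] * mix1[a1] for a0, a1 in product(ACTIONS, repeat=2)}


def evidential_utility(dist, action):
    """E[u_0(A) | A_0 = action]: condition on player 0's own action."""
    mass = sum(p for (a0, _), p in dist.items() if a0 == action)
    total = sum(p * payoff_player0(a0, a1)
                for (a0, a1), p in dist.items() if a0 == action)
    return total / mass


def causal_utility(mix1, action):
    """Expected payoff of `action` against player 1's marginal strategy."""
    return sum(p * payoff_player0(action, a1) for a1, p in mix1.items())


# The mixed Nash equilibrium of matching pennies: both players mix 50/50.
half = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
nash = product_distribution(half, half)
for a in ACTIONS:
    print(a, evidential_utility(nash, a), causal_utility(half, a))
```

The same equality holds for any product distribution with full support, not only at equilibrium; what the Nash condition adds is that the actions actually played are optimal under these (coinciding) utilities.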

While evidential game theory is not quite the UDT game theory that we would really like, it does seem to be related in that it allows an agent to take into account non-causal interactions between their action and others’ actions.

Correlated oracles

We attempted to define the equivalent of correlated equilibrium in the setting of reflective oracles.

An oracle could be described as a function of type . Alternatively, we could define an oracle distribution to be of type , a distribution over deterministic oracles. The reflection principle can be stated almost exactly as before. Let a distribution over deterministic oracles be considered reflective iff where is a random variable.

If we set up game theory the same way we did in the original reflective oracles paper, what resulting equilibrium do we get? This is not actually correlated equilibrium; rather, it’s an even more basic extension of Nash equilibrium. I don’t know if this has been studied, but I’ll call it a “joint Nash equilibrium” for the purposes of this post.

Let $P$ be a distribution over pure strategy profiles, and write $P_i$ for its marginal over player $i$'s action and $P_{-i}$ for its marginal over the other players’ actions. $P$ is a joint Nash equilibrium iff, for each player $i$, any action in the support of $P_i$ maximizes expected utility given that other players’ actions are distributed according to $P_{-i}$. Because we construct causal counterfactuals for each of our actions, we ignore correlation between our actions and other players’ actions, but we still take into account correlation between other players’ actions.

This suggests that we may get additional equilibria (evidential equilibria, correlated equilibria, subjective correlated equilibria, coarse correlated equilibria) by asking different counterfactuals. The counterfactuals must not be causal counterfactuals, because this would lead to equilibria that ignore the correlation between our action and others’ actions. We ran out of time before figuring out how to construct these counterfactuals.