# Conceptual Problems with UDT and Policy Selection

# Abstract

UDT doesn’t give us conceptual tools for dealing with multiagent coordination problems. There may have initially been some hope, because a UDT player can select a policy which incentivises others to cooperate, or because UDT can reason (EDT-style) that other UDTs are more likely to cooperate if it cooperates itself, or other lines of thought. However, it now appears that this doesn’t pan out, at least not without other conceptual breakthroughs (which I suggest won’t look that much like UDT). I suggest this is connected with UDT’s difficulties handling logical uncertainty.

# Introduction

I tend to mostly think of UDT as the ideal, with other decision theories being of interest primarily because we don’t yet know how to generalize UDT beyond the simplistic domain where it definitely makes sense. This perspective has been increasingly problematic for me, however, and I now feel I can say some relatively concrete things about UDT being wrong *in spirit* rather than only in detail.

Relatedly, in late 2017 I made a post titled Policy Selection Solves Most Problems. Policy selection is the compromise solution which does basically what UDT is supposed to do, without the same degree of conceptual elegance, and without providing the hoped-for clarity which was a major motivation for studying these foundational problems. The current post can be seen as a follow-up to that, giving an idea of the sort of thing which policy selection doesn’t seem to solve.

The argument can also be thought of as an argument against veil-of-ignorance morality of a certain kind.

I don’t think any of this will be really surprising to people who have been thinking about this for a while, but, my thoughts seem to have recently gained clarity and definiteness.

## Terminology Notes/References

UDT 1.0, on seeing observation , takes the action which maximizes the expected utility of “my code outputs action on seeing observation ”, with expected value evaluated according to the prior.

UDT 1.1, on seeing observation , takes the action which *the globally optimal policy *(according to the prior) maps to. This produces the same result as UDT 1.0 in many cases, but ensures that the agent can hunt stag with itself.

UDT 2 is like UDT 1.1, except it (1) represents policies as programs rather than input-output mappings, and (2) dynamically decides how much time to spend thinking about the optimal policy.

What I’m calling “policy selection” is similar to UDT 2. It has a fixed (small) amount of time to choose a policy before thinking more. However, it could always choose the policy of waiting until it has thought longer before it really chooses a strategy, so that’s not so different from dynamically deciding when to choose a policy.

# Two Ways UDT Hasn’t Generalized

## Logical Uncertainty

UDT 2 tries to tackle the issue of “thinking longer”, which is the issue of logical uncertainty. This is a conceptual problem for UDT, because thinking longer is a kind of updating. UDT is supposed to avoid updating. UDT 2 doesn’t really solve the problem in a nice way.

The problem with thinking for only a short amount of time is that you get bad results. Logical induction, the best theory of logical uncertainty we have at the moment, gives essentially no guarantees about the quality of beliefs at short times. For UDT 2 to work well, it would need early beliefs to at least be good enough to avoid selecting a policy quickly—early beliefs should at least correctly understand how poor-quality they are.

The ideal for UDT is that early beliefs reflect all the possibilities inherent in later updates, so that a policy optimized according to early beliefs reacts appropriately to later computations. Thin priors are one way of thinking of this. So far, nothing like this has been found.

## Game Theory

The second way UDT has failed to generalize, and the main topic of this post, is to game theory (ie, multi-agent scenarios). Cousin_it noted that single-player extensive-form games provided a toy model of UDT. The cases where he says that the toy model breaks down are the cases where I am *now* saying the concept of UDT *itself* breaks down. Extensive-form games represent the situations where UDT makes real sense: those with no logical uncertainty (or at least, no non-Bayesian phenomena in the logical uncertainty), and, only one agent.

What’s the conceptual problem with extending UDT to multiple agents?

When dealing with *updateful* agents, UDT has the upper hand. For example, in Chicken-like games, a UDT agent can be a bully, or commit not to respond to bullies. Under the usual game-theoretic assumption that players can determine what strategies each other have selected, the updateful agents are forced to respond optimally to the updateless ones, IE give in to UDT bullies / not bully the un-bullyable UDT.

Put simply, UDT makes its decisions “before” other agents. (The “before” is in logical time, though, not necessarily really before.)

When dealing with *other UDT agents,* however, the UDT agents have to make a decision “at the same time”.

Naively, the coordination mechanism “write your decisions on slips of paper simultaneously—no peeking!” is a bad one. But this is the whole idea of UDT—writing down its strategy under a “no peeking!” condition.

Other decision theories also have to make decisions “at the same time” in game-theoretic situations, but they don’t operate under the “no peeking”. Guessing the behavior of the other players could be difficult, but the agent can draw on past experience to help solve this problem. UDT doesn’t have this advantage.

Furthermore, we’re asking more of UDT agents. When faced with a situation involving other UDT agents, UDT is supposed to “handshake”—Löbian handshakes being at least a toy model—and find a cooperative solution (to the extent that there is one).

So far, models of how handshakes could occur have been limited to special cases or unrealistic assumptions. (I’d like to write a full review—I think there’s some non-obvious stuff going on—but for this post I think I’d better focus on what I see as the fundamental problem.) I’d like to see better models, but, I suspect that significant departures from UDT will be required.

Even if you don’t try to get UDT agents to cooperate with each other, though, the conceptual problem remains—UDT is going in blind. It has a much lower ability to determine what equilibrium it is in.

I think there is a deep relationship between the issue with logical uncertainty and the issue with game theory. A simple motivating example is Agent Simulates Predictor, which appears to be strongly connected to both issues.

# How does Equilibrium Selection Work?

The problem I’m pointing to is the problem of equilibrium selection. How are two UDT agents supposed to predict each other? How can they trust each other?

There are many different ways to think about agents ending up in game-theoretic equilibria. Most of them, as I understand it, rely on iterating the game so that the agents can learn about it. This iteration can be thought of as really occurring, or as occurring in the imagination of the players (an approach called “fictitious play”). Often, these stories result in agents playing correlated equilibria, rather than Nash equilibria. However, that’s not a very big difference for our purposes here—correlated equilibria only allow the DD outcome in Prisoner’s Dilemma, just like Nash.

There’s something absurd about using iterated play to learn single-shot strategies, a problem Yoav Shoham et al discuss in If multi-agent learning is the answer, what is the question?. If the game is iterated, what stops agents from taking advantage of its iterated nature?

That’s the essence of my argument in In Logical Time, All Games are Iterated Games—in order to learn to reason about each other, agents use fictitious play, or something similar. But this turns the game into an iterated game.

Turning a game into an iterated game can create a lot of opportunity for coordination, but the Folk Theorem says that it also creates a very large equilibrium selection problem. The Folk Theorem indicates that rational players can end up in very bad outcomes. Furthermore, we’ve found this difficult to avoid in decision algorithms we know how to write down. How can we eliminate the “bad” equilibria and keep only the “good” possibilities?

What we’ve accomplished is the reduction of the “handshake” problem to the problem of avoiding bad equilibria. (We could say this turns prisoner’s dilemma into stag hunt.)

Handshake or no handshake, however, the “fictitious play” view suggests that equilibrium selection requires *learning*. You can get agents into equilibria without learning, but the setups seem artificial (so far as I’ve seen). This requires updateful reasoning in some sense. (Although, it can be a logical update only; being empirically updateless still seems wiser).

# Logical Uncertainty & Games

Taking this idea a little further, we can relate logical uncertainty and games via the following idea:

*Our uncertain expectations are a statistical summary of how things have gone in similar situations in the (logical) past. The way we react to what we see can be thought of as an iterated strategy which depends on the overall statistics of that history (rather than a single previous round).*

I’m not confident this analogy is a good one—in particular, the way policies have to depend on statistical summaries of the history rather than on specific previous rounds is a bit frustrating. However, the analogy goes deeper than I’m going to spell out here. (Perhaps in a different post.)

One interesting point in favor of this analogy: it also works for modal agents. The proof operator, , is like a “prediction”: proofs are how modal agents thinks about the world in order to figure out what to do. So is like “the agent thinks ”. If you look at how modal agents are actually computed in the MIRI guide to Löb’s theorem, it looks like an iterated game, and looks like a simple kind of summary of the game so far. On any round, is true if and only if has been true in every previous round. So, you can think of as ” has held up so far”—as soon as turns out to be false once, is never true again.

In this interpretation, FairBot (the strategy of cooperating if and only if the other player provably cooperates) becomes the “Grim Trigger” strategy: cooperate on the first round, and cooperate on every subsequent round so long as the other player has cooperated so far. If the other player ever defects, switch to defecting, and never cooperates again.

A take-away for the broader purpose of this post could be: *one of the best models we have of the UDT “handshake” is the Grim Trigger strategy in disguise. *This sets the tone nicely for what follows.

My point in offering this analogy, however, is to drive home the idea that game-theoretic reasoning requires learning. Even logic-based agents can be understood as running simple learning algorithms, “updating” on “experience” from (counter)logical possibilities. UDT can’t dance with its eyes closed.

This is far from a proof of anything; I’m just conveying intuitions here.

# What UDT Wants

One way to look at what UDT is trying to do is to think of it as always trying to win a “most meta” competition. UDT doesn’t want to look at any information until it has determined the best way to use that information. UDT doesn’t want to make any decisions directly; it wants to find optimal policies. UDT doesn’t want to participate in the usual game-theoretic setup where it (somehow) knows all other agent’s policies and has to react; instead, it wants to understand *how* those policies come about, and act in a way which maximally shapes that process to its benefit.

It wants to move first in every game.

Actually, that’s not right: it wants the *option* of moving first. Deciding earlier is always better, if one of the options is to decide later.

It wants to announce its binding commitments before anyone else has a chance to, so that everyone has to react to the rules it sets. It wants to set the equilibrium as it chooses. Yet, at the same time, it wants to understand how everyone else will react. It would like to understand all other agents in detail, their behavior a function of itself.

So, what happens if you put two such agents in a room together?

Both agents race to decide how to decide first. Each strives to understand the other agent’s behavior as a function of its own, to select the best policy for dealing with the other. Yet, such examination of the other needs to itself be done in an updateless way. It’s a race to make the most uninformed decision.

I claim this isn’t a very good coordination strategy.

One issue is that jumping up a meta-level increases the complexity of a decision. Deciding a single action is much easier than deciding on a whole policy. Some kind of race to increasing meta-levels makes decisions increasingly challenging.

At the same time, the desire for your policy to be logically earlier than everyone else, so that they account for your commitments in making their decisions, means you have to make your decisions faster and in simpler, more predictable ways.

The expanding meta-space and the contracting time do not seem like a good match. You have to make a more complicated decision via less-capable means.

Two people trying to decide policies early are just like two people trying to decide actions late, but with more options and less time to think. It doesn’t seem to solve the fundamental coordination problem.

The race for most-meta is only one possible intuition about what UDT is trying to be. Perhaps there is a more useful one, which could lead to better generalizations.

# Veils of Ignorance

UDT tries to coordinate with itself by stepping behind a veil. In doing so, it fails to coordinate with others.

Veil-of-ignorance moral theories describe multiagent coordination resulting from stepping behind a veil. But there is a serious problem. How can everyone step behind the same veil? You can’t tell what veil everyone else stepped behind if you stay behind your own veil.

UDT can successfully self-coordinate in this way because it is very reasonable to use the common prior assumption with a single agent. There is no good reason to suppose this in the multiagent case. In practice, the common prior assumption is a good approximation of reality because everyone has dealt with essentially the same reality for a long time and has learned a lot about it. But if we have everyone step behind a veil of ignorance, there is no reason to suppose they know how to construct the *same *veil as each other—they’re ignorant!

# Is UDT Almost Right, Nonetheless?

I find myself in an awkward position. I still think UDT gets a lot of things right. Certainly, it still seems worth being updateless about *empirical* uncertainty. It doesn’t seem to make sense for *logical* uncertainty… but treating logical and empirical uncertainty in such different ways is quite uncomfortable. My intuition is that there should not be a clean division between the two.

One possible reaction to all this is to try to learn to be updateless. IE, don’t actually try to be updateless, but do try to get the problems right which UDT got right. Don’t expect everything to go well with a fixed Bayesian prior, but try to specify the learning-theoretic properties which approximate that ideal.

Would such an approach do anything to help multiagent coordination? Unclear. Thermodynamic self-modification hierarchies might work with this kind of approach.

In terms of veil-of-ignorance morality, it seems potentially helpful. Take away everything we’ve learned, and we don’t know how to cooperate from behind our individual veils of ignorance. But if we each have a veil of ignorance which is carefully constructed, a learned-updateless view which accurately reflects the possibilities in some sense, they seem more likely to match up and enable coordination.

Or perhaps a more radical departure form the UDT ontology is needed.

- AI Alignment 2018-19 Review by 28 Jan 2020 2:19 UTC; 140 points) (
- The “Commitment Races” problem by 23 Aug 2019 1:58 UTC; 75 points) (
- 13 Sep 2019 21:38 UTC; 70 points) 's comment on A Critique of Functional Decision Theory by (
- MIRI’s 2019 Fundraiser by 3 Dec 2019 1:16 UTC; 61 points) (
- Partial Agency by 27 Sep 2019 22:04 UTC; 53 points) (
- 4 Sep 2019 5:52 UTC; 25 points) 's comment on Embedded Agency via Abstraction by (
- [AN #82]: How OpenAI Five distributed their training computation by 15 Jan 2020 18:20 UTC; 20 points) (
- MIRI’s 2019 Fundraiser by 7 Dec 2019 0:30 UTC; 19 points) (EA Forum;
- 13 Sep 2019 1:30 UTC; 6 points) 's comment on Formalising decision theory is hard by (
- Counterfactual Mugging: Why should you pay? by 17 Dec 2019 22:16 UTC; 5 points) (
- 13 Nov 2019 22:52 UTC; 2 points) 's comment on The Credit Assignment Problem by (

I agree with Two Ways UDT Hasn’t Generalized and What UDT Wants, and am still digesting the other parts. (I think a lot of this is really fuzzy and hard to talk/write about, and I feel like giving you some positive reinforcement for making the attempt and doing as well as you are. :)

It seems like one meta level above what even UDT tries to be is decision theory (as a philosophical subject) and one level above that is metaphilosophy, and my current thinking is that it seems bad (potentially dangerous or regretful) to put any significant (i.e., superhuman) amount of computation into anything

exceptdoing philosophy, partly for reasons given in this post.To put it another way, any decision theory that we come up with might have some kind of flaw that other agents can exploit, or just a flaw in general, such as in how well it cooperates or negotiates with or exploits other agents (which might include how quickly/cleverly it can make the necessary commitments). Wouldn’t it be better to put computation into trying to find and fix such flaws (in other words, coming up with better decision theories) than into any particular object-level decision theory, at least until the superhuman philosophical computation itself decides to start doing the latter?

I think other agents cannot exploit us thinking more. UDT swerves against a madman in chicken. If a smart opponent knows this and therefore becomes a madman to exploit UDT, then UDT is playing against a madman that used to be smart, rather than just a madman. This is a different case because the decision UDT makes here influences both the direct outcome and the thought process of the smart opponent. By deliberately crashing into the formerly smart madman, UDT can retroactively erase the situation.

Doing what you describe requires something like logical updatelessness, which UDT doesn’t do, and which we don’t know how to do in general. I think this was described in the post. Also, even if thinking more doesn’t allow someone to exploit you, it might cause you to miss a chance to exploit someone else, or to cooperate with someone else, because it makes you too hard to predict.

I don’t know what logical updatelessness means, and I don’t see where the article describes this, but I’ll just try to formalize what I describe, since you seem to imply that would be novel.

Let UDT:=argmaxamaxu□("UDT=a⟹utility≥u") . Pitted against itself in modal combat, it’ll get at least the utility of (C,C) because “UDT = C ⇒ utility is that of (C,C)” is provable. In chicken, UDT is “swerve unless the opponent provably will”. UDT will swerve against itself (though that’s not provable), against the madman “never swerve” and against “swerve unless the opponent swerves against a madman” (which requires UDT’s opponent to disentangle UDT from its environment). UDT won’t swerve against “swerve if the opponent doesn’t” or “swerve if it’s provable the opponent doesn’t”.

Attempting to

exploitthe opponent sounds to me like “self-modify into a madman if it’s provable that that will make the opponent swerve”, but that’s just UDT.Suppose you (as a human) are playing chicken against this version of UDT, which has vastly more computing power than you and could simulate your decisions in its proofs. Would you swerve?

I wouldn’t, because I would reason that if I didn’t swerve, UDT would simulate that and conclude that not swerving leads to the highest utility. You said “By deliberately crashing into the formerly smart madman, UDT can retroactively erase the situation.” but I don’t see how

this versionof UDT does that.You’re right, the post kind of just obliquely mentions it and assumes the reader already knows the concept, in this paragraph:

Not sure what’s a good reference for logical updatelessness. Maybe try some of these posts? The basic idea is just that even if you manage to prove that your opponent doesn’t swerve, you perhaps shouldn’t “update” on that and then make your own decision while assuming that as a fixed fact that can’t be changed.

If I didn’t assume PA is consistent, I would swerve because I wouldn’t know whether UDT might falsely prove that I swerve. Since PA is consistent and I assume this, I am in fact better at predicting UDT than UDT is at predicting itself, and it swerves while I don’t. Can you find a strategy that beats UDT, doesn’t disentangle its opponent from the environment, swerves against itself and “doesn’t assume UDT’s proof system is consistent”?

It sounds like you mentioned logical updatelessness because my version of UDT does not trust a proof of “u = …”, it wants the whole set of proofs of “u >= …”. I’m not yet convinced that there are any other proofs it must not trust.

I think there’s no best player of multiplayer games. Or rather, choosing the best player depends on what other players exist in the world, and that goes all the way down (describing the theory of choosing the best player also depends on what other players exist, and so on).

Of course that doesn’t mean UDT is the best we can do. We cannot solve the whole problem, but UDT carves out a chunk, and we can and should try to carve out a bigger chunk.

For me the most productive way has been to come up with crisp toy problems and try to solve them. (Like ASP, or my tiling agents formulation.) Your post makes many interesting points; I’d love to see crisp toy problems for each of them!

I think it might be worth noting that there’s a trivial no-free-lunch theorem we can state about multiplayer games that can formalize your intuition.

(In at least a large class of cases) where there are multiple nash-equilibria, if different players aim for different equilibria, the best strategy depends on the strategy of the player you face. I think that’s all we need to say to show there is no best player.

True, but, I think that’s a bad way of thinking about game theory:

The Nash equilibrium model assumes that players somehow know what equilibrium they’re in. Yet, it gives rise to an equilibrium selection problem due to the non-uniqueness of equilibria. This casts doubt on the assumption of common knowledge which underlies the definition of equilibrium.

Nash equilibria also assume a naive best-response pattern. If an agent faces a best-response agent and we assume that the Nash-equilibrium knowledge structure somehow makes sense (there is some way that agents successfully coordinate on a fixed point), then it would make more sense for an agent to select its response function (to, possibly, be something other than argmax), based on what gets the best response from the (more-naive) other player. This is similar to the UDT idea. Of course you can’t have both players do this or you’re stuck in the same situation again (ie there’s yet another meta level which a player would be better off going to).

Going to the meta-level like that seems likely to make the equilibrium selection problem worse rather than better, but, that’s not my point. My point is that Nash equilibria aren’t the end of the story; they’re a somewhat weird model. So it isn’t obvious whether a similar no-free-lunch idea applies to a better model of game theory.

Correlated equilibria are an obvious thing to mention here. They’re a more sensible model in a few ways. I think there are still some unjustified and problematic assumptions there, though.

Agreed that it’s insufficient, but I think it shows that there’s no way to specify strategies that work regardless of other players’ strategies, and I agree that this generalizes to better solution concepts, which I agree “make the equilibrium selection problem worse”.

I’d also point out an oft-noted critical failure of Nash Equilibria, which is that they assume infinite computation, and (therefore) no logical uncertainty. A game can pay out the seventeenth digit of the BB(200) to player 1 and the eighteenth digit to player 2, and we must assume these are known, and can be used to find the NE. I haven’t thought through the following through completely, but it seems obvious that this issue can be used to show why NE is not generally a useful/valid solution concept for embedded agents, because they would need models of themselves and other agents their own size to predict goals / strategies.

I’m saying that non-uniqueness of the solution is

part of the conceptual problem with Nash equilibria.Decision theory doesn’t exactly provide a “unique solution”—it’s a theory of rational constraints on subjective belief, so, you can believe and do whatever you want within the confines of those rationality constraints. And of course classical decision theory also has problems of its own (such as logical omniscience). But there is a sense in which it is better than game theory about this, since game theory gives rationality constraints which depend on the other player in ways that are difficult to make real.

I’m not saying there’s some strategy which works regardless of the other player’s strategy. In single-player decision theory, you can still say “there’s no optimal strategy due to uncertainty about the environment”—but, you get to say “but there’s an optimal strategy given our uncertainty about the environment”, and this ends up being a fairly satisfying analysis. The nash-equilibrium picture of game theory lacks a similarly satisfying analysis. But this does not seem essential to game theory.

Pretty sure we’re agreeing here. I was originally just supporting

cousin_it’s claim, not claiming that Nash Equilibria are a useful-enough solution concept. I was simply noting that—while they are weaker than a useful-enough concept would be—they can show the issue with non-uniqueness clearly.For any strategy in modal combat, there is another strategy that tries to defect exactly against the former.

I don’t want to claim there’s a best way, but I do think there are certain desirable properties which it makes sense to shoot for. But this still sort of points at the wrong problem.

A “naturalistic” approach to game theory is one in which game theory is an application of decision theory (not an extension) -- there should be no special reasoning which applies only to other agents. (I don’t know a better term for this, so let’s use naturalistic for now.)

Standard approaches to game theory lack this (to varying degrees). So, one frame is that we would like to come up with an approach to game theory which is naturalistic. Coming from the other side, we can attempt to apply existing decision theory to games. This ends up being more confusing and unsatisfying than one might hope. So, we can think of game theory as an especially difficult stress-test for decision theory.

So it isn’t that there should be some best strategy in multiplayer games, or even that I’m interested in a “better” player despite the lack of a notion of “best” (although I am interested in that). It’s more that UDT doesn’t give me a way to think about games. I’d like to have a way to think about games which makes sense to me, and which preserves as much as possible what seems good about UDT.

Desirable properties such as coordination are important in themselves, but are also playing an illustrative role—pointing at the problem. (It could be that coordination just shouldn’t be expected, and so, is a bad way of pointing at the problem of making game theory “make sense”—but I currently think better coordination should be possible, so, think it is a good way to point at the problem.)

But game theory doesn’t require such special reasoning! It doesn’t care how players reason. They might not reason at all, like the three mating variants of the side-blotched lizard. And when they do reason, game theory still shows they can’t reason their way out of a situation unilaterally, no matter if their decision theory is “naturalistic” or not. So I think of game theory as an upper bound on all possible decision theories, not an application of some future decision theory.

I think UDT1.1 have two fundamentally wrong assumptions built in.

1) Complete prior: UDT1.1 follows the policy that is optimal according to it’s prior. This is incommutable in general settings and will have to be approximated some how. But even an approximation of UDT1.1 assumes that UDT1.1 is at least well defined. However in some multi agent settings or when the agent is being fully simulated by the environment, or any other setting where the environment is

necessarybigger than the agent, then UDT1.1 is ill defined.2) Free will: In the problem Agent Simulates Predictor, the environment is smaller than the agent, so it is falls outside the above point. Here instead I think the problem is that the agent assumes that it has free will, when in fact it behaves in a deterministic manner.

The problem of free will in Decision Problems is even clearer in the smoking lesion problem:

You want to smoke and you don’t want Cancer. You know that people who smoke are more likely get cancer, but you also know that smoking does not cause cancer. Instead, there is a common cause, some gene, that happens to both increase the risk of cancer and make it more likely that a person with this gene are more likely to choose to smoke. You can not test if you have the gene.

Say that you decide to smoke, becasue ether you have the gene or not so you might as well enjoy smoking. But what if everyone though like this? Then there would be no correlation between the cancer gene and smoking. So where did the statistics about smokers getting cancer come from (in this made up version of reality).

If you are the sort of person who smokes no mater what, then ether:

a) You are sufficiently different from most people such that the statistics does not apply to you.

or

b) The cancer gene is correlated with being the sort of person that has a decision possess that leads to smoking.

If b is correct, then maybe you should be the sort of algorithm that decides not to smoke, as to increase the chance of being implemented into a brain that lives in a body with less risk of cancer. But if you start thinking like that, then you are also giving up your hope at affecting the universe, and resign to just choosing where you might find yourself, and I don’t think that is what we want from a decision theory.

But there also seems to be no good way of thinking about how to steer the universe with out pretending to have free will. But since that is actually a falls assumption, there will be weird edge cases where you’re reasoning breaks down.