I saw an earlier draft of this, and hope to write an extensive response at some point. For now, the short version:

As I understand it, FDT was intended as an umbrella term for MIRI-style decision theories, which illustrated the critical points without making too many commitments. So, the vagueness of FDT was partly by design.

I think UDT is a more concrete illustration of the most important points relevant to this discussion.

The optimality notion of UDT is clear. “UDT gets the most utility” means “UDT gets the highest expected value with respect to its own prior”. This seems quite well-defined, hopefully addressing your (VII).

There are problems applying UDT to realistic situations, but UDT *makes perfect sense and is optimal in a straightforward sense* for the case of single-player extensive-form games. That doesn’t address multi-player games or logical uncertainty, but it is enough for much of Will’s discussion.

FDT focused on the weird logical cases, which is in fact a major part of the motivation for MIRI-style decision theory. However, UDT for single-player extensive-form games actually gets at a lot of what MIRI-style decision theory wants, without broaching the topic of logical counterfactuals or proving-your-own-action directly.

The problems which create a deep indeterminacy seem, to me, to be problems for other decision theories than FDT as well. FDT is trying to face them head-on. But there are big problems for applying EDT to agents who are physically instantiated as computer programs and can prove too much about their own actions.

This also hopefully clarifies the sense in which I don’t think the decisions pointed out in (III) are bizarre. The decisions are optimal *according to the very probability distribution used to define the decision problem.*

There’s a subtle point here, though, since Will describes the decision problem from an updated perspective: you already know the bomb is in front of you. So UDT “changes the problem” by evaluating “according to the prior”. From my perspective, because *the very statement of the* **Bomb** *problem* suggests that there were also other possible outcomes, we can rightly insist on evaluating expected utility in terms of those chances.

Perhaps this sounds like an unprincipled rejection of the **Bomb** problem as you state it. My principle is as follows: *you should not state a decision problem without having in mind a well-specified way to predictably put agents into that scenario*. Let’s call the way-you-put-agents-into-the-scenario the “construction”. We then evaluate agents on how well they deal with the construction.

For examples like **Bomb**, the construction gives us the overall probability distribution; this is then used for the expected value which UDT’s optimality notion is stated in terms of.

For other examples, as discussed in “Decisions are for making bad outcomes inconsistent”, the construction simply breaks when you try to put certain decision theories into it. This can also be a good thing; it means the decision theory makes certain scenarios altogether impossible.
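To make “evaluate expected utility of policies according to the prior” concrete, here is a minimal Python sketch of a **Bomb**-like construction. Everything specific in it is an assumption for illustration: the error rate `EPSILON`, the three utility values, and the simplification that the predictor forecasts what the agent would do on seeing a bomb. It is a toy under those assumptions, not a formalization of UDT.

```python
from itertools import product

# All numbers are illustrative assumptions, not values from the discussion.
EPSILON = 1e-6            # assumed predictor error rate
U_BURN = -1_000_000.0     # assumed utility of taking Left when it holds a bomb
U_RIGHT = -100.0          # assumed cost of taking Right
U_LEFT_SAFE = 0.0         # assumed utility of taking an empty Left

OBSERVATIONS = ("bomb", "no_bomb")
ACTIONS = ("left", "right")


def outcome_utility(observation, action):
    """Utility of taking `action` after seeing `observation`."""
    if action == "right":
        return U_RIGHT
    return U_BURN if observation == "bomb" else U_LEFT_SAFE


def construction(policy):
    """Return [(probability, observation)] for an agent running `policy`.

    Assumption: the predictor forecasts the action the agent would take on
    seeing a bomb, and places a bomb iff it forecasts "right".
    """
    true_action = policy["bomb"]
    wrong_action = "left" if true_action == "right" else "right"
    branches = []
    for prob, forecast in ((1 - EPSILON, true_action), (EPSILON, wrong_action)):
        observation = "bomb" if forecast == "right" else "no_bomb"
        branches.append((prob, observation))
    return branches


def prior_expected_utility(policy):
    """Expected utility of a whole policy, weighted by the construction's prior."""
    return sum(p * outcome_utility(obs, policy[obs]) for p, obs in construction(policy))


def all_policies():
    """Enumerate every mapping from observations to actions."""
    for choice in product(ACTIONS, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, choice))


best_policy = max(all_policies(), key=prior_expected_utility)
for policy in all_policies():
    print(policy, prior_expected_utility(policy))
```

Under these assumed numbers, the always-Left policy scores about -1, while every policy that ever takes Right scores roughly -100 or worse. That holds even though taking Left looks terrible once you condition on already seeing the bomb, which is exactly the gap between the updated perspective and the prior perspective.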

The point about “constructions” is possibly a bit subtle (and hastily made); maybe a lot of the disagreement will turn out to be there. But I *do hope* that the basic idea of UDT’s optimality criterion is actually clear—“evaluate expected utility of policies according to the prior”—and clarifies the situation with FDT as well.

UDT was a fairly simple and workable idea in classical Bayesian settings with logical omniscience (or with some simple logical uncertainty treated as if it were empirical uncertainty), but it was always intended to utilize logical uncertainty at its core. Logical induction, our current-best theory of logical uncertainty, doesn’t turn out to work very well with UDT so far. The basic problem seems to be that UDT required “updates” to be represented in a fairly explicit way: you have a prior which already contains all the potential things you can learn, and an update is just selecting certain possibilities. Logical induction, in contrast, starts out “really ignorant” and adds structure, not just content, to its beliefs over time. Optimizing via the early beliefs doesn’t look like a very good option, as a result.

FDT requires a notion of logical causality, which hasn’t appeared yet.

Taking logical uncertainty into account, all games become iterated games in a significant sense, because players can reason about each other by looking at what happens in very close situations. If the players have T seconds to think, they can simulate the same game but given t<<T time to think, for many t. So, they can learn from the sequence of “smaller” games.

This might seem like a good thing. For example, the single-shot prisoner’s dilemma has only one Nash equilibrium: mutual defection. Iterated play has cooperative equilibria, such as tit-for-tat.

Unfortunately, the folk theorem of game theory implies that there are a whole lot of fairly bad equilibria for iterated games as well. It is *possible* that each player enforces a cooperative equilibrium via tit-for-tat-like strategies. However, it is just as possible for players to end up in a mutual blackmail double bind, as follows:

Both players initially have some suspicion that the other player is following strategy X: “cooperate 1% of the time if and only if the other player is playing consistently with strategy X; otherwise, defect 100% of the time.” As a result of this suspicion, both players play via strategy X in order to get the 1% cooperation rather than 0%.
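A quick numeric check of why strategy X can be self-enforcing, as a minimal Python sketch. The payoff table is an assumed standard prisoner’s dilemma (not from the discussion), the two players are assumed to randomize independently, and any deviation from X is assumed to be detected and punished with 100% defection.

```python
# Assumed prisoner's dilemma payoffs to "me" (illustrative only).
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}
X_COOP_RATE = 0.01  # strategy X cooperates 1% of the time


def per_round_payoff(my_coop_rate, their_coop_rate):
    """Expected per-round payoff when each side cooperates independently."""
    total = 0.0
    for mine, p_mine in (("C", my_coop_rate), ("D", 1 - my_coop_rate)):
        for theirs, p_theirs in (("C", their_coop_rate), ("D", 1 - their_coop_rate)):
            total += p_mine * p_theirs * PAYOFF[(mine, theirs)]
    return total


def payoff_against_x(my_coop_rate):
    """Payoff against an opponent playing X, assuming any deviation is detected."""
    consistent_with_x = my_coop_rate == X_COOP_RATE
    their_rate = X_COOP_RATE if consistent_with_x else 0.0
    return per_round_payoff(my_coop_rate, their_rate)


follow_x = payoff_against_x(X_COOP_RATE)
always_defect = payoff_against_x(0.0)
always_cooperate = payoff_against_x(1.0)
mutual_cooperation = per_round_payoff(1.0, 1.0)

print(follow_x, always_defect, always_cooperate, mutual_cooperation)
```

With these assumed payoffs, following X yields about 1.03 per round, slightly better than the 1.0 from defecting outright and far better than the 0.0 from cooperating unconditionally against a punisher. So X is a best response to X, yet both players are far below the 3.0 they would get from mutual cooperation.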

Ridiculously bad “coordination” like that can be avoided via cooperative oracles, but that requires everyone to somehow have access to such a thing. Distributed oracles are more realistic in that each player can compute them just by reasoning about the others, but players using distributed oracles can be exploited.

So, how do you avoid supremely bad coordination in a way which isn’t too badly exploitable?

The problem of specifying good counterfactuals sort of wraps up any and all other problems of decision theory into itself, which makes this a bit hard to answer. Different potential decision theories may lean more or less heavily on the counterfactuals. If you lean toward EDT-like decision theories, the problem with counterfactuals is mostly just the problem of making UDT-like solutions work. For CDT-like decision theories, it is the other way around; the problem of getting UDT to work is mostly about getting the right counterfactuals!

The mutual-blackmail problem I mentioned in my “coordination” answer is a good motivating example. How do you ensure that the agents don’t come to think “I have to play strategy X, because if I don’t, the other player will cooperate 0% of the time”?