# abramdemski

Karma: 5,177

# An Untrollable Mathematician Illustrated

# Embedded Agents

# Toward a New Technical Explanation of Technical Explanation

# Subsystem Alignment

# Robust Delegation

# Machine Learning Analogy for Meditation (illustrated)

# Co-Proofs

# Learn Bayes Nets!

# Decision Theory

# In Logical Time, All Games are Iterated Games

# Bayes’ Law is About Multiple Hypothesis Testing

# Embedded World-Models

# [Question] What makes people intellectually active?

# Track-Back Meditation

# Combat vs Nurture & Meta-Contrarianism

# CDT=EDT=UDT

Doing a bunch of line editing on the post is very nice of you, but also comes off as possibly passive-aggressive in the context of you not having said anything nice about the post… most of the edit suggestions just seem helpful, but I’m left feeling like your goal is to prove that the post is bad rather than to improve it (especially since you say “If those were all solved, more might be visible” rather than something encouraging).

All I’m saying is I’m a bit weirded out. Maybe I’m mis-reading bluntness as hostility.

Anyway, I’ll probably try and incorporate some of the suggested edits soon.

I don’t think this is quite right, for reasons related to this post.

Sometimes a hypothesis can be “too strong” or “too weak”. Sometimes hypotheses can just be different. You mention the 2-4-6 task and the soda task. In the soda task, Hermione makes a prediction which is “too strong” in that it predicts *anything* spilled on the robe will vanish; but also “too weak” in that it predicts the soda will *not* vanish if spilled on the floor. Actually, I’m not even sure if that is right. What does “too strong” mean? What is a maximally strong or weak hypothesis? Is it based on the entropy of the hypothesis?

I think this mis-places the difficulty in following Eliezer’s “twisty thinking” advice. The problem is that trying to disconfirm a hypothesis *is not a specification of a computation you can just carry out*. It sort of points in a direction; but it relies on my ingenuity to picture the scenario where my hypothesis is false. What does this really mean? It means *coming up with a second-best hypothesis* and then finding a test which differentiates between the best and second best. Similarly, your “too strong” heuristic points in the direction of coming up with alternate hypotheses to test. But, I claim, it’s not really about being “too strong”.

What I would say instead is: *your test should differentiate between hypotheses* (the best hypotheses you can think of; formally, your test should have maximal VOI, value of information). The *bias* is to test your cherished hypothesis against hypotheses which already have a fairly low probability (such as the null hypothesis, perhaps), rather than testing it against the most plausible alternatives.
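To make “maximal VOI” concrete, here is a minimal sketch in Python (my framing: I’m reading VOI as expected information gain, and the hypotheses and numbers are invented for illustration). A test which every live hypothesis predicts identically scores zero, however “strong” those hypotheses are; the best test is the one whose outcome most separates your top candidates.

```python
import numpy as np

def expected_information_gain(prior, likelihoods):
    """Expected KL divergence from prior to posterior, over test outcomes.

    prior: shape (H,), probability of each hypothesis.
    likelihoods: shape (H, O), P(outcome | hypothesis) for one candidate test.
    """
    p_outcome = prior @ likelihoods                         # shape (O,)
    posterior = (prior[:, None] * likelihoods) / p_outcome  # Bayes' rule, (H, O)
    with np.errstate(divide="ignore", invalid="ignore"):
        # KL(posterior || prior) for each outcome; nansum treats 0 log 0 as 0.
        kl = np.nansum(posterior * np.log(posterior / prior[:, None]), axis=0)
    return float(np.nansum(p_outcome * kl))

# Two rival hypotheses (invented for illustration):
#   H1: anything spilled on the robe vanishes.
#   H2: only soda vanishes, wherever it is spilled.
prior = np.array([0.6, 0.4])

# Test A: spill soda on the robe. Both hypotheses predict "vanish",
# so the outcome can't move you between them at all.
test_a = np.array([[1.0, 0.0],   # H1: vanishes
                   [1.0, 0.0]])  # H2: vanishes

# Test B: spill water on the robe. H1 predicts it vanishes; H2 predicts not.
test_b = np.array([[1.0, 0.0],   # H1: vanishes
                   [0.0, 1.0]])  # H2: stays

print(expected_information_gain(prior, test_a))  # 0.0 nats -- useless test
print(expected_information_gain(prior, test_b))  # ~0.673 nats -- decisive test
```

Note that test A is the one a confirmation-seeking experimenter would run; the gain calculation says it is worthless precisely because both live hypotheses agree on its outcome.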

UDT was a fairly simple and workable idea in classical Bayesian settings with logical omniscience (or with some simple logical uncertainty treated as if it were empirical uncertainty), but it was always intended to utilize logical uncertainty at its core. Logical induction, our current-best theory of logical uncertainty, doesn’t turn out to work very well with UDT so far. The basic problem seems to be that UDT required “updates” to be represented in a fairly explicit way: you have a prior which already contains all the potential things you can learn, and an update is just selecting certain possibilities. Logical induction, in contrast, starts out “really ignorant” and adds structure, not just content, to its beliefs over time. Optimizing via the early beliefs doesn’t look like a very good option, as a result.
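As a toy contrast (my illustration, not anything from the logical induction paper): a classical Bayesian update never adds anything to the hypothesis space; it just selects among possibilities the prior already enumerated.

```python
# Toy contrast (illustrative, not from the logical induction paper):
# a Bayesian "update" only selects among pre-listed possibilities.
prior = {
    "fair coin": 0.5,
    "always heads": 0.25,
    "always tails": 0.25,
}

def update(beliefs, consistent):
    """Condition on an observation: drop inconsistent hypotheses, renormalize.

    Nothing new can ever appear; the structure is fixed at the start.
    """
    kept = {h: p for h, p in beliefs.items() if consistent(h)}
    total = sum(kept.values())
    return {h: p / total for h, p in kept.items()}

# Observing a head selects out "always tails"; no new hypothesis is added.
print(update(prior, lambda h: h != "always tails"))
# {'fair coin': 0.666..., 'always heads': 0.333...}
# A logical inductor instead starts out "really ignorant" and comes to have
# opinions about sentences it had no explicit stance on at the beginning.
```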

FDT requires a notion of logical causality, which hasn’t appeared yet.

Taking logical uncertainty into account, all games become iterated games in a significant sense, because players can reason about each other by looking at what happens in very similar situations. If the players have T seconds to think, they can simulate the same game given t ≪ T time to think, for many values of t. So, they can learn from the sequence of “smaller” games.
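A minimal sketch of that dynamic (the base-case moves and the copying rule are illustrative choices of mine, not a claim about how real agents reason): each player’s move at thinking budget t is computed from a simulation of the same matchup at budget t − 1, so a single game unfolds like an iterated one across logical time.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def play(t):
    """Joint move in the prisoner's dilemma at thinking budget t:
    (player_a, player_b), each 'C' or 'D'."""
    if t == 0:
        return ("D", "C")  # arbitrary guesses at the smallest budget
    a_prev, b_prev = play(t - 1)
    # Tit-for-tat in logical time: copy the opponent's move from the
    # simulation one logical time step earlier.
    return (b_prev, a_prev)

for t in range(6):
    print(t, play(t))
# 0 ('D', 'C')
# 1 ('C', 'D')
# 2 ('D', 'C')  ...and so on: the t = 0 guesses propagate upward exactly
# the way opening moves do in an ordinary iterated game.
```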

This might seem like a good thing. For example, the single-shot prisoner’s dilemma has only one Nash equilibrium: mutual defection. Iterated play has cooperative equilibria as well, such as tit-for-tat.

Unfortunately, the folk theorem of game theory implies that there are a whole lot of fairly bad equilibria for iterated games as well. It is *possible* that each player enforces a cooperative equilibrium via tit-for-tat-like strategies. However, it is just as possible for players to end up in a mutual blackmail double bind, as follows.

Both players initially have some suspicion that the other player is following strategy X: “cooperate 1% of the time if and only if the other player is playing consistently with strategy X; otherwise, defect 100% of the time.” As a result of this suspicion, both players play strategy X in order to get the 1% cooperation rather than 0%.
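Some rough arithmetic shows why neither player deviates. With standard prisoner’s dilemma payoffs (the specific numbers here are assumed for illustration), conforming to strategy X beats triggering the other player’s 100% defection, so the terrible equilibrium is self-enforcing:

```python
# Assumed standard PD payoffs (illustrative numbers, not from anywhere):
# R = both cooperate, P = both defect, T = defect on a cooperator, S = sucker.
R, P, T, S = 3, 1, 5, 0

def expected_payoff(p_me, p_them):
    """Expected payoff when each side independently cooperates with the
    given probability."""
    return (p_me * p_them * R + p_me * (1 - p_them) * S
            + (1 - p_me) * p_them * T + (1 - p_me) * (1 - p_them) * P)

# Conforming to strategy X: both sides cooperate 1% of the time.
conform = expected_payoff(0.01, 0.01)
# Deviating: the other side defects 100% of the time, so the best
# response is pure defection.
deviate = expected_payoff(0.0, 0.0)

print(f"conform to X: {conform:.3f}")  # ~1.030
print(f"deviate:      {deviate:.3f}")  # 1.000
# Conforming is strictly better than deviating, so the bind holds --
# even though mutual cooperation (3 each) was available all along.
```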

Ridiculously bad “coordination” like that can be avoided via cooperative oracles, but that requires everyone to somehow have access to such a thing. Distributed oracles are more realistic in that each player can compute them just by reasoning about the others, but players using distributed oracles can be exploited.

So, how do you avoid supremely bad coordination in a way which isn’t too badly exploitable?

The problem of specifying good counterfactuals sort of wraps up any and all other problems of decision theory into itself, which makes this a bit hard to answer. Different potential decision theories may lean more or less heavily on the counterfactuals. If you lean toward EDT-like decision theories, the problem with counterfactuals is mostly just the problem of making UDT-like solutions work. For CDT-like decision theories, it is the other way around; the problem of getting UDT to work is mostly about getting the right counterfactuals!

The mutual-blackmail problem I mentioned in my “coordination” answer is a good motivating example. How do you ensure that the agents don’t come to think “I have to play strategy X, because if I don’t, the other player will cooperate 0% of the time”?