What are the biggest issues that haven’t been solved for UDT or FDT?
UDT was a fairly simple and workable idea in classical Bayesian settings with logical omniscience (or with some simple logical uncertainty treated as if it were empirical uncertainty), but it was always intended to have logical uncertainty at its core. Logical induction, our current best theory of logical uncertainty, hasn’t turned out to work very well with UDT so far. The basic problem seems to be that UDT requires “updates” to be represented in a fairly explicit way: you have a prior which already contains all the potential things you can learn, and an update just selects certain possibilities. Logical induction, in contrast, starts out “really ignorant” and adds structure, not just content, to its beliefs over time. As a result, optimizing via the early beliefs doesn’t look like a very good option.
FDT requires a notion of logical causality, which hasn’t appeared yet.
What is a co-ordination problem that hasn’t been solved?
Taking logical uncertainty into account, all games become iterated games in a significant sense, because players can reason about each other by looking at what happens in very close situations. If the players have T seconds to think, they can simulate the same game but given t<<T time to think, for many t. So, they can learn from the sequence of “smaller” games.
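To make that concrete, here is a minimal sketch of the idea. It is entirely my own illustration, not anything from the original discussion; the “budget” abstraction and function names are invented for the example. A player with budget T consults what the players do at smaller budgets, so a single-shot game inherits the learning dynamics of an iterated one.

```python
# Minimal sketch: a single-shot game that behaves like an iterated one,
# because each player can look at what both players do with less time to think.

def play(budget, my_strategy, their_strategy):
    """Both players choose a move given `budget` steps of thinking."""
    return (my_strategy(budget, their_strategy),
            their_strategy(budget, my_strategy))

def reflective_tit_for_tat(budget, opponent):
    """Cooperate iff the opponent cooperates in the next-smaller game."""
    if budget == 0:
        return "C"  # base case: open with cooperation
    return "C" if opponent(budget - 1, reflective_tit_for_tat) == "C" else "D"

def always_defect(budget, opponent):
    return "D"

# Against a copy of itself, the recursion bottoms out in cooperation all the
# way up; against a defector, the smaller games teach it to defect.
print(play(10, reflective_tit_for_tat, reflective_tit_for_tat))  # ('C', 'C')
print(play(10, reflective_tit_for_tat, always_defect))           # ('D', 'D')
```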
This might seem like a good thing. For example, the single-shot prisoner’s dilemma has only one Nash equilibrium, mutual defection, while iterated play has cooperative equilibria, such as tit-for-tat.
Unfortunately, the folk theorem of game theory implies that there are a whole lot of fairly bad equilibria for iterated games as well. It is possible that each player enforces a cooperative equilibrium via tit-for-tat-like strategies. However, it is just as possible for players to end up in a mutual blackmail double bind, as follows:
Both players initially have some suspicion that the other player is following strategy X: “cooperate 1% of the time if and only if the other player is playing consistently with strategy X; otherwise, defect 100% of the time.” As a result of this suspicion, both players play via strategy X in order to get the 1% cooperation rather than 0%.
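To see why neither player wants to step out of the bind unilaterally, here is a rough payoff check. This is my own illustration, assuming standard prisoner’s dilemma payoffs (temptation 5, mutual cooperation 3, mutual defection 1, sucker 0), which aren’t specified above.

```python
# Rough expected payoffs for the mutual-blackmail bind. The payoff numbers are
# my assumption: T=5 (temptation), R=3 (reward), P=1 (punishment), S=0 (sucker).
T, R, P, S = 5, 3, 1, 0

def expected_payoff(my_coop, their_coop):
    """Expected payoff when each side independently cooperates at the given rate."""
    return (my_coop * their_coop * R
            + my_coop * (1 - their_coop) * S
            + (1 - my_coop) * their_coop * T
            + (1 - my_coop) * (1 - their_coop) * P)

# Following strategy X: both sides cooperate 1% of the time.
follow_x = expected_payoff(0.01, 0.01)
# Deviating: the other player defects 100% of the time, so the best reply is to defect too.
deviate = expected_payoff(0.0, 0.0)

print(follow_x, deviate)  # ~1.03 vs 1.0
```

Each player is marginally better off playing along, which is how the suspicion sustains itself even though mutual cooperation would pay far more.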
Ridiculously bad “coordination” like that can be avoided via cooperative oracles, but that requires everyone to somehow have access to such a thing. Distributed oracles are more realistic in that each player can compute them just by reasoning about the others, but players using distributed oracles can be exploited.
So, how do you avoid supremely bad coordination in a way which isn’t too badly exploitable?
And what still isn’t known about counterfactuals?
The problem of specifying good counterfactuals sort of wraps up any and all other problems of decision theory into itself, which makes this a bit hard to answer. Different potential decision theories may lean more or less heavily on the counterfactuals. If you lean toward EDT-like decision theories, the problem with counterfactuals is mostly just the problem of making UDT-like solutions work. For CDT-like decision theories, it is the other way around; the problem of getting UDT to work is mostly about getting the right counterfactuals!
The mutual-blackmail problem I mentioned in my “coordination” answer is a good motivating example. How do you ensure that the agents don’t come to think “I have to play strategy X, because if I don’t, the other player will cooperate 0% of the time?”
Doing a bunch of line editing on the post is very nice of you, but also comes off as possibly passive-aggressive in the context of you not having said anything nice about the post… most of the edit suggestions just seem helpful, but I’m left feeling like your goal is to prove that the post is bad rather than to improve it (especially since you say “If those were all solved, more might be visible” rather than something encouraging).
All I’m saying is I’m a bit weirded out. Maybe I’m mis-reading bluntness as hostility.
Anyway, I’ll probably try and incorporate some of the suggested edits soon.
I don’t think this is quite right, for reasons related to this post.
Sometimes a hypothesis can be “too strong” or “too weak”. Sometimes hypotheses can just be different. You mention the 2-4-6 task and the soda task. In the soda task, Hermione makes a prediction which is “too strong” in that it predicts anything spilled on the robe will vanish, but also “too weak” in that it predicts the soda will not vanish if spilled on the floor. Actually, I’m not even sure if that is right. What does “too strong” mean? What is a maximally strong or weak hypothesis? Is it based on the entropy of the hypothesis?
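As a toy illustration of why I find “strength” hard to pin down (my own example, with the hypotheses simplified from the soda story): over a small space of cases, both hypotheses commit to a definite prediction everywhere, so a naive entropy measure can’t tell them apart; they are just committed in different places.

```python
# Toy illustration: two hypotheses about when a spilled drink vanishes.
# Both make a definite prediction for every case, so neither is "stronger"
# in an entropy sense; they simply disagree about different cases.
from itertools import product

cases = list(product(["soda", "water"], ["robe", "floor"]))

hermione = lambda substance, location: location == "robe"   # anything on the robe vanishes
rival    = lambda substance, location: substance == "soda"  # soda vanishes anywhere

for name, hyp in [("hermione", hermione), ("rival", rival)]:
    print(name, {case: hyp(*case) for case in cases})
# hermione predicts (water, robe) vanishes where rival does not;
# rival predicts (soda, floor) vanishes where hermione does not.
```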
I think this mis-places the difficulty in following Eliezer’s “twisty thinking” advice. The problem is that trying to disconfirm a hypothesis is not a specification of a computation you can just carry out. It sort of points in a direction, but it relies on my ingenuity to picture the scenario where my hypothesis is false. What does this really mean? It means coming up with a second-best hypothesis and then finding a test which differentiates between the best and second-best. Similarly, your “too strong” heuristic points in the direction of coming up with alternate hypotheses to test. But, I claim, it’s not really about being “too strong”.
What I would say instead is that your test should differentiate between hypotheses (the best hypotheses you can think of; formally, your test should have maximal VoI, i.e. value of information). The bias is to test your cherished hypothesis against hypotheses which already have a fairly low probability (such as the null hypothesis, perhaps), rather than testing it against the most plausible alternatives.
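Here is a minimal sketch of what I mean, with made-up numbers and expected information gain standing in for VoI (my own illustration, not a formal definition). A cherished favourite, a plausible rival, and a low-probability null each predict the outcome of two candidate tests; the test that separates the two leading hypotheses is worth roughly twice as much as the one that only checks the favourite against the null.

```python
# Sketch: score candidate tests by their expected reduction in entropy over
# the hypotheses you actually entertain. All numbers are made up.
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def expected_info_gain(prior, likelihoods):
    """likelihoods[h][outcome] = P(outcome | h) under the candidate test."""
    outcomes = {o for table in likelihoods.values() for o in table}
    gain = 0.0
    for o in outcomes:
        p_o = sum(prior[h] * likelihoods[h].get(o, 0.0) for h in prior)
        if p_o == 0:
            continue
        posterior = {h: prior[h] * likelihoods[h].get(o, 0.0) / p_o for h in prior}
        gain += p_o * (entropy(prior) - entropy(posterior))
    return gain

prior = {"robe_only": 0.5, "soda_only": 0.4, "nothing_vanishes": 0.1}

# Test A (spill soda on the robe): only distinguishes the favourite from the low-probability null.
test_a = {"robe_only": {"vanish": 1.0}, "soda_only": {"vanish": 1.0},
          "nothing_vanishes": {"stay": 1.0}}
# Test B (spill soda on the floor): distinguishes the favourite from the plausible rival.
test_b = {"robe_only": {"stay": 1.0}, "soda_only": {"vanish": 1.0},
          "nothing_vanishes": {"stay": 1.0}}

print(expected_info_gain(prior, test_a))  # ~0.47 bits
print(expected_info_gain(prior, test_b))  # ~0.97 bits
```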
[Edit: this comment starts off on a critical tone. After reading more comments which are very critical, I wanted to edit my comment to first at least indicate that I think you are communicating about it as best you can and am somewhat annoyed with those suggesting otherwise. Nonetheless, my comment focuses on a single paragraph in which you make a decision about how to communicate which I disagree with. This is neither a criticism of the central point of the essay, nor a criticism of the overall way in which you try to make your point here.]
[Edit 2: I think I get what Looking is now; see my reply to Moral Of Story’s comment.]
Another way I could try to say the “it’s okay” thing is something like, “The world is real in your immediate experience before you think about it. Set aside your interpretations and just look.” The trouble is, most people’s thinking system can grab statements like this and try to interpret them: if you think something like “Oh, that’s the map/territory distinction”, then all I can say is you are still looking at your phone.
There’s something very frustrating about this. Explaining in-parable: if you’re trying to tell me to look up, and I send a diagram of person-phone-gaze theory with anatomical markings indicating what I think you mean by “look up”, I know that understanding the graphic is not the same as “looking up”. What I want from you is any corrections you might have to the graphic. This may not actually help me to look up, but it may help—and more likely, it’ll help me know roughly the sort of thing I’m missing even if I can’t move my eyes as a result.
If someone doesn’t yet get the map-territory relation, you wouldn’t keep trying to show them the territory. It would help to make a map of the way the map-territory relation works, even though the ultimate goal is to help them look past maps in a sense. It could also help to show them some places where reality doesn’t work the way they think it does, to remind them that there’s a difference between what they think and what’s real. But if they don’t yet see an alternative they’ll just think you’re being mean, and say things like “just because I’m wrong sometimes doesn’t mean I should stop trying”.
Out-of-parable: my understanding is that there are two different things you’re getting at: the Kenshō itself, and the epistemic operation of Looking. The paragraph quoted above makes me think that the two are closely related, though.
It seems to me (based on this and other posts / interaction with you) like Looking has to do with the idea that we normally parse the world in pre-set ontologies, but there has to be a thing which builds the ontologies in the first place. Here, “ontology” does not mean the built-in framework of the hardware we run on (like telling a computer to look past bits and bytes to the thing which created bits and bytes in the first place—something it can’t physically do). Rather, “ontology” refers to provisional frameworks developed through experience. “Looking” intentionally engages the faculties which are involved in ontology-building-and-shifting.
For example, it used to be that when I would hear arguments for intuitionistic logic, I would interpret them in the framework of classical logic. This felt like using the best tools I had to evaluate proposed alternatives. Similarly, I still evaluate alternative ethical frameworks with a basically utilitarian lens. However, at some point, I gained the ability to evaluate arguments for intuitionism on their own. I think this was both an example of Looking and an insight which had to do with the nature of Looking (because it had to do with refactoring the map-territory relation). This instance of “Looking” seemed to basically require a lot of time with the subject—if there’s something earlier-me could have done to stop evaluating things through a purely classical-logic lens, I don’t know what it would be.
You’re claiming, as I understand it, that there’s a skill of Looking which can be immediate—not necessarily coming to the right conclusions immediately, I suppose, but immediately getting out of the evaluate-through-current-ontology trap. I mostly believe you, and I can guess at mental motions which you might mean, but they are things like “use your inner sim rather than your inner narrative machine” or “try to look at a chair without seeing a chair, only splotches of light; then generalize this” or “turn your thoughts to what put the current ontology there in the first place” or “separate your thinking about whether things ‘make sense’ from your thinking about whether they connect with evidence, so that you can notice when something has explanatory power even if it doesn’t fit with your preconceptions”. I don’t know which of these things you mean, if any.
Now, more speculatively, the Kenshō itself:
I have an intuition that whereas Looking is epistemic, the Kenshō is instrumental. Just as we can set up an ontology which becomes so familiar that we forget our basic ability to look, we can set up a way of doing and being which becomes so familiar that we forget the place it comes from. (So far, this is essentially your CFAR-level-two class, The Machine of You.) The connection between “It’s okay” and “The world is real in your immediate experience before you think about it. Set aside your interpretations and just look” is, on this understanding: once you engage your core rather than existing in your constructed way of being, there’s something supremely silly about worrying all the time or using guilt-driven motivation.
Rationality realism seems like a good thing to point out which might be a crux for a lot of people, but it doesn’t seem to be a crux for me.
I don’t think there’s a true rationality out there in the world, or a true decision theory out there in the world, or even a true notion of intelligence out there in the world. I work on agent foundations because there’s still something I’m confused about even after that, and furthermore, AI safety work seems fairly hopeless while still so radically confused about the-phenomena-which-we-use-intelligence-and-rationality-and-agency-and-decision-theory-to-describe. And, as you say, “from a historical point of view I’m quite optimistic about using maths to describe things in general”.