Some more thoughts: we can portray the process of choosing a successor policy as the iterative process of making more and more commitments over time. But what does it actually look like to make a commitment? Well, consider an agent that is made of multiple subagents, that each get to vote on its decisions. You can think of a commitment as basically saying “this subagent still gets to vote, but no longer gets updated”—i.e. it’s a kind of stop-gradient.
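To make the stop-gradient picture concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration and none of it comes from a real framework: the `Subagent` class, the averaging rule in `decide`, and the learning rate are all hypothetical. The only point is that a committed subagent keeps voting while its update step becomes a no-op.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subagent:
    name: str
    preference: float      # current vote, somewhere in [-1, 1] (illustrative scale)
    frozen: bool = False   # True once a commitment has been made

    def vote(self) -> float:
        # Frozen or not, the subagent still gets a vote.
        return self.preference

    def update(self, target: float, lr: float = 0.5) -> None:
        # A committed subagent ignores new evidence: the "stop-gradient".
        if self.frozen:
            return
        self.preference += lr * (target - self.preference)


def decide(subagents: List[Subagent]) -> float:
    """Aggregate votes by simple averaging (an assumed voting rule)."""
    return sum(s.vote() for s in subagents) / len(subagents)


agents = [Subagent("committed", -1.0), Subagent("learner", -1.0)]
agents[0].frozen = True    # make the commitment: keeps voting, stops learning

# New evidence keeps pointing towards +1.0; only the learner moves.
for _ in range(3):
    for agent in agents:
        agent.update(target=1.0)

decision = decide(agents)
print(agents[0].preference, agents[1].preference, decision)
```

After three updates the learner has moved most of the way towards the new evidence, while the committed subagent still votes exactly as it did when it was frozen, and so it drags the aggregate decision back towards the old view.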
Two interesting implications of this perspective:
The “cost” of a commitment can be measured both in terms of “how often does the subagent vote in stupid ways?”, and also “how much space does it require to continue storing this subagent?” But since we’re assuming that agents get much smarter over time, probably the latter is pretty small.
There’s a striking similarity to the problem of trapped priors in human psychology. Parts of our brains basically are subagents that still get to vote but no longer get updated. And I don’t think this is just a bug—it’s also a feature. This is true on the level of biological evolution (you need to have a strong fear of death in order to actually survive) and also on the level of cultural evolution (if you can indoctrinate kids in a way that sticks, then your culture is much more likely to persist).
The (somewhat provocative) way of phrasing this is that trauma is evolution’s approach to implementing UDT. Someone who’s been traumatized into conformity by society when they were young will then (in theory) continue obeying society’s dictates even when they later have more options. Someone who gets very angry if mistreated in a certain way is much harder to mistreat in that way. And of course trauma is deeply suboptimal in a bunch of ways, but so too are UDT commitments, because they were made too early to figure out better alternatives.
This is clearly only a small component of the story, but the analogy is an interesting one.
More thoughts: what’s the difference between paying in a counterfactual mugging based on:
Whether the millionth digit of pi (5) is odd or even
Whether or not there are an infinite number of primes?
In the latter case, knowing the truth is (near-)inextricably entangled with a bunch of other capabilities, like the ability to do advanced mathematics, whereas in the former it isn’t. Suppose that before you knew either fact you were told that one of them was entangled in this way. Would you still want to commit to paying out in a mugging based on it?
Well… maybe? But it means that the counterlogical of “if there hadn’t been an infinite number of primes” is not very well-defined—it’s hard to modify your brain to add that belief without making a bunch of other modifications. So now Omega doesn’t just have to be (near-)omniscient, it also needs to have a clear definition of the counterlogical that’s “fair” according to your standards; without knowing that it has that, paying up becomes less tempting.
Individually, logical counterfactuals don’t seem very coherent. This is related to the “I’m an algorithm” vs. “I’m a physical object” distinction of FDT. When you are an algorithm considering a decision, you want to mark all sites of intervention/influence in the world where the world depends on your behavior. If you only mark some of them, then you later fail at the step where you ask what happens if you act differently: you obtain a broken counterfactual world where only some instances of the fact of your behavior have been replaced and not others.
So I think it makes a bit more sense to ask where specifically your brain depends on a fact, and to construct an exhaustive account of that dependence, before turning to the particular counterfactual content the fact is to be replaced with. That is, the dependence of a system on a fact, the way it varies with the fact, seems potentially clearer than individual counterfactuals about how that system works if the fact is set to be a certain way. (To make a somewhat hopeless analogy: think of the fibration instead of individual fibers, and it shouldn’t be a problem that all fibers are different from each other. Any question about a counterfactual should be reformulated into a question about a dependence.)
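A toy sketch of the difference between a dependence and a single patched counterfactual, under heavy assumptions: the "brain" here is just a hypothetical function `brain` from a boolean fact to a small dictionary, and `patch_one_site` and `is_consistent` are made-up helpers. Real logical facts are of course not a clean boolean input, which is exactly the difficulty; the sketch only shows why marking some of the sites and not others gives a broken world.

```python
def brain(fact: bool) -> dict:
    """The whole system written as a function of the fact: every use site is marked."""
    return {
        "belief": fact,
        "derived_lemma": fact,                      # a second site that also uses the fact
        "action": "pay" if fact else "refuse",      # behavior downstream of the fact
    }


def patch_one_site(state: dict, new_value: bool) -> dict:
    """A naive counterfactual: overwrite one site and leave the rest alone."""
    patched = dict(state)
    patched["belief"] = new_value
    return patched


def is_consistent(state: dict) -> bool:
    """A state is coherent if it is what the dependence yields for its own belief."""
    return state == brain(state["belief"])


actual = brain(True)
broken = patch_one_site(actual, False)   # only some instances of the fact replaced
coherent = brain(False)                  # counterfactual read off the full dependence

print(is_consistent(actual), is_consistent(broken), is_consistent(coherent))
```

The patched state still carries a lemma and an action that presuppose the old value of the fact, so it fails the consistency check, while the state read off the full dependence passes.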