2TDT-1CDT: if there’s a population of mostly TDT/UDT agents and a few CDT agents (and nobody knows who the CDT agents are), and they’re randomly paired up to play one-shot Prisoner’s Dilemma (PD), then the CDT agents do better. (The TDT/UDT agents cooperate, expecting a TDT/UDT opponent, while the CDT agents defect against those cooperators.) What does this imply?
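To make the setup concrete, here’s a toy simulation. The payoff values (T=5, R=3, P=1, S=0) and the behavior rules (UDT agents cooperate, since their opponent is almost certainly another UDT agent; the CDT agent always defects) are my assumptions about the intended setup:

```python
import random

# Toy simulation of the setup above: N agents, exactly one running CDT,
# the rest running UDT, randomly paired each round for one-shot PD.
# Assumed payoffs (not from the original comment): T=5, R=3, P=1, S=0.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def play_round(agents):
    """Shuffle and pair the agents; return {agent_id: payoff} for one round.

    UDT agents cooperate (their opponent is almost certainly another UDT
    agent); the CDT agent always defects.
    """
    random.shuffle(agents)
    payoffs = {}
    for a, b in zip(agents[::2], agents[1::2]):
        move_a = "D" if a[1] == "CDT" else "C"
        move_b = "D" if b[1] == "CDT" else "C"
        payoffs[a[0]] = PAYOFF[(move_a, move_b)]
        payoffs[b[0]] = PAYOFF[(move_b, move_a)]
    return payoffs

N, ROUNDS = 100, 10_000  # N must be even so everyone gets paired
agents = [(i, "CDT" if i == 0 else "UDT") for i in range(N)]

cdt_avg = udt_avg = 0.0
for _ in range(ROUNDS):
    payoffs = play_round(agents)
    cdt_avg += payoffs[0] / ROUNDS
    udt_avg += sum(v for k, v in payoffs.items() if k != 0) / (N - 1) / ROUNDS

print(f"CDT average payoff per round: {cdt_avg:.2f}")  # 5.00: always meets a cooperator
print(f"UDT average payoff per round: {udt_avg:.2f}")  # ~2.97: R, minus the rare CDT meeting
```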
I don’t think that’s the case unless you have really weird assumptions. If the other party can’t tell what the TDT/UDT agent will pick, they’ll defect, won’t they? It seems strange that the other party would be able to tell what the TDT/UDT agent will pick but not whether they’re TDT/UDT or CDT.
Edit: OK, I see the idea is that the TDT/UDT agents have known, fixed code, which can, e.g., randomly mutate into CDT. They can’t voluntarily change their code. Being able to trick the other party about your code is an advantage—I don’t see that as a TDT/UDT problem.
Nobody is being tricked though. Everyone knows there’s a CDT agent among the population, just not who, and we can assume they have a correct amount of uncertainty about what the other agent’s decision theory / source code is. The CDT agent still has an advantage in that case. And it is a problem, because it means CDT agents don’t always want to become more UDT-like (there seem to be situations that are natural, or at least not as contrived as Omega punishing UDT agents just for using UDT, in which they don’t), which takes away a major argument in UDT’s favor.
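To spell out the arithmetic, with standard PD payoffs $T > R > P > S$ (say $T=5$, $R=3$, $P=1$, $S=0$, my choice of values) and $N$ agents of which exactly one is CDT: the UDT agents’ best policy is still to cooperate, since defecting as a group would get them $P=1$ instead of roughly $R=3$, so

$$\mathbb{E}[u_{\mathrm{UDT}}] = \frac{(N-2)\,R + S}{N-1} \approx R = 3, \qquad \mathbb{E}[u_{\mathrm{CDT}}] = T = 5.$$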
I think this is also a rather contrived scenario, because if the UDT agents could change their own code (silently), cooperation would immediately break down. So it relies on the CDT agents being able to silently run code different from the most common (and thus expected) code, while the UDT agents cannot.
I’m not sure why you say “if the UDT agents could change their own code (silently) cooperation would immediately break down”, because in my view a UDT agent would reason that if it changed its code (to something like CDT, for example), that logically implies other UDT agents also changing their code in the same way, so the expected utility of changing its code would be evaluated as lower than that of not changing it. So it would remain a UDT agent, and would still cooperate with other UDT agents, or whenever the probability that the other agent is UDT is high enough.
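In toy form, the comparison the UDT agent makes (using the same payoffs as above, and treating all agents running its code as making a single logical choice) is

$$U(\text{keep UDT code}) \approx R = 3, \qquad U(\text{switch to CDT}) \approx P = 1,$$

since switching logically implies that every agent with the same code switches too, and a population of CDT agents mutually defects.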
To me this example is about a CDT agent not wanting to become UDT-like if it found itself in a situation with many other UDT agents, which just seems puzzling if your previous perspective was that UDT is a clear advancement in decision theory and everyone should adopt UDT or become more UDT-like.
I think, if you had several UDT agents with the same source code, and then one UDT agent with slightly different source code, you might see the unique agent defect.
I think the CDT agent has an advantage here because it is capable of making distinct decisions from the rest of the population—not because it is CDT.
The general hope is that slight differences in source code (or even large differences, as long as they’re all using UDT or something close to it) wouldn’t be enough to make a UDT agent defect against another UDT agent (i.e., the logical correlation between their decisions would be high enough). Otherwise “UDT agents cooperate with each other in one-shot PD” would be false, or wouldn’t have much practical import, since why would all UDT agents have exactly the same source code?
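One crude way to state this hope quantitatively (my simplification, not a standard formalization): model the logical correlation as a probability $p$ that the other agent’s decision mirrors mine. With the payoffs above, cooperating beats defecting exactly when

$$p\,R + (1-p)\,S \;\ge\; p\,P + (1-p)\,T \quad\Longleftrightarrow\quad p \;\ge\; \frac{T-S}{(T-S)+(R-P)} = \frac{5}{7},$$

so the hope amounts to: “both parties running something UDT-like” keeps $p$ above that threshold even when their source code differs.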
There are at least two potential sources of cooperation: symmetry and mutual source code knowledge. Symmetry should (I expect) be fragile both to small changes in source code and to asymmetries between the situations of the different parties, while mutual source code knowledge doesn’t require those sorts of symmetry at all (though it does require the knowledge); see the sketch below.
Edit: for some reason my intuition expects cooperation from similarity to be less fragile in the Newcomb’s problem / code knowledge case (similarity to a simulation) than when the similarity is just plain similarity to another, non-simulation agent. I need to think about why, and whether this has any connection to what would actually happen.
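Here’s a deliberately crude sketch of the symmetry route (a “clique bot” of the sort sometimes discussed around program equilibrium), showing why exact-match cooperation is fragile to any change at all; the source strings are, of course, made up:

```python
# A deliberately crude "clique bot": cooperate only with an exact copy
# of your own source code. The source strings below are made up.

def clique_bot(my_source: str, opponent_source: str) -> str:
    """Cooperate iff the opponent is byte-for-byte identical to me."""
    return "C" if opponent_source == my_source else "D"

udt_v1 = "def act(opponent): ...  # UDT, version 1"
udt_v2 = "def act(opponent): ...  # UDT, version 2"  # one character different

print(clique_bot(udt_v1, udt_v1))  # C -- exact copies cooperate
print(clique_bot(udt_v1, udt_v2))  # D -- any difference at all breaks the symmetry
```

The mutual-source-code-knowledge route would instead reason about what the opponent’s code does, which can tolerate textual differences but requires actually having, and being able to analyze, that code.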
I mean, that’s a thing you might hope to be true. I’m not sure if it actually is true.

I did not realize that the UDT agents were assumed to behave identically; I was thinking that the cooperation was maintained, not by symmetry, but by mutual source code knowledge.
If it’s symmetry, well, if you can sneak a different agent into a clique without getting singled out, that’s an advantage. Again not a problem with UDT as such.
Edit: of course they do behave identically because they did have identical code (which was the source of the knowledge). (Though I don’t expect agents in the same decision theory class to be identical in the typical case).