If there’s a population of mostly TDT/UDT agents and few CDT agents (and nobody knows who the CDT agents are) and they’re randomly paired up to play one-shot PD, then the CDT agents do better. What does this imply?
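To make the payoff claim concrete, here’s a minimal expected-value sketch. The specific payoff numbers and CDT fraction are my own illustrative choices (the setup above doesn’t specify any), and I’m assuming every UDT agent cooperates while every CDT agent defects:

```python
# Toy expected-payoff check of the setup above. Assumptions (mine, for
# illustration): standard PD payoffs T=5 > R=3 > P=1 > S=0, every UDT
# agent cooperates (its choice is logically correlated with the UDT
# majority), every CDT agent defects, and pairing is uniformly random.

T, R, P, S = 5, 3, 1, 0   # temptation, reward, punishment, sucker
f_cdt = 0.05              # fraction of CDT agents in the population

# A UDT agent meets a cooperating UDT agent with probability 1 - f_cdt
# and a defecting CDT agent with probability f_cdt.
ev_udt = (1 - f_cdt) * R + f_cdt * S

# A CDT agent defects against everyone: it exploits cooperating UDT
# agents and gets mutual defection against other CDT agents.
ev_cdt = (1 - f_cdt) * T + f_cdt * P

print(f"UDT expected payoff: {ev_udt:.2f}")   # 2.85
print(f"CDT expected payoff: {ev_cdt:.2f}")   # 4.80
```

Because the CDT agents are rare, a UDT agent almost always gets the mutual-cooperation payoff R, while a CDT agent almost always gets the higher temptation payoff T by exploiting a cooperator.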
How does a CDT agent itself know that it is one of the CDT agents? Wouldn’t they be uncertain about it? Even if CDTness is hypothetically caused by a known / testable genetic mutation, wouldn’t there be uncertainty around the truth of that fact?
If you think you might be a CDT agent but you’re unsure about it, you have to be careful not to stumble over an infohazard (including via introspection of your own mind) which proves or suggests that you are actually a UDT agent.
UDT agents might also be uncertain about their UDTness, but at least under UDT you don’t have to worry about infohazards lurking in your own mind, right?
So one possible answer is that, in return for doing worse in this setup, UDT agents get more freedom to think and operate their own minds without worrying that they will be (further) disadvantaged by their own thoughts. Depending on how contrived and uncommon this kind of setup is throughout the multiverse, that might be a tradeoff worth making.
Also, it seems plausible that at least some humans might already be over the threshold where we don’t really get a choice anyway—we’ve already had the thoughts that inevitably lead us down the path of being or delegating to UDT agents, whether we like it or not. I’m not sure whether this is true, but if it is, that actually seems like good news about the likely distribution of CDT vs. UDT agents across the multiverse—perhaps the supermajority of logically possible agents above a certain low-ish capabilities threshold will end up as UDT agents.
Why would they be uncertain about whether they’re a CDT agent? Being a CDT agent surely just means, by definition, that they evaluate decisions based on causal outcomes. It feels confused to say that they have to be uncertain about, or reflect on, which decision theory they have and then apply it, rather than their being a CDT agent being an ex hypothesi fact about how they behave.
What kind of decision theory the agent will use and how it will behave in specific circumstances is a proposition about the agent and about the world which can be true or false, just like any other proposition. Within any particular mind (the agent’s own mind, Omega’s, ours) propositions of any kind come with differing degrees of uncertainty attached, logical or otherwise.
You can of course suppose for the purposes of the thought experiment that the agent itself has an overwhelmingly strong prior belief about its own behavior, either because it can introspect and knows its own mind well, or because it has been told how it will behave by a trustworthy and powerful source such as Omega. You could also stipulate that the entirety of the agent is a formally specified computer program, perhaps one that is incapable of reflection or forming beliefs about itself at all.
However, any such suppositions would be additional constraints on the setup, making it weirder and less likely to be broadly applicable in real life (or anywhere else in the multiverse).
Also, even if the agent starts with 99.999%+ credence that it is a CDT agent and will behave accordingly, perhaps there’s some chain of thoughts the agent could think which provides strong enough evidence to overcome this prior (and thus change the agent’s behavior). Perhaps this chain of thoughts is very unlikely to actually occur to the agent, or for the purposes of the thought experiment, we only consider agents which flatly aren’t capable of such introspection or belief-updating. But that may (or may not!) mean restricting the applicability of the thought experiment to a much smaller and rather impoverished class of minds.
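For concreteness, here’s the Bayesian arithmetic for how such a chain of thoughts could overcome the prior; the 10^6 likelihood ratio is a made-up number, chosen purely to illustrate that a strong enough update flips the credence:

```python
# Illustrative Bayes update: a 99.999% prior in "I am a CDT agent"
# versus introspective evidence that would be far more likely if the
# agent were actually a UDT agent. The 1e6 likelihood ratio is an
# arbitrary number chosen to show the prior can be overcome.

prior = 0.99999
prior_odds = prior / (1 - prior)   # ~99,999 : 1 in favor of "I am CDT"

lr_udt = 1e6                       # P(evidence | UDT) / P(evidence | CDT)

posterior_odds = prior_odds / lr_udt          # odds in favor of CDT after update
posterior = posterior_odds / (1 + posterior_odds)
print(f"Posterior credence in 'I am CDT': {posterior:.3f}")   # ~0.091
```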
I understand it’s a proposition like any other, but I don’t see why an agent would reflect on it or use it in their deliberation to decide what to do. The fact that they’re a CDT agent is a fact about how they will act in the decision, not a fact that they need to use in their deliberation.
Analogous to preferences: whether an agent prefers A or B is a proposition like any other, but I don’t think it’s natural to model them as first consulting the credences they have assigned to “I prefer A to B”, etc. Rather, they will just choose A ex hypothesi, because that’s what having the preference means.
They don’t have to, and for the purposes of the thought experiment you could specify that they simply don’t. But humans are often pretty uncertain about their own preferences, and about what kind of decision theory they can or should use. Many of these humans are pretty strongly inclined to deliberate, reflect, and argue about these beliefs, and take into account their own uncertainty in them when making decisions.
So I’m saying that if you stipulate that no such reflection or deliberation occurs, you might be narrowing the applicability of the thought experiment to exclude human-like minds, which may be a rather large and important class of all possible minds.
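To make the two modeling choices in this exchange concrete, here’s a sketch contrasting them; everything here (names, signatures, the 0.5 threshold) is hypothetical and invented for illustration:

```python
# Two ways to model the agent from the exchange above; both sketches
# are illustrative, and every name here is hypothetical.

def hardcoded_cdt_agent(options, causal_ev):
    # "Ex hypothesi" view: being a CDT agent is simply a fact about
    # how the agent acts; no self-model is consulted.
    return max(options, key=causal_ev)

def reflective_agent(options, causal_ev, udt_choice, p_self_is_cdt):
    # Human-like view: the agent's credence that it is a CDT agent is
    # itself an input to deliberation, so reflection that shifts this
    # credence (an "infohazard" in the sense above) changes behavior.
    if p_self_is_cdt > 0.5:
        return max(options, key=causal_ev)
    return udt_choice(options)

# e.g. with options = ["cooperate", "defect"] and causal_ev favoring
# defection, the reflective agent's output depends on p_self_is_cdt.
```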