(I’ll review some motivations for decision theories in the context of Counterfactual Mugging, leading to the answer.)
Precommitment in the past, where it’s allowed, was a CDT-style solution to problems like this. You’d try making the most general possible precommitment as far in the past as possible that would respond to any possible future observations. This had two severe problems: it’s not always possible to be far enough in the past to make precommitments that would coordinate all relevant future events, and you have to plan every possible detail of future events in advance.
TDT partially resolves such problems by implementing coordinated decisions among the instances of the agent within the agent’s current worlds (those permitted by observations so far) that share the same epistemic state (or the aspects of it relevant to the decision). These instances decide for all of themselves together, and so arrive at the same decision. (It makes sense for the decision to be a strategy that can then take into account additional information differentiating the instances of the agent.) This is enough for Newcomb’s problem and (some versions of) the Prisoner’s Dilemma, but where coordination of agents in mutually exclusive counterfactuals is concerned, some of the tools break down.
Counterfactual Mugging both concerns agents located in mutually exclusive counterfactuals and explicitly forbids the agent from being present in the past to make a precommitment, so TDT fails to apply. In this case, UDT (which doesn’t rely on causal graphs) can define a common decision problem shared by the agents from different counterfactuals, provided these agents can first be reduced to a shared epistemic state, so that all of them arrive at the same decision. That decision takes the form of a strategy, which is then given each agent’s particular additional knowledge, the knowledge that differentiates it from the other agents in the group making the coordinated decision.
In the most general case, where we attempt to coordinate among all UDT agents, these agents arrive, without using any knowledge other than what can be generated by pure inference (assumed common among these agents), at a single global strategy that specifies the moves of all agents (depending on each agent’s particular knowledge and observations). However, when applied to a simple situation like Counterfactual Mugging, an agent only needs to purge itself of one bit of knowledge (identifying an agent) and select a simple coordinated strategy (for both agents) that takes that bit back as input to produce a concrete action.
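As a rough illustration of "purge one bit, pick a coordinated strategy, then feed the bit back in", here is a minimal Python sketch for the ordinary coin-toss version of Counterfactual Mugging. The payoffs (`COST`, `REWARD`), the 1/2 prior, and all function names are assumptions made for the example; they are not taken from this thread, and the sketch doesn't settle the logical-coin variant discussed below.

```python
from fractions import Fraction
from itertools import product

# Hypothetical payoffs (assumed for illustration only).
COST = 100        # what Omega asks for on "heads"
REWARD = 10_000   # what Omega pays on "tails" to agents who would pay on "heads"
PRIOR = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
ACTIONS = {"heads": ("pay", "refuse"), "tails": ("collect",)}


def payoff(strategy, bit):
    """Payoff to the instance observing `bit` when every instance follows `strategy`."""
    pays = strategy["heads"] == "pay"
    if bit == "heads":
        return -COST if pays else 0
    # On "tails" the reward depends on what the "heads" instance would do.
    return REWARD if pays else 0


def expected_utility(strategy):
    """Value of a strategy from the shared epistemic state, where the bit is unknown."""
    return sum(p * payoff(strategy, bit) for bit, p in PRIOR.items())


def coordinated_strategy():
    """Choose one strategy (a map from bit to action) for all instances together."""
    bits = list(ACTIONS)
    candidates = [dict(zip(bits, choice)) for choice in product(*ACTIONS.values())]
    return max(candidates, key=expected_utility)


def act(bit):
    """Plug the purged bit back into the coordinated strategy to get a concrete action."""
    return coordinated_strategy()[bit]


print(act("heads"), act("tails"))
```

The point of the design is that `coordinated_strategy` never looks at the bit; only `act` does, after the strategy is already fixed.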
So this takes us full circle: from deciding in the moment, to deciding (on a precommitment) in advance, to deciding (on a coordinated strategy) in the present (of each instance). However, the condition for producing a coordinated strategy in the present is different from that for producing a precommitment in the past: all we need is a shared state of knowledge among the to-be-coordinated agents, not the state of knowledge they could’ve shared in the past had they attempted a precommitment.
So for this problem, in coordinating with the other player (which, let’s assume, abstractly exists, even if with measure 0), you can use your knowledge of the millionth digit of pi, since both players share it. Using this shared knowledge, the strategy you both arrive at would favor the world that’s permitted by that value, in this case the paperclip world. The other world doesn’t matter, contrary to what would be the case with a coin toss instead of the accessible abstract fact. And since the other player has nothing of value to offer, you take the whole pie.
Suppose you’re currently running a decision theory that would “take the whole pie” in this situation. Now what if Omega first informed you of the setup without telling you what the millionth digit of pi is, and gave you a chance to self-modify? And suppose you don’t have enough computing power to compute the digit yourself at this point. Doesn’t it seem right to self-modify into someone who would give control of the universe to the staples maximizer, since that gives you 1⁄2 “logical” probability of 10^20 paperclips instead of 1⁄2 “logical” probability of 10^10 paperclips? What is wrong with this reasoning? And if it is wrong, both UDT1 and UDT2 are wrong, since UDT1 would self-modify and UDT2 would give control to the staples maximizer without having to self-modify, so what’s the right decision theory?
Do you mean that I also won’t have enough computing power later, after the staple maximizer’s proposal is stated, or that there isn’t enough computing power just during the thought experiment? (In the latter case, I make the decision to think long enough to compute the digit of pi before making a decision.)
What does it mean to self-modify if no action is being performed, that is, if any decision regarding that action could be computed later without any preceding precommitments?
(One way in which a “self-modification” might be useful is when you won’t have enough computational power in the future to waste what computational power you have currently, and so you must make decisions continuously that take away some options from the future (perhaps by changing instrumental priority rather than permanently arresting opportunity to reconsider) and thereby simplify the future decision-making at the cost of making it less optimal. Another is where you have to signal precommitment to other players that wouldn’t be able to follow your more complicated future reasoning.)
You will have enough computing power later.
I mean suppose Omega gives you the option (now, when you don’t have enough computing power to compute the millionth digit of pi) of replacing yourself with another AI that has a different decision theory, one that would later give control of the universe to the staples maximizer. Should you take this option? If not, what decision theory would refuse it? (Again, from your current perspective, taking the option gives you 1⁄2 “logical” probability of 10^20 paperclips instead of 1⁄2 “logical” probability of 10^10 paperclips. How do you justify refusing this?)
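Spelled out, the comparison in the parenthetical above looks like this. It is only a sketch of the arithmetic, under two assumptions: the "logical" probability of each value of the digit is taken to be 1/2, and the remaining branch is worth 0 paperclips in both cases (which also assumes the counterfactual staples maximizer runs the mirror-image policy).

```latex
\begin{align*}
\mathbb{E}[\text{paperclips} \mid \text{self-modify}]
  &= \tfrac{1}{2}\cdot 10^{20} + \tfrac{1}{2}\cdot 0 = 5\times 10^{19},\\
\mathbb{E}[\text{paperclips} \mid \text{keep current theory}]
  &= \tfrac{1}{2}\cdot 10^{10} + \tfrac{1}{2}\cdot 0 = 5\times 10^{9}.
\end{align*}
```

From the pre-computation perspective, the first option is better by a factor of 10^10.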
(continuing from here)
I’ve changed my mind back. The 10^20 are only on the table for the loser, and can be given by the winner. When the winner/loser status is unknown, a winner might cooperate, since it allows the possibility of being a loser and receiving the prize. But if the winner knows its own status, it can’t receive that prize, and the loser has no leverage. So there is nothing problematic about 10^20 becoming inaccessible: it is only potentially accessible to the loser, when the winner is weak (doesn’t know its own status), while an informed winner won’t give it away, so that doesn’t happen. Resolving logical uncertainty makes the winner stronger, makes the loser weaker, and so the prize for the loser becomes smaller.
Edit: Nope, I changed my mind back.
You’ve succeeded in convincing me that I’m confused about this problem, and don’t know how to make decisions in problems like this.
There are two types of players in this game: those that win the logical lottery and those that lose (here, the paperclip maximizer is the winner, and the staple maximizer is the loser). A winner can either cooperate or defect against its loser opponent, with cooperation giving the winner 0 and the loser 10^20, and defection giving the winner 10^10 and the loser 0.
If a player doesn’t know whether it’s a loser or a winner, coordinating cooperation with its opponent has higher expected utility than coordinating defection, with mixed strategies presenting options for bargaining (the best coordinated strategy for a given player is to defect, with the opponent cooperating). Thus, we have a full-fledged Prisoner’s Dilemma.
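To check the Prisoner’s Dilemma claim, here is a small Python sketch that tabulates expected utility for a player who doesn’t yet know whether it won. The 10^10 and 10^20 payoffs are the ones stated above; the 1/2 “logical” probability of being the winner, and the names `expected_utility` and `table`, are assumptions of the sketch. Each player’s utility is counted in its own units (paperclips or staples).

```python
from fractions import Fraction
from itertools import product

WINNER_KEEPS = 10**10   # payoff to a winner that defects (keeps control)
LOSER_GIVEN = 10**20    # payoff to a loser whose winning opponent cooperates
HALF = Fraction(1, 2)   # assumed "logical" probability of being the winner


def expected_utility(my_policy, their_policy):
    """Expected utility for a player that doesn't know whether it is the winner.

    A policy says what a player does *if* it turns out to be the winner:
    "C" (hand control to the loser) or "D" (keep control).
    """
    as_winner = 0 if my_policy == "C" else WINNER_KEEPS
    as_loser = LOSER_GIVEN if their_policy == "C" else 0
    return HALF * as_winner + HALF * as_loser


# Expected utility for every pair of policies (mine, opponent's).
table = {(me, them): expected_utility(me, them) for me, them in product("CD", repeat=2)}

for (me, them), value in table.items():
    print(f"me={me} them={them}: {value}")
```

Under these assumptions the ordering is the usual one for a Prisoner’s Dilemma: defecting against a cooperator beats mutual cooperation, which beats mutual defection, which beats cooperating against a defector.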
On the other hand, obtaining information about your identity (loser or winner) transforms the problem into one where you seemingly have only the choice between 0 and 10^10 (if you’re a winner), or always 0 with no ability to bargain for more (if you’re a loser). Thus, it looks like knowledge of a fact turns a problem into one of lower expected utility, irrespective of what the fact turns out to be, and takes away the incentives that would’ve made a higher win (10^20) possible. This doesn’t sound right; there should be a way of making the 10^20 accessible.
It’s as if an instance of the problem involves not two but four agents that should coordinate: a possible winner/loser pair and a corresponding impossible pair. The impossible pair has the bizarre property that they know themselves to be impossible, like self-defeating theories such as PA+NOT(Con(PA)) (except that we’re talking about agent-provability and not provability), which doesn’t make them unable to reason. These four agents could form a coordinated decision, where the coordinated decision problem is obtained by throwing away the knowledge that’s not common among these four agents, in particular the digit of pi and the winner/loser identity. After the decision is made, they plug their particular information back in.
You’ve convinced me that I’m confused. I don’t know what the correct decision in this situation is anymore, or how to think about such decisions.
If you cooperate in such situations, this makes the value of the outcome of such thought experiments higher, and that applies to all individual instances of the thought experiments as well. The problem has an ASP-ish feel to it: you’re punished for taking too much information into account, even though from the point of view of having taken that information into account, your resulting decision seems correct.
Good, I’m in a similar state. :)
Yes, I noticed the similarity as well, except in the ASP case it seems clearer what the right thing to do is.
(Grandparent was my comment, deleted while I was trying to come up with a clearer statement of my confusion, before I saw the reply. The new version is here.)
So you would also keep the money in Counterfactual Mugging with a logical coin? I don’t see how that can be right. About half of logical coins fall heads, so given a reasonable prior over Omegas, it makes more sense for the agent to always pay up, both in Counterfactual Mugging and in Wei’s problem. But of course using a prior over Omegas is cheating...
Then you’d be coordinating with players of other CM setups, not just with your own counterfactual opponent; you’d be breaking out of your thought experiment, and that’s against the rules! (Whatever a “logical coin” is, the primary condition is for it to be shared among and accessible to all coordinating agents. If that’s so, like here, then I keep the money, assuming the thought experiment doesn’t leak control.)
:/ The whole point of thought experiments is that they leak control. ;P
“I seem to have found myself in a trolley problem! This is fantastically unlikely. I’m probably in some weird moral philosophy thought experiment and my actions are likely mostly going to be used as propaganda supporting the ‘obvious’ conclusions of one side or the other… oh and if I try to find a clever third option I’ll probably make myself counterfactual in most contexts. Does the fact that I’m thinking these thoughts affect what contexts I’m in? /brainasplodes”
This is exactly what my downscale copy thinks the first 3-5 times I try to run any thought experiment. Often it’s followed by “**, I’m going to die!”
I don’t run thought experiments containing myself at any level of detail anymore if I can avoid it.
I’m still not sure. You can look at it as cooperating with players of other CM setups, or as trying to solve the meta-question “what decision theory would be good at solving problems like this one?” Saying “50% of logical coins fall heads” seems to capture the intent of the problem class quite well, no?
The decision algorithm that takes the whole pie is good at solving problems like this one: for each specific pie it gets it whole. Making the same action is not good for solving the different problem of dividing all possible pies simultaneously, but then the difference is reflected in the problem statement, and so the reasons that make it decide correctly for individual problems won’t make it decide incorrectly for the joint problem.
I think it’s right to cooperate in this thought experiment only to the extent that we accept the impossibility of isolating this thought experiment from its other possible instances, but then it should just motivate restating the thought experiment so as to make its expected actual scope explicit.
Agreed.