Personally I don’t see the examples given as flaws in Causal Decision Theory at all. The flaw is in the problem statements not CDT.
In the alien predictor example, the key question is “when does the agent set its strategy?”. If the agent’s strategy is set before the prediction is made, then CDT works fine. The agent decides in advance to commit itself to opening one box, the alien realises that, the agent follows through and gets $1,000,000. Which is exactly how humans win that game as well. If on the other hand the agent’s strategy is not set until after the prediction, then I ask: what is the alien actually predicting? It cannot predict the agent’s choice, because we’ve just said the agent’s strategy is not defined yet. What it can predict is the process by which the agent’s strategy will be set. In that case there is a meta-agent, with a strategy of something like “force the agent to use CDT”, and the alien is really predicting the meta-agent, which has made a bad decision. The reality in that case is that the meta-agent is the one playing the game. There’s no $1,000,000 in the box not because of anything the agent did, but because the meta-agent made a poor decision: committing to create an agent that would open both boxes. By the time the agent starts to operate there’s already no money in the box, and it’s actually doing the correct thing by taking at least the $1,000 anyway.
The confusion comes from the problem being framed in terms of the subgame of “one box or two”, when the real game being played is “pick a box-opening strategy for the alien to predict”, and CDT can solve that one perfectly well.
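The “strategy set before the prediction” case can be made concrete with a small sketch (the function names and dollar amounts here are illustrative assumptions, and the predictor is assumed to read the committed strategy directly):

```python
# Hypothetical sketch of Newcomb's problem where the agent's strategy is
# fixed *before* the prediction, so the predictor can simply inspect it.

def predictor(strategy):
    """The alien inspects the committed strategy and fills the boxes."""
    opaque = 1_000_000 if strategy == "one-box" else 0
    transparent = 1_000
    return opaque, transparent

def play(strategy):
    """Total payout for an agent that follows through on its commitment."""
    opaque, transparent = predictor(strategy)
    if strategy == "one-box":
        return opaque
    return opaque + transparent

print(play("one-box"))  # the committed one-boxer gets the $1,000,000
print(play("two-box"))  # the opaque box was already empty; only $1,000 remains
```

The point of the sketch is that the prediction happens strictly after the strategy is fixed, so there is no backwards causation for CDT to stumble over.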
A similar problem occurs in the prisoner’s dilemma. Again the question is “when is the agent’s strategy set?”, or in this case in particular “when is the opponent’s strategy set?”. If the opponent’s strategy is already set before the agent makes its decision, then defecting is correct. However, in the example you give the agent supposedly knows whether it’s playing against itself. The implication is that by changing its strategy it implicitly changes its opponent’s strategy as well. If the agent does not know this, then it has simply been misled. If it does know this, then CDT is perfectly capable of telling it to co-operate with itself to achieve the best payout. Again, the metagame is “pick a strategy that optimises the prisoner’s dilemma where you might be playing against yourself, or against other agents who are aware of your strategy, etc.”. Once again CDT is perfectly capable of handling this metagame.
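To make the “playing against itself” point concrete, here is a minimal sketch using the standard illustrative payoff values (an assumption, not from the original discussion): when the opponent is a copy, its move necessarily mirrors the agent’s, so the only reachable outcomes are mutual cooperation and mutual defection.

```python
# Illustrative prisoner's dilemma payoffs: (my move, opponent move) -> my payoff.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, opponent defects
    ("D", "C"): 5,  # I defect, opponent cooperates
    ("D", "D"): 1,  # mutual defection
}

def payoff_vs_copy(my_move):
    # Against a copy of myself, the opponent's move equals mine by construction,
    # so the off-diagonal outcomes are simply unreachable.
    return PAYOFF[(my_move, my_move)]

print(payoff_vs_copy("C"))  # mutual cooperation payoff
print(payoff_vs_copy("D"))  # mutual defection payoff
```

With the off-diagonal outcomes removed from the decision problem, cooperating is the straightforwardly correct causal choice.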
I’ve got nothing against the idea of decision theories that are robust to misleading problem statements, or that are “extended” to include awareness of the fact that they’re in a metagame, but the examples given here don’t demonstrate any flaws in CDT to me as such. They seem akin to a chess-playing agent that isn’t told the opponent has a queen: it gets the wrong answers because it’s solving the wrong problem.
I might be missing the point about the differences between these things, but since my understanding of the terminology (I was familiar with the concepts already) is based on this article it’s still a problem with the article. Or my reading comprehension skills I suppose, but I’ll stick with blaming the article for now.
In the alien predictor example, the key question is “when does the agent set its strategy?”. If the agent’s strategy is set before the prediction is made, then CDT works fine.
What if the prediction was made before the agent was created?
I effectively already address that question in the rest of the paragraph. I see no meaningful difference between “strategy not set” and “agent not created”; if there is a difference to you, please elaborate.
At risk of repeating myself, to answer your question anyway, I ask: How can the alien successfully predict the strategy of something which has yet to even be created? If the alien is just guessing then opening both boxes is clearly correct anyway. Otherwise, the alien must know something about the process by which the agent is created. In this case, as I explain in the original comment, there is a meta-agent which is whatever creates the agent, and that is also what the alien is predicting the behaviour of. If there’s no money in the box it’s due to a poor meta-agent strategy which the agent then has no means to rectify. CDT seems to me perfectly capable of generating the correct strategies for both the agent and the meta-agent in this case.
In this case, as I explain in the original comment, there is a meta-agent which is whatever creates the agent, and that is also what the alien is predicting the behaviour of. If there’s no money in the box it’s due to a poor meta-agent strategy which the agent then has no means to rectify.
There doesn’t have to be any “meta-agent”. Humans evolved from non-agent stuff.
How can the alien successfully predict the strategy of something which has yet to even be created?
Assume that the agent deterministically originates from initial conditions known to the predictor, but the initial conditions don’t yet constitute an agent.
I see no meaningful difference between “strategy not set” and “agent not created”; if there is a difference to you, please elaborate.
If the agent that eventually appears, but wasn’t present at the outset, follows something like TDT, it wins Newcomb’s problem, even though it didn’t have an opportunity to set an initial strategy (make a precommitment).
This is what I meant in the grandparent comment: “When does the agent set its strategy?” is not a key question when there is no agent to set that strategy, and yet such a situation isn’t hopeless; it can be controlled using considerations other than precommitment.
Ok, so firstly I now at least understand the difference you see between strategy-not-set and agent-not-created—that in only one case was there the clear potential for pre-commitment. I still think it’s beside the point, but that does require some explaining.
When I talk about a meta-agent, I don’t mean to imply the existence of any sort of intelligence or sentience for it, I simply mean there exists some process outside of the agent. The agent cannot gain or lose the $1,000,000 without changing that process, something which it has no control over. Whether this outside process is by way of an intelligent meta-agent that should have known better, the blind whims of chance, or the harshness of a deterministic reality is beside the point. Whether agents which choose one box do better than agents which choose both is a different question from whether it is correct to choose both boxes or not. When you switch a winning agent for a losing one, you simultaneously switch the situation that agent is presented with from a winning situation to a losing one.
It makes me think of the following paradox: Imagine that at some point in your life, God (or an alien or whatever) looks at whether you have been a perfect rationalist, and if so punches you in the nose. And assume of course that you are well aware of this. “Easy!” you think, “Just make one minor irrational decision and your nose will be fine”. But, that would of course be the perfectly rational thing to do, and so you still get punched in the nose. You get the idea. Should I now go on to say that rationalism doesn’t always win, and we therefore need to construct a new approach which does? (hint: of course not, but try and see the parallels here.)
In any case, rather than continue to argue over what is a fairly controversial paradox even for humans, let alone decision theories, let me take another tack here. If you are a firm believer in the predictive power of the alien, then the problem is entirely equivalent to:
Choice A: Get $1,000
Choice B: Get $1,000,000
If CDT is presented with this problem, it would surely choose $1,000,000. The only way I see it wouldn’t is if CDT is defined as something like “Make the correct decision, except for being deliberately obtuse in insisting that causality is strictly temporal”, and then lo and behold it loses in paradoxes relating to non-temporal causality. If that’s what CDT effectively means, then fine, it loses. But to me, we don’t need a substantially different decision theory to resolve this paradox, we need to apply substantially the same decision theory to a different problem. To me, answering the question “what are the outcomes of my decisions” is part of defining the problem, not part of decision theory.
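Under the stated assumption of a reliable predictor, the collapse to “Choice A vs Choice B” can be checked with a quick expected-value calculation (the accuracy parameter p is an assumption of this sketch, not part of the original problem statement):

```python
# Expected payoffs as a function of the predictor's accuracy p (0 <= p <= 1).

def ev_one_box(p):
    # With probability p the predictor is correct, so the opaque box
    # contains $1,000,000 for a one-boxer.
    return p * 1_000_000

def ev_two_box(p):
    # A two-boxer always gets the transparent $1,000; with probability
    # (1 - p) the predictor erred and the $1,000,000 is there too.
    return 1_000 + (1 - p) * 1_000_000

# For a perfectly reliable predictor the problem really is just
# "$1,000,000 or $1,000":
print(ev_one_box(1.0), ev_two_box(1.0))
```

For any p above roughly 0.5005 one-boxing has the higher expectation, which is the sense in which a firm believer in the alien’s predictive power faces the trivial A-vs-B problem above.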
So the examples in the article still don’t motivate me to see a need for substantially new decision theories, just more clearly defined problems. If the other decision theories are all about saying “I’m solving the wrong problem” then that’s fine, and I can imagine them being potentially useful, but based on the examples given at least it still seems like a backwards way of going about things.
When I talk about a meta-agent, I don’t mean to imply the existence of any sort of intelligence or sentience for it, I simply mean there exists some process outside of the agent. The agent cannot gain or lose the $1,000,000 without changing that process, something which it has no control over.
An agent may have no control over what its source code is, but it does have control over what that source code does.
You can’t have it both ways. Either the agent’s behaviour is deterministic, or the alien cannot reliably predict it. If it is deterministic, what the source code is determines what the source code does, so it is contradictory to claim the agent can change one but not the other (if by “control” you mean “is responsible for” then that’s a different issue). If it is not deterministic, then aside from anything else the whole paradox falls apart.
Nope. See also the free will sequence. The decision is deterministic. The agent is the part of the deterministic structure that determines it, that controls what it actually is; the agent is the source code. The agent can change neither its source code nor its decision, but it does determine its decision: it controls what the decision actually is, without of course changing what it actually is, because it can be nothing else than what the agent decides.