I have sympathy for the commenters who agreed to pay outright (Nesov and ata), but viewed purely logically, this problem is underdetermined, kinda like Transparent Newcomb’s (thx Manfred). This is a subtle point; bear with me.
Let’s assume you precommit to not pay if asked. Now take an Omega that strictly follows the rules of the problem, but also has one additional axiom: I will award the player $1000 no matter what. This Omega can easily prove that the world in which it asks you to pay is logically inconsistent, and then it concludes that in that world you do agree to pay (because a falsity implies every statement, and this one happened to come first lexicographically or something). So Omega decides to award you $1000, its axiom system stays perfectly consistent, and all the conditions of the problem are fulfilled. I stress that the statement “You would pay if Omega asked you to” is logically true in the axiom system outlined, because its antecedent is false.
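To make the vacuous-implication step concrete, here is a toy Python sketch (my own illustration; the function names and the “defector” policy are made up for the example):

```python
# Toy model: the extra axiom makes Omega award $1000 unconditionally, so it
# never asks. The material conditional "asked => pays" then comes out true no
# matter what the agent's policy says, even for a committed defector.

def implies(p, q):
    return (not p) or q  # material conditional: a false antecedent makes it true

def defector_pays_if_asked(asked):
    return False  # this agent has precommitted to never pay

omega_asks = False  # consequence of the extra axiom "award $1000 no matter what"
player_pays = omega_asks and defector_pays_if_asked(omega_asks)

print(implies(omega_asks, player_pays))  # True: "you would pay if asked"
# So Omega awards the $1000 while every stated condition remains satisfied.
```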
In summary, the system of logical statements that specifies the problem does not completely determine what will happen, because we can consistently extend it with another axiom that makes Omega cooperate even if you defect. IOW, you can’t go wrong by cooperating, but some correct Omegas will reward defectors as well. It’s not clear to me if this problem can be “fixed”.
ETA: it seems that several other decision problems have a similar flaw. In Counterfactual Mugging with a logical coin it makes some defectors win, as in our problem, and in Parfit’s Hitchhiker it makes some cooperators lose.
This Omega can easily prove that the world in which it asks you to pay is logically inconsistent, and then it concludes that in that world you do agree to pay (because a falsity implies every statement, and this one happened to come first lexicographically or something).
This seems to be confusing “counterfactual::if” with “logical::if”. Noting that a world is impossible because the agents will not make the decisions that lead to that world does not mean that you can just make stuff up about that world since “anything is true about a world that doesn’t exist”.
Your objection would be valid if we had a formalized concept of “counterfactual if” distinct from “logical if”, but we don’t. When looking at the behavior of deterministic programs, I have no idea how to make counterfactual statements that aren’t logical statements.
When a program takes explicit input, you can look at what the program does if you pass this or that input, even if some inputs will in fact never be passed.
Noting that a world is impossible because the agents will not make the decisions that lead to that world does not mean that you can just make stuff up about that world since “anything is true about a world that doesn’t exist”.
If event S is empty, then for any Q you make up, it’s true that [for all s in S, Q]. This statement also holds if S was defined to be empty if [Not Q], or if Q follows from S being non-empty.
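In Python terms (just an illustration of the same point):

```python
# "For all s in S, Q(s)" holds vacuously when S is empty, whatever Q is --
# even a predicate that is false everywhere.
S = []                       # the event turned out to be empty
Q = lambda s: False          # an arbitrary made-up claim about members of S
print(all(Q(s) for s in S))  # True
```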
Yes, you can make logical deductions of that form, but my point was that you can’t feed those conclusions back into the decision-making process without invalidating the assumptions that went into those conclusions.
I will award the player $1000 iff the player would pay
I will award the player $1000 no matter what
How are these consistent??
Both these statements are true, so I’d say they are consistent :-)
In particular, the first one is true because “The player would pay if asked” is true.
“The player would pay if asked” is true because “The player will be asked” is false and implies anything.
“The player will be asked” is false by the extra axiom.
Note I’m using ordinary propositional logic here, not some sort of weird “counterfactual logic” that people have in mind and which isn’t formalizable anyway. Hence the lack of distinction between “will” and “would”.
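If it helps, here is a brute-force propositional check (a sketch with variable names of my own choosing, nothing more):

```python
# Enumerate all truth assignments and keep those satisfying both statements:
#   s1: award <-> (asked -> pays)   "award $1000 iff the player would pay if asked"
#   s2: award                       "award $1000 no matter what"
from itertools import product

models = [
    (asked, pays, award)
    for asked, pays, award in product([False, True], repeat=3)
    if (award == ((not asked) or pays)) and award
]
print(models)
# [(False, False, True), (False, True, True), (True, True, True)]
# Non-empty, hence consistent; my Omega lives in the models where asked is
# False, i.e. it never asks and the conditional is vacuously true.
```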
Are you sure you’re not confusing the propositions “o=ASK ⇒ a=PAY” and “a=PAY”? If not, could you present your argument formally?
I thought your post asked about the proposition “o=ASK ⇒ a=PAY”, and didn’t mention the other one at all. You asked this:
Omega asks you to pay him $100. Do you pay?
not this:
Do you precommit to pay?
So I just don’t use the naked proposition “a=PAY” anywhere. In fact I don’t even understand how to define its truth value for all agents, because it may so happen that the agent gets $1000 and walks away without being asked anything.
Seems to me that for all agents there is a fact of the matter about whether they would pay if asked. Even for agents that never in fact are asked.
So I do interpret a=PAY as “would pay”. But maybe there are other legitimate interpretations.
If both the agent and Omega are deterministic programs, and the agent is never in fact asked, that fact may be converted into a statement about natural numbers. So what you just said is equivalent to this:
Seems to me that for all agents there is a fact of the matter about whether they would pay if 1 were equal to 2.
I don’t know, this looks shady.
Why? Say the world program W includes function f, and it’s provable that W could never call f with argument 1. That doesn’t mean there’s no fact of the matter about what happens when f(1) is computed (though of course it might not halt). (Function f doesn’t have to be called from W.)
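A toy sketch of what I mean (made-up functions, purely illustrative):

```python
def f(n):
    # the agent-as-function: its output on every input is a matter of fact
    return "pay" if n == 0 else "refuse"

def W():
    # the world program; provably never calls f with argument 1
    return f(0)

print(W())   # 'pay'    -- the only call W ever makes
print(f(1))  # 'refuse' -- still well-defined, even though W never makes this call
```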
Even if f can be regarded as a rational agent who ‘knows’ the source code of W, the worst that could happen is that f ‘deduces’ a contradiction and goes insane. That’s different from the agent itself being in an inconsistent state.
Analogy: We can define the partial derivatives of a Lagrangian with respect to q and q-dot, even though it doesn’t make sense for q and q-dot to vary independently of each other.
I assume that you would not consider this to be a problem if Omega was replaced with a 99% reliable predictor. Confirm?
...Huh? My version of Omega doesn’t bother predicting the agent, so you gain nothing by crippling its prediction abilities :-)
ETA: maybe it makes sense to let Omega have a “trembling hand”, so it doesn’t always do what it resolved to do. In this case I don’t know if the problem stays or goes away. Properly interpreting “counterfactual evidence” seems to be tricky.
...Huh? My version of Omega doesn’t bother predicting the agent, so you gain nothing by crippling its prediction abilities :-)
I would consider an Omega that didn’t bother predicting, even in that case, to be ‘broken’. Omega is supposed to implement the natural-language problem statement in good faith. Perhaps I would consider yours one of Omega’s many siblings, one that requires more formal shackles.
This takes the decision out of Omega’s hands and collapses Omega’s agent-provability by letting it know its decision. We already know that in ADT-style decision-making, all theories of consequences of actions other than the actual one are inconsistent, that they are merely agent-consistent, and adding an axiom specifying which action is actual won’t disturb consistency of the theory of consequences of the actual action. But there’s no guarantee that Omega’s decision procedure would behave nicely when faced with knowledge of inconsistency. For example, instead of concluding that you do agree to pay, it could just as well conclude that you don’t, which would be a moral argument to not award you the $1000, and then Omega just goes crazy. One isn’t meant to know one’s own decisions; it’s bad for sanity.
Yes, you got it right. I love your use of the word “collapse” :-)
My argument seems to indicate that there’s no easy way for UDT agents to solve such situations, because the problem statements really are incomplete. Do you see any way to fix that, e.g. in Parfit’s Hitchhiker? Because this is quite disconcerting. Eliezer thought he’d solved that one.
I don’t understand your argument. You’ve just broken Omega for some reason (by letting it know something true which it’s not meant to know at that point), and as a result it fails in its role in the thought experiment. Don’t break Omega.
My implementation of Omega isn’t broken and doesn’t fail. Could you show precisely where it fails? As far as I can see, all the conditions in Bongo’s post still hold for it, therefore all possible logical implications of Bongo’s post should hold for it too, and so should all possible “solutions”.
It doesn’t implement the counterfactual where, depending on what response the agent assumes it gives on observing a request to pay, it can agent-consistently conclude that Omega will either award or not award the $1000. Even if we don’t require that Omega is a decision-theoretic agent with known architecture, the decision problem must make the intended sense.
In more detail. The agent’s decision is a strategy that specifies, for each possible observation (we have two: Omega rewards it, or Omega asks for money), a response. If Omega gives a reward, there is no response, and if it asks for money, there are two responses. So overall, we have two strategies to consider. The agent should be able to contemplate the consequences of adopting each of these strategies without running into inconsistencies (the observation is an external parameter, so even if in a given environment there is no agent-with-that-observation, the decision algorithm can still specify a response to that observation; it would just completely fail to control the outcome). Now, take your Omega implementation, and consider the strategy of not paying from the agent’s perspective. What would the agent conclude about expected utility? By the problem specification, it should (in the external sense, that is, not necessarily according to its own decision theory, if that decision theory happens to fail this particular thought experiment) conclude that Omega doesn’t give it an award. But your Omega does knowably (agent-provably) give it an award, hence it doesn’t play the intended role, doesn’t implement the thought experiment.
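A rough sketch of this strategy bookkeeping, with toy payoffs and a hand-written “intended Omega” (the names and numbers are mine, not a fixed formalism):

```python
# A strategy is a map from observations to responses. Against the intended
# Omega, the agent can evaluate each whole strategy without contradiction:
# the non-paying strategy gets no award, the paying one gets $1000.

def intended_omega(strategy):
    # awards $1000 iff the strategy would respond to "ask" by paying
    return strategy["ask"] == "pay"

def outcome(strategy):
    if intended_omega(strategy):
        return 1000  # awarded outright; never asked, so nothing to pay
    # not awarded: Omega asks, and the strategy's response to "ask" applies
    # (a paying response would cost $100 here, but such a strategy is awarded above)
    return -100 if strategy["ask"] == "pay" else 0

pay_strategy    = {"award": None, "ask": "pay"}
refuse_strategy = {"award": None, "ask": "refuse"}

print(outcome(pay_strategy))     # 1000
print(outcome(refuse_strategy))  # 0
```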
But your Omega does knowably (agent-provably) give it an award, hence it doesn’t play the intended role, doesn’t implement the thought experiment.
I think it would be fair to say that cousin_it’s (ha! Take that English grammar!) description of Omega’s behaviour does fit the problem specification we have given but certainly doesn’t match the problem we intended. That leaves us to fix the wording without making it look too obfuscated.
Taking another look at the actual problem specification, it doesn’t look all that bad. The translation into logical propositions didn’t really do it justice. We have...
He will award you $1000 if he predicts you would pay him if he asked.
cousin_it allows “if” to resolve to “iff”, but translates “The player would pay if asked” into A → B (with A = “asked”, B = “pays”); since !A, the implication is vacuously true, therefore ‘whatever’. Which is not quite what we mean when we use the phrase in English. We are trying to refer to the predicted outcome in a “possibly counterfactual but possibly real” reality.
Can you think of a way to say what we mean without any ambiguity and without changing the problem itself too much?
I believe you haven’t yet realized the extent of the damage :-)
It’s very unclear to me what it means for Omega to “implement the counterfactual” in situations where it gives the agent information about which way the counterfactual came out. After all, the agent knows its own source code A and Omega’s source code O. What sense does it make to inquire about the agent’s actions in the “possible world” where it’s passed a value of O(A) different from its true value? That “possible world” is logically inconsistent! And unlike the situation where the agent is reasoning about its own actions, in our case the inconsistency is actually exploitable. If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
The alternative is to let the agent tacitly assume that it does not necessarily receive the true value of O(A), i.e. that the causality has been surgically tweaked at some point—so the agent ought to respond to any values of O(A) mechanically by using a “strategy”, while taking care not to think too much about where they came from and what they mean. But: a) this doesn’t seem to accord with the spirit of Bongo’s original problem, which explicitly asked “you’re told this statement about yourself, now what do you do?”; b) this idea is not present in UDT yet, and I guess you will have many unexpected problems making it work.
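To see why I call the inconsistency exploitable, here is a crude sketch (my own toy construction; in the real setup O would depend on A, which this toy deliberately ignores):

```python
def O(agent_name):
    # my modified Omega: awards no matter what, never asks
    return "award"

def A(claimed_o_output):
    # a counterfactual copy of the agent is handed a claimed value of O("A"),
    # but it also knows O's source code and can simply recompute the true value
    actual = O("A")
    if claimed_o_output != actual:
        return "???"   # the premise it was handed is provably false: the 'go crazy' branch
    return "refuse"    # the precommitted defector's actual response

print(A("award"))  # 'refuse'
print(A("ask"))    # '???' -- the counterfactual input contradicts what A can prove
```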
If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
By the way, this bears an interesting similarity to the question of how you would explain the event of your left arm being replaced by a blue tentacle. The answer that you wouldn’t is perfectly reasonable: you don’t need to be able to adequately respond to that observation. You can self-improve in a way that has the side effect of making you crazy once you observe your left arm turning into a blue tentacle, and that wouldn’t matter, since this event has sufficiently low measure and a sufficiently insignificant contribution to overall expected utility to not be worth worrying about.
So in our case, the question should be, is it desirable to not go crazy when presented with this observation and respond in some other way instead, perhaps to win the Omega Award? If so, how should you think about the situation?
If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
That’s not the correct way of interpreting observations; you shouldn’t let observations drive you crazy. Here, we have A’s action-definition given in factorized form: action=A(O(“A”)). Normally, you’d treat such decompositions as explicit dependence bias, and try substituting everything in before starting to reason about what would happen if. But if O(“A”) is an observation, then you’re not deciding the action, that is, A(O(“A”)). Instead, you’re deciding just A(-), an Observations → Actions map. So being told that you’ve observed “no award” doesn’t mean that you now know that O(“A”)=“no award”. It just means that you’re the subagent responsible for deciding a response to the parameter “no award” in the strategy for A(-). You might also want to acausally coordinate with the subagent that is deciding the other part of that same strategy, the response to “award”.
And this all holds even if the agent knows what O(“A”) means, it would just be a bad idea to not include O(“A”) as part of the agent in that case, and so optimize the overall A(O(“A”)) instead of the smaller A(-).
At this point it seems we’re arguing over how to better formalize the original problem. The post asked what you should reply to Omega. Your reformulation asks what counterfactual-you should reply to counterfactual-Omega that doesn’t even have to say the same thing as the original Omega, and whose judgment of you came from the counterfactual void rather than from looking at you. I’m not sure this constitutes a fair translation. Some of the commenters here (e.g. prase) seem to intuitively lean toward my interpretation—I agree it’s not UDT-like, but think it might turn out useful.
At this point it seems we’re arguing over how to better formalize the original problem.
It’s more about making more explicit the question of what observations are, and what the boundaries of the agent are (which parts of the past lightcone are part of you? Just the cells in the brain? Why is that?), in deterministic decision problems. These were never explicitly considered before in the context of UDT. The problem statement declares that something is an “observation”, but we lack a technical counterpart of that notion. Your questions resulted from treating something that’s said to be an “observation” as epistemically relevant, writing knowledge about the state of the territory, which shouldn’t be logically transparent, right into the agent’s mind.
(Observations, possible worlds, etc. will very likely be the topic of my next post on ADT, once I resolve the mystery of observational knowledge to my satisfaction.)
Thanks, this looks like a fair summary (though a couple levels too abstract for my liking, as usual).
A note on epistemic relevance. Long ago, when we were just starting to discuss Newcomblike problems, the preamble usually went something like this: “Omega appears and somehow convinces you that it’s trustworthy”. So I’m supposed to listen to Omega’s words and somehow split them into an “epistemically relevant” part and an “observation” part, which should never mix? This sounds very shady. I hope we can disentangle this someday.
Your reformulation asks what counterfactual-you should reply to counterfactual-Omega that doesn’t even have to say the same thing as the original Omega.
Yes. If the agent doesn’t know what Omega actually says, this can be an important consideration (decisions are made by considering agent-provable properties of counterfactuals, all of which except the actual one are inconsistent, but not agent-inconsistent). If Omega’s decision is known (and not just observed), it just means that counterfactual-you’s response to counterfactual-Omega doesn’t control utility and could well be anything. But at this point I’m not sure in what sense anything can actually be logically known, and not in some sense just observed.
Now that is a real concern!
In summary, the system of logical statements that specifies the problem does not completely determine what will happen, because we can consistently extend it with another axiom that makes Omega cooperate even if you defect. IOW, you can’t go wrong by cooperating, but some correct Omegas will reward defectors as well.
I am another person who pays outright. While I acknowledge the “could even reward defectors” logical difficulty, I am also comfortable asserting that not paying is an outright wrong choice. A payoff of “$1,000” is to be preferred to a payoff of “either $1,000 or $0”.
It’s not clear to me if this problem can be “fixed”.
It would seem to merely require more precise wording in the problem statement. At the crudest level you simply add the clause “if it is logically coherent to so refrain, Omega will not give you $1,000”.
The solution has nothing to do with hacking the counterfactual; the reflectively consistent (and winning) move is to pay the $100, as precommitting to do so nets you a guaranteed $1000 (unless Omega can be wrong). It is true that “The player will pay iff asked” implies “The player will not be asked” and therefore “The player will not pay”, but this does not cause Omega to predict the player to not pay when asked.
You’ve added an extra axiom to Omega, noted that this resulted in a consistent result, and concluded that therefore the original axioms are incomplete (because the result is changed).
But that does not follow. This would only be true if the axiom was added secretly, and the result was still consistent. But because I know about this extra axiom, you’ve changed the problem; I behave differently, so the whole setup is different.
Or consider a variant: I have the numbers sqrt[2], e and pi. I am required to output the first number that I can prove is irrational, using the shortest proof I can find. This will be sqrt[2] (or maybe e), but not pi. Now add the axiom “pi is irrational”. Now I will output pi first, as the proof is one line long. This does not mean that the original axiomatic system was incorrect or under-specified...
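A toy numerical rendering of that variant (the proof lengths are invented purely for illustration):

```python
def first_provably_irrational(proof_length):
    # output the number whose irrationality proof is shortest
    return min(proof_length, key=proof_length.get)

original   = {"sqrt(2)": 20, "e": 40, "pi": 10**6}  # pi's proof is effectively out of reach
with_axiom = dict(original, **{"pi": 1})            # new axiom: "pi is irrational" (one line)

print(first_provably_irrational(original))    # 'sqrt(2)'
print(first_provably_irrational(with_axiom))  # 'pi'
```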
I’m not completely sure what your comment means. The result hasn’t “changed”, it has appeared. Without the extra axiom there’s not enough axioms to nail down a single result (and even with it I had to resort to lexicographic chance at one point). That’s what incompleteness means here.
If you think that’s wrong, try to prove the “correct” result, e.g. that any agent who precommits to not paying won’t get the $1000, using only the original axioms and nothing else. Once you write out the proof, we will know for certain that one of us is wrong or the original axioms are inconsistent, which would be even better :-)
I was also previously suspicious of the word “change”, but lately made my peace with it. Saying that there’s change is just a way of comparing objects of the same category. So if you look at an apple and a grape, what changes from apple to grape is, for example, color. A change is simultaneously what’s different, and a method of producing one from the other. Applying change to time, or to the process of decision-making, is a mere special case. Particular ways of parsing change in descriptions of decision problems can be incorrect because of explicit dependence bias: those changes, as methods of determining one from the other, are not ambient dependencies. But other usages of “change” still apply. For example, your decision to take one box in Newcomb’s instead of two changes the content of the box.