I have sympathy for the commenters who agreed to pay outright (Nesov and ata), but viewed purely logically, this problem is underdetermined, kinda like Transparent Newcomb’s (thx Manfred). This is a subtle point; bear with me.
Let’s assume you precommit to not pay if asked. Now take an Omega that strictly follows the rules of the problem, but also has one additional axiom: I will award the player $1000 no matter what. This Omega can easily prove that the world in which it asks you to pay is logically inconsistent, and then it concludes that in that world you do agree to pay (because a falsity implies every statement, and this one happened to come first lexicographically or something). So Omega decides to award you $1000, its axiom system stays perfectly consistent, and all the conditions of the problem are fulfilled. I stress that the statement “You would pay if Omega asked you to” is logically true in the axiom system outlined, because its antecedent is false.
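To make the vacuous-implication step concrete, here is a toy Python sketch (my own illustration; the function names and the “defector” policy are made up for the example):

```python
# Toy model: the extra axiom makes Omega award $1000 unconditionally, so it
# never asks. The material conditional "asked => pays" then comes out true no
# matter what the agent's policy says, even for a committed defector.

def implies(p, q):
    return (not p) or q  # material conditional: a false antecedent makes it true

def defector_pays_if_asked(asked):
    return False  # this agent has precommitted to never pay

omega_asks = False  # consequence of the extra axiom "award $1000 no matter what"
player_pays = omega_asks and defector_pays_if_asked(omega_asks)

print(implies(omega_asks, player_pays))  # True: "you would pay if asked"
# So Omega awards the $1000 while every stated condition remains satisfied.
```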
In summary, the system of logical statements that specifies the problem does not completely determine what will happen, because we can consistently extend it with another axiom that makes Omega cooperate even if you defect. IOW, you can’t go wrong by cooperating, but some correct Omegas will reward defectors as well. It’s not clear to me if this problem can be “fixed”.
ETA: it seems that several other decision problems have a similar flaw. In Counterfactual Mugging with a logical coin it makes some defectors win, as in our problem, and in Parfit’s Hitchhiker it makes some cooperators lose.
This Omega can easily prove that the world in which it asks you to pay is logically inconsistent, and then it concludes that in that world you do agree to pay (because a falsity implies every statement, and this one happened to come first lexicographically or something).
This seems to be confusing “counterfactual::if” with “logical::if”. Noting that a world is impossible because the agents will not make the decisions that lead to that world does not mean that you can just make stuff up about that world since “anything is true about a world that doesn’t exist”.
Your objection would be valid if we had a formalized concept of “counterfactual if” distinct from “logical if”, but we don’t. When looking at the behavior of deterministic programs, I have no idea how to make counterfactual statements that aren’t logical statements.
When a program takes explicit input, you can look at what the program does if you pass this or that input, even if some inputs will in fact never be passed.
Noting that a world is impossible because the agents will not make the decisions that lead to that world does not mean that you can just make stuff up about that world since “anything is true about a world that doesn’t exist”.
If event S is empty, then for any Q you make up, it’s true that [for all s in S, Q]. This statement also holds if S was defined to be empty if [Not Q], or if Q follows from S being non-empty.
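In Python terms (just an illustration of the same point):

```python
# "For all s in S, Q(s)" holds vacuously when S is empty, whatever Q is --
# even a predicate that is false everywhere.
S = []                       # the event turned out to be empty
Q = lambda s: False          # an arbitrary made-up claim about members of S
print(all(Q(s) for s in S))  # True
```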
Yes, you can make logical deductions of that form, but my point was that you can’t feed those conclusions back into the decision-making process without invalidating the assumptions that went into those conclusions.
I will award the player $1000 iff the player would pay
I will award the player $1000 no matter what
How are these consistent??
Both these statements are true, so I’d say they are consistent :-)
In particular, the first one is true because “The player would pay if asked” is true.
“The player would pay if asked” is true because “The player will be asked” is false and implies anything.
“The player will be asked” is false by the extra axiom.
Note I’m using ordinary propositional logic here, not some sort of weird “counterfactual logic” that people have in mind and which isn’t formalizable anyway. Hence the lack of distinction between “will” and “would”.
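If it helps, here is a brute-force propositional check (a sketch with variable names of my own choosing, nothing more):

```python
# Enumerate all truth assignments and keep those satisfying both statements:
#   s1: award <-> (asked -> pays)   "award $1000 iff the player would pay if asked"
#   s2: award                       "award $1000 no matter what"
from itertools import product

models = [
    (asked, pays, award)
    for asked, pays, award in product([False, True], repeat=3)
    if (award == ((not asked) or pays)) and award
]
print(models)
# [(False, False, True), (False, True, True), (True, True, True)]
# Non-empty, hence consistent; my Omega lives in the models where asked is
# False, i.e. it never asks and the conditional is vacuously true.
```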
Are you sure you’re not confusing the propositions “o=ASK ⇒ a=PAY” and “a=PAY”? If not, could you present your argument formally?
I thought your post asked about the proposition “o=ASK ⇒ a=PAY”, and didn’t mention the other one at all. You asked this:
Omega asks you to pay him $100. Do you pay?
not this:
Do you precommit to pay?
So I just don’t use the naked proposition “a=PAY” anywhere. In fact I don’t even understand how to define its truth value for all agents, because it may so happen that the agent gets $1000 and walks away without being asked anything.
Seems to me that for all agents there is a fact of the matter about whether they would pay if asked. Even for agents that never in fact are asked.
So I do interpret a=PAY as “would pay”. But maybe there are other legitimate interpretations.
If both the agent and Omega are deterministic programs, and the agent is never in fact asked, that fact may be converted into a statement about natural numbers. So what you just said is equivalent to this:
Seems to me that for all agents there is a fact of the matter about whether they would pay if 1 were equal to 2.
I don’t know, this looks shady.
Why? Say the world program W includes function f, and it’s provable that W could never call f with argument 1. That doesn’t mean there’s no fact of the matter about what happens when f(1) is computed (though of course it might not halt). (Function f doesn’t have to be called from W.)
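A toy sketch of what I mean (made-up functions, purely illustrative):

```python
def f(n):
    # the agent-as-function: its output on every input is a matter of fact
    return "pay" if n == 0 else "refuse"

def W():
    # the world program; provably never calls f with argument 1
    return f(0)

print(W())   # 'pay'    -- the only call W ever makes
print(f(1))  # 'refuse' -- still well-defined, even though W never makes this call
```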
Even if f can be regarded as a rational agent who ‘knows’ the source code of W, the worst that could happen is that f ‘deduces’ a contradiction and goes insane. That’s different from the agent itself being in an inconsistent state.
Analogy: We can define the partial derivatives of a Lagrangian with respect to q and q-dot, even though it doesn’t make sense for q and q-dot to vary independently of each other.
I assume that you would not consider this to be a problem if Omega was replaced with a 99% reliable predictor. Confirm?
...Huh? My version of Omega doesn’t bother predicting the agent, so you gain nothing by crippling its prediction abilities :-)
ETA: maybe it makes sense to let Omega have a “trembling hand”, so it doesn’t always do what it resolved to do. In this case I don’t know if the problem stays or goes away. Properly interpreting “counterfactual evidence” seems to be tricky.
...Huh? My version of Omega doesn’t bother predicting the agent, so you gain nothing by crippling its prediction abilities :-)
I would consider an Omega that didn’t bother predicting, even in that case, to be ‘broken’. Omega is supposed to implement the natural-language problem statement in good faith. Perhaps I would consider yours one of Omega’s many siblings, one that requires more formal shackles.
This takes the decision out of Omega’s hands and collapses Omega’s agent-provability by letting it know its decision. We already know that in ADT-style decision-making, all theories of consequences of actions other than the actual one are inconsistent, that they are merely agent-consistent, and adding an axiom specifying which action is actual won’t disturb consistency of the theory of consequences of the actual action. But there’s no guarantee that Omega’s decision procedure would behave nicely when faced with knowledge of inconsistency. For example, instead of concluding that you do agree to pay, it could just as well conclude that you don’t, which would be a moral argument to not award you the $1000, and then Omega just goes crazy. One isn’t meant to know one’s own decisions; it’s bad for sanity.
Yes, you got it right. I love your use of the word “collapse” :-)
My argument seems to indicate that there’s no easy way for UDT agents to solve such situations, because the problem statements really are incomplete. Do you see any way to fix that, e.g. in Parfit’s Hitchhiker? Because this is quite disconcerting. Eliezer thought he’d solved that one.
I don’t understand your argument. You’ve just broken Omega for some reason (by letting it know something true which it’s not meant to know at that point), and as a result it fails in its role in the thought experiment. Don’t break Omega.
My implementation of Omega isn’t broken and doesn’t fail. Could you show precisely where it fails? As far as I can see, all the conditions in Bongo’s post still hold for it, therefore all possible logical implications of Bongo’s post should hold for it too, and so should all possible “solutions”.
It doesn’t implement the counterfactual where, depending on what response the agent assumes it gives on observing a request to pay, it can agent-consistently conclude that Omega will either award or not award the $1000. Even if we don’t require that Omega is a decision-theoretic agent with known architecture, the decision problem must make the intended sense.
In more detail. The agent’s decision is a strategy that specifies, for each possible observation (we have two: Omega rewards it, or Omega asks for money), a response. If Omega gives a reward, there is no response, and if it asks for money, there are two responses. So overall, we have two strategies to consider. The agent should be able to contemplate the consequences of adopting each of these strategies without running into inconsistencies (the observation is an external parameter, so even if in a given environment there is no agent-with-that-observation, the decision algorithm can still specify a response to that observation; it would just completely fail to control the outcome). Now, take your Omega implementation, and consider the strategy of not paying from the agent’s perspective. What would the agent conclude about expected utility? By the problem specification, it should (in the external sense, that is, not necessarily according to its own decision theory, if that decision theory happens to fail this particular thought experiment) conclude that Omega doesn’t give it an award. But your Omega does knowably (agent-provably) give it an award, hence it doesn’t play the intended role, doesn’t implement the thought experiment.
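A rough sketch of this strategy bookkeeping, with toy payoffs and a hand-written “intended Omega” (the names and numbers are mine, not a fixed formalism):

```python
# A strategy is a map from observations to responses. Against the intended
# Omega, the agent can evaluate each whole strategy without contradiction:
# the non-paying strategy gets no award, the paying one gets $1000.

def intended_omega(strategy):
    # awards $1000 iff the strategy would respond to "ask" by paying
    return strategy["ask"] == "pay"

def outcome(strategy):
    if intended_omega(strategy):
        return 1000  # awarded outright; never asked, so nothing to pay
    # not awarded: Omega asks, and the strategy's response to "ask" applies
    # (a paying response would cost $100 here, but such a strategy is awarded above)
    return -100 if strategy["ask"] == "pay" else 0

pay_strategy    = {"award": None, "ask": "pay"}
refuse_strategy = {"award": None, "ask": "refuse"}

print(outcome(pay_strategy))     # 1000
print(outcome(refuse_strategy))  # 0
```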
But your Omega does knowably (agent-provably) give it an award, hence it doesn’t play the intended role, doesn’t implement the thought experiment.
I think it would be fair to say that cousin_it’s (ha! Take that English grammar!) description of Omega’s behaviour does fit the problem specification we have given but certainly doesn’t match the problem we intended. That leaves us to fix the wording without making it look too obfuscated.
Taking another look at the actual problem specification, it doesn’t look all that bad. The translation into logical propositions didn’t really do it justice. We have...
He will award you $1000 if he predicts you would pay him if he asked.
cousin_it allows “if” to resolve to “iff”, but translates “The player would pay if asked” into A → B (with A = “asked”, B = “pays”); since !A, the implication is vacuously true, therefore ‘whatever’. Which is not quite what we mean when we use the phrase in English. We are trying to refer to the predicted outcome in a “possibly counterfactual but possibly real” reality.
Can you think of a way to say what we mean without any ambiguity and without changing the problem itself too much?
I believe you haven’t yet realized the extent of the damage :-)
It’s very unclear to me what it means for Omega to “implement the counterfactual” in situations where it gives the agent information about which way the counterfactual came out. After all, the agent knows its own source code A and Omega’s source code O. What sense does it make to inquire about the agent’s actions in the “possible world” where it’s passed a value of O(A) different from its true value? That “possible world” is logically inconsistent! And unlike the situation where the agent is reasoning about its own actions, in our case the inconsistency is actually exploitable. If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
The alternative is to let the agent tacitly assume that it does not necessarily receive the true value of O(A), i.e. that the causality has been surgically tweaked at some point—so the agent ought to respond to any values of O(A) mechanically by using a “strategy”, while taking care not to think too much about where they came from and what they mean. But: a) this doesn’t seem to accord with the spirit of Bongo’s original problem, which explicitly asked “you’re told this statement about yourself, now what do you do?”; b) this idea is not present in UDT yet, and I guess you will have many unexpected problems making it work.
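To see why I call the inconsistency exploitable, here is a crude sketch (my own toy construction; in the real setup O would depend on A, which this toy deliberately ignores):

```python
def O(agent_name):
    # my modified Omega: awards no matter what, never asks
    return "award"

def A(claimed_o_output):
    # a counterfactual copy of the agent is handed a claimed value of O("A"),
    # but it also knows O's source code and can simply recompute the true value
    actual = O("A")
    if claimed_o_output != actual:
        return "???"   # the premise it was handed is provably false: the 'go crazy' branch
    return "refuse"    # the precommitted defector's actual response

print(A("award"))  # 'refuse'
print(A("ask"))    # '???' -- the counterfactual input contradicts what A can prove
```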
If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
By the way, this bears an interesting similarity to the question of how you would explain the event of your left arm being replaced by a blue tentacle. The answer that you wouldn’t is perfectly reasonable: you don’t need to be able to adequately respond to that observation. You can self-improve in a way that has the side effect of making you crazy once you observe your left arm turning into a blue tentacle, and that wouldn’t matter, since this event has sufficiently low measure and a sufficiently insignificant contribution to overall expected utility to not be worth worrying about.
So in our case, the question should be, is it desirable to not go crazy when presented with this observation and respond in some other way instead, perhaps to win the Omega Award? If so, how should you think about the situation?
If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
That’s not the correct way of interpreting observations; you shouldn’t let observations drive you crazy. Here, we have A’s action-definition given in factorized form: action=A(O(“A”)). Normally, you’d treat such decompositions as explicit dependence bias, and try substituting everything in before starting to reason about what would happen if. But if O(“A”) is an observation, then you’re not deciding the action, that is, A(O(“A”)). Instead, you’re deciding just A(-), an Observations → Actions map. So being told that you’ve observed “no award” doesn’t mean that you now know that O(“A”)=“no award”. It just means that you’re the subagent responsible for deciding a response to the parameter “no award” in the strategy for A(-). You might also want to acausally coordinate with the subagent that is deciding the other part of that same strategy, the response to “award”.
And this all holds even if the agent knows what O(“A”) means, it would just be a bad idea to not include O(“A”) as part of the agent in that case, and so optimize the overall A(O(“A”)) instead of the smaller A(-).
At this point it seems we’re arguing over how to better formalize the original problem. The post asked what you should reply to Omega. Your reformulation asks what counterfactual-you should reply to counterfactual-Omega that doesn’t even have to say the same thing as the original Omega, and whose judgment of you came from the counterfactual void rather than from looking at you. I’m not sure this constitutes a fair translation. Some of the commenters here (e.g. prase) seem to intuitively lean toward my interpretation—I agree it’s not UDT-like, but think it might turn out useful.
At this point it seems we’re arguing over how to better formalize the original problem.
It’s more about making more explicit the question of what observations are, and what the boundaries of the agent are (which parts of the past lightcone are part of you? Just the cells in the brain? Why is that?), in deterministic decision problems. These were never explicitly considered before in the context of UDT. The problem statement declares that something is an “observation”, but we lack a technical counterpart of that notion. Your questions resulted from treating something that’s said to be an “observation” as epistemically relevant, writing knowledge about the state of the territory, which shouldn’t be logically transparent, right into the agent’s mind.
(Observations, possible worlds, etc. will very likely be the topic of my next post on ADT, once I resolve the mystery of observational knowledge to my satisfaction.)
Thanks, this looks like a fair summary (though a couple levels too abstract for my liking, as usual).
A note on epistemic relevance. Long ago, when we were just starting to discuss Newcomblike problems, the preamble usually went something like this: “Omega appears and somehow convinces you that it’s trustworthy”. So I’m supposed to listen to Omega’s words and somehow split them into an “epistemically relevant” part and an “observation” part, which should never mix? This sounds very shady. I hope we can disentangle this someday.
Your reformulation asks what counterfactual-you should reply to counterfactual-Omega that doesn’t even have to say the same thing as the original Omega.
Yes. If the agent doesn’t know what Omega actually says, this can be an important consideration (decisions are made by considering agent-provable properties of counterfactuals, all of which except the actual one are inconsistent, but not agent-inconsistent). If Omega’s decision is known (and not just observed), it just means that counterfactual-you’s response to counterfactual-Omega doesn’t control utility and could well be anything. But at this point I’m not sure in what sense anything can actually be logically known, and not in some sense just observed.
Now that is a real concern!
In summary, the system of logical statements that specifies the problem does not completely determine what will happen, because we can consistently extend it with another axiom that makes Omega cooperate even if you defect. IOW, you can’t go wrong by cooperating, but some correct Omegas will reward defectors as well.
I am another person who pays outright. While I acknowledge the “could even reward defectors” logical difficulty, I am also comfortable asserting that not paying is an outright wrong choice. A payoff of “$1,000” is to be preferred to a payoff of “either $1,000 or $0”.
It’s not clear to me if this problem can be “fixed”.
It would seem to merely require more precise wording in the problem statement. At the crudest level you simply add the clause “if it is logically coherent to so refrain, Omega will not give you $1,000”.
The solution has nothing to do with hacking the counterfactual; the reflectively consistent (and winning) move is to pay the $100, as precommitting to do so nets you a guaranteed $1000 (unless Omega can be wrong). It is true that “The player will pay iff asked” implies “The player will not be asked” and therefore “The player will not pay”, but this does not cause Omega to predict the player to not pay when asked.
You’ve added an extra axiom to Omega, noted that this resulted in a consistent result, and concluded that therefore the original axioms are incomplete (because the result is changed).
But that does not follow. This would only be true if the axiom was added secretly, and the result was still consistent. But because I know about this extra axiom, you’ve changed the problem; I behave differently, so the whole setup is different.
Or consider a variant: I have the numbers sqrt[2], e and pi. I am required to output the first number that I can prove is irrational, using the shortest proof I can find. This will be sqrt[2] (or maybe e), but not pi. Now add the axiom “pi is irrational”. Now I will output pi first, as the proof is one line long. This does not mean that the original axiomatic system was incorrect or under-specified...
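A toy numerical rendering of that variant (the proof lengths are invented purely for illustration):

```python
def first_provably_irrational(proof_length):
    # output the number whose irrationality proof is shortest
    return min(proof_length, key=proof_length.get)

original   = {"sqrt(2)": 20, "e": 40, "pi": 10**6}  # pi's proof is effectively out of reach
with_axiom = dict(original, **{"pi": 1})            # new axiom: "pi is irrational" (one line)

print(first_provably_irrational(original))    # 'sqrt(2)'
print(first_provably_irrational(with_axiom))  # 'pi'
```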
I’m not completely sure what your comment means. The result hasn’t “changed”, it has appeared. Without the extra axiom there’s not enough axioms to nail down a single result (and even with it I had to resort to lexicographic chance at one point). That’s what incompleteness means here.
If you think that’s wrong, try to prove the “correct” result, e.g. that any agent who precommits to not paying won’t get the $1000, using only the original axioms and nothing else. Once you write out the proof, we will know for certain that one of us is wrong or the original axioms are inconsistent, which would be even better :-)
I was also previously suspicious of the word “change”, but lately made my peace with it. Saying that there’s change is just a way of comparing objects of the same category. So if you look at an apple and a grape, what changes from apple to grape is, for example, color. A change is simultaneously what’s different, and a method of producing one from the other. Applying change to time, or to the process of decision-making, is a mere special case. Particular ways of parsing change in descriptions of decision problems can be incorrect because of explicit dependence bias: those changes, as methods of determining one from the other, are not ambient dependencies. But other usages of “change” still apply. For example, your decision to take one box in Newcomb’s instead of two changes the content of the box.