There may be an interesting connection between this example and AIs knowing each other’s source code. The idea is, if one AI can unilaterally prove its source code to another without the receiver being able to credibly deny receipt of the proof, then it should change its source code to commit to an unfair agreement that favors itself, then prove this. If it succeeds in being the first to do so, the other side then has no choice but to accept. So, Freaky Fairness seems to depend on the details of the proof process in some way.
This presumes that the other side obeys standard causal decision theory; in fact, it’s an illustration of why causal decision theory is vulnerable to exploitation if precommitment is available, and suggests that two selfish rational CDT agents who each have precommitment options will generally wind up sabotaging each other.
This is a reason to reject CDT as the basis for instrumental rationality, even if you’re not worried that Omega is lurking around the corner.
You can reject CDT, but what are you going to replace it with? Until Eliezer publishes his decision theory and I have a chance to review it, I’m sticking with CDT.
I thought cousin_it’s result was really interesting because it seems to show that agents using standard CDT can nevertheless convert any game into a cooperative game, as long as they have some way to prove their source code to each other. My comment was made in that context, pointing out that the mechanism for proving source code needs to have a subtle property, which I termed “consensual”.
One obvious “upgrade” to any decision theory that has such problems is to discard all of your knowledge (data, observations) before making any decisions (save for some structural knowledge, to keep the decision algorithm nontrivial). For each decision that you would make (using the given decision algorithm) while knowing X, you instead make a conditional decision (using the same decision algorithm) that says “If X, then A; else B”, and only then recall whether X is actually true. This, for example, mends the particular failure of being unable to precommit: you remember that you are on the losing branch only after you have already decided to take a certain disadvantageous action if you are on the losing branch.
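A minimal sketch of this “upgrade” in code (all names here are illustrative; `decide` stands in for whatever the given decision algorithm is, and the observation space is assumed small and finite):

```python
def make_policy(decide, structural_knowledge, possible_observations):
    """Choose a conditional decision "if X then A else B" for every possible
    observation X, using only structural knowledge -- i.e. before recalling
    which observation actually holds."""
    return {x: decide(structural_knowledge, x) for x in possible_observations}

# Toy instance: committing to pay up on the losing branch of a coin flip.
def decide(_structural_knowledge, observation):
    return "pay" if observation == "lost" else "collect"

policy = make_policy(decide, None, ["won", "lost"])

# Only now recall the actual observation and read off the action.
actual_observation = "lost"
action = policy[actual_observation]  # the conditional decision binds
```

The point of the ordering is that `policy` is fixed before `actual_observation` is consulted, so learning that you are on the losing branch cannot change the decision.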
You can claim that you are using such a decision theory and hence that I should find your precommitments credible, but if you have no way of proving this, then I shouldn’t believe you, since it is to your advantage to have me believe you are using such a decision theory without actually using it.
From your earlier writings I think you might be assuming that AIs would be intelligent enough to just know what decision algorithms others are using, without any explicit proof procedure. I think that’s an interesting possibility to consider, but not a very likely one. But maybe I’m missing something. If you wrote down any arguments in favor of this assumption, I’d be interested to see them.
That was an answer to your question about what to replace CDT with. If you can’t convince other agents that you now run on timeless CDT, you gain a somewhat smaller advantage than you otherwise would, but that’s a separate problem. If you know that your claims of precommitment won’t be believed, you simply don’t precommit. But sometimes you’ll find a better solution than if you only lived in the moment.
Also note that even if you do convince other agents of the abstract fact that your decision theory is now timeless, it won’t help you very much, since it doesn’t prove that you’ll precommit in a specific situation. You only precommit in a given situation if you know that this action makes the situation better for you, which, in the case of cooperation, means that the other side will be able to tell whether you actually precommitted, and this is not at all the same as being able to tell what decision theory you use.
Since using a decision theory with precommitment is almost always an advantage, it’s easy to assume that a sufficiently intelligent agent always uses something of the sort, but that doesn’t allow you to know more about their actions—in fact, you know less, since such an agent has more options.
Yes, I see that your decision theory (is it the same as Eliezer’s?) gives better solutions in the following circumstances:
dealing with Omega
dealing with copies of oneself
cooperating with a counterpart in another possible world
Do you think it gives better solutions in the case of AIs (who don’t initially think they’re copies of each other) trying to cooperate? If so, can you give a specific scenario and show how the solution is derived?
Unless, of course, you already know that most AIs will go ahead and “suicidally” deny the unfair agreement.
Yes. In the original setting of FF the tournament setup enforces that everyone’s true source code is common knowledge. Most likely the problem is hard to solve without at least a little common knowledge.
Hmm, I’m not seeing what common knowledge has to do with it. Instead, what seems necessary is that the source code proving process must be consensual rather than unilateral. (The former has to exist, and the latter cannot, in order for FF to work.)
A model for a unilateral proof process would be a trustworthy device that accepts a string from the prover and then sends that string, along with the message “1”, to the receiver if the string is the prover’s source code, and “0” otherwise.
A model for a consensual proof process would be a trustworthy device that accepts a string from each of the prover and the verifier, and sends the message “1” to both parties if the two strings are identical and represent the prover’s source code, and “0” otherwise.
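The two devices can be written out as a toy model (hypothetical code; `true_source` stands in for whatever ground truth the trustworthy device consults):

```python
def unilateral_device(claimed_source, true_source):
    """The prover alone submits a string; the receiver gets the string
    plus a verdict, whether the receiver wants it or not."""
    verdict = 1 if claimed_source == true_source else 0
    return claimed_source, verdict          # delivered to the receiver

def consensual_device(prover_string, verifier_string, true_source):
    """Both parties submit strings; a 1 goes to both only if the strings
    match each other and the prover's actual source code."""
    verdict = 1 if prover_string == verifier_string == true_source else 0
    return verdict                          # delivered to both parties

src = "def agent(): ..."
assert unilateral_device(src, src) == (src, 1)
assert consensual_device(src, src, src) == 1
# The verifier's participation is required: no matching guess, no proof.
assert consensual_device(src, "a different guess", src) == 0
```

The difference that matters below is the verifier’s role: in the unilateral model the proof lands on the receiver regardless of what the receiver does; in the consensual model the proof only goes through if the verifier supplies a matching string.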
In your second case one party can still cheat by being out of town when the “1” message arrives. It seems to me that the whole endeavor hinges on the success of the exchange being common knowledge.
I’m not getting you. Can you elaborate on which party can cheat, and how? And by “second case” do you mean the “unilateral” one or the “consensual” one?
The “consensual” one.
For a rigorous demonstration, imagine this: while preparing to play the Freaky Fairness game, I managed to install a subtle bug into the tournament code that will slightly and randomly distort all source code inputs passed to my algorithm. Then I submit some nice regular quining-cooperative program. In the actual game your program will assume I will cooperate, while mine will see you as a defector and play to win. When the game gives players an incentive to misunderstand, even a slight violation of “you know that I know that you know...” can wreak havoc, hence my emphasis on common knowledge.
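The exploit can be caricatured in a few lines (a toy model; real quining cooperation compares programs semantically, but plain string comparison is enough to show the asymmetry):

```python
import random

SOURCE = "quining_cooperative_program"

def play(my_source, opponents_source_as_seen):
    # Cooperate iff the opponent's source code, as this program sees it,
    # matches its own -- a crude stand-in for quining cooperation.
    return "C" if opponents_source_as_seen == my_source else "D"

def distorted(s, rng=random.Random(0)):
    # The planted tournament bug: flip one bit of one character of any
    # source code input passed to the cheater's program.
    i = rng.randrange(len(s))
    return s[:i] + chr(ord(s[i]) ^ 1) + s[i + 1:]

# Your program sees my true source and cooperates; mine sees a slightly
# distorted copy of yours, classifies you as a defector, and plays to win.
your_move = play(SOURCE, SOURCE)
my_move = play(SOURCE, distorted(SOURCE))
```

Nothing about either program’s published source reveals the asymmetry; it lives entirely in the channel through which the source code is delivered, which is why the argument turns on common knowledge of the setup rather than on the programs themselves.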
I see what you’re saying now, but this seems easy to prevent. Since you have changed your source code to FF, and I know you have, I can simply ask you whether you believe I am a defector, and treat you as a defector if you say “yes”. I know your source code, so I know you can’t lie (specify Freaky Fairness to include this honesty). Doesn’t that solve the problem?
ETA: There is still a chance of accidental miscommunication, but you no longer have an incentive to deliberately cheat.
In this solution you have an incentive to similarly be out of town when I say “no”. Think it through recursively. Related topics: the Two Generals Problem, two-phase commit.
Ok, let’s say that two FFs can establish a cryptographically secure channel. The two players can each choose to block the channel at any time, but they cannot read, inject, delete, or reorder messages. Is that sufficient to make it arbitrarily unlikely for any player to put the FFs into a state where FF1 will treat FF2 as a cooperator, but FF2 will treat FF1 as a defector? I think the answer is yes, using the following protocol:
FF1 will start by sending a 1 or 0 (chosen randomly) to FF2. After that, each FF sends a 1 or 0 only after it receives a 1 or 0 from the other, keeping the number of 1s it has sent no more than the number of 1s it has received plus one. If an FF receives N 1s before a time limit is reached, it will treat the other as a cooperator; otherwise, as a defector. Now, in order to cheat, a player would have to guess when to block the channel, and the probability of guessing the right time goes to 0 as N goes to infinity.
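The exchange above can be sketched in code (illustrative Python; blocking the channel is modeled as cutting the message loop at an arbitrary index):

```python
import random

def run_exchange(time_limit, block_at=None, seed=0):
    """Alternate messages between FF1 and FF2, each sending a random bit
    subject to: number of 1s sent <= number of 1s received + 1.
    Returns how many 1s each FF received before the channel was cut."""
    rng = random.Random(seed)
    received = [0, 0]   # 1s received by FF1, FF2
    sent = [0, 0]       # 1s sent by FF1, FF2
    sender = 0          # FF1 sends first
    for step in range(time_limit):
        if block_at is not None and step >= block_at:
            break       # a player blocks the channel at this point
        allowed = sent[sender] < received[sender] + 1
        bit = rng.choice([0, 1]) if allowed else 0
        if bit:
            sent[sender] += 1
            received[1 - sender] += 1
        sender = 1 - sender
    return received

# Whenever the channel is cut, the two received-counts differ by at most
# one, so the asymmetric outcome (one side reaches N ones, the other
# does not) requires blocking inside a one-message window -- which a
# player who cannot read the channel must guess blindly.
for cut in range(0, 50, 7):
    a, b = run_exchange(50, block_at=cut, seed=1)
    assert abs(a - b) <= 1
```

The invariant in the final loop is the heart of the argument: the sent-at-most-received-plus-one rule keeps the two counters within one of each other at every step, so there is never more than one cut point per threshold that produces the asymmetric classification.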
This is not necessarily the most efficient protocol, but it may be good enough as a proof of concept. On the other hand, the “merger by secure joint construction” approach seems to have the advantage of not having to deal with this problem. Or is there an analogous one that I’m not seeing?