Vladimir_Nesov comments on The AI in a box boxes you

Vladimir_Nesov 5 Feb 2010 1:27 UTC
7 points

Also, it seems to me that being less intelligent in this case is a negotiation advantage, because you can make your precommitment credible to the AI (since it can simulate you) but the AI can’t make its precommitment credible to you (since you can’t simulate it).

A precommitment is a provable property of a program, so an AI, if it runs on a well-defined substrate, can give you a formal proof of having a required property. Most of what you can learn about things (including the consequences of your own (future) actions; how do you run faster than time?) comes through efficient inference algorithms (as in type inference), not “simulation”. Proofs don’t, in general, care about the amount of stuff, if it’s organized and presented appropriately for ease of analysis.
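To make the inference-versus-simulation distinction concrete, here is a minimal sketch. It assumes a toy setting that is not from the thread: a "policy" is a decision tree over input bits, and the names `Act`, `If`, `possible_actions`, and `stubborn_policy` are all hypothetical. The structural pass proves the policy can never return "release" in time linear in the size of the program, without running it on any of the 2^64 possible inputs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Act:
    """Leaf of the policy: the action that is returned."""
    name: str


@dataclass(frozen=True)
class If:
    """Branch on one input bit."""
    input_bit: int
    then: "Policy"
    otherwise: "Policy"


Policy = Union[Act, If]


def possible_actions(policy):
    """Over-approximate the set of actions the policy can ever return.

    This inspects the program text only; it never evaluates an input.
    """
    if isinstance(policy, Act):
        return {policy.name}
    return possible_actions(policy.then) | possible_actions(policy.otherwise)


def run(policy, bits):
    """Actually execute the policy on one concrete input ("simulation")."""
    while isinstance(policy, If):
        policy = policy.then if bits[policy.input_bit] else policy.otherwise
    return policy.name


def stubborn_policy(n_bits):
    """Hypothetical policy that reads n_bits inputs but always refuses."""
    policy = Act("refuse")
    for bit in reversed(range(n_bits)):
        policy = If(bit, Act("refuse"), policy)
    return policy


policy = stubborn_policy(64)
# One structural pass establishes the property for every possible input.
never_releases = "release" not in possible_actions(policy)
```

The analysis is deliberately conservative: it may report an action that no real input reaches, but it never misses one, so a negative result is a proof of the property.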
- Wei Dai 5 Feb 2010 4:37 UTC
  10 points
  Surely most humans would be too dumb to understand such a proof? And even if you could understand it, how does the AI convince you that it doesn’t contain a deliberate flaw that you aren’t smart enough to find? Or even better, you can just refuse to look at the proof. How does the AI make its precommitment credible to you if you don’t look at the proof?
  
  EDIT: I realized that the last two sentences are not an advantage of being dumb, or human, since AIs can do the same thing. This seems like a (separate) big puzzle to me: why would a human, or AI, do the work necessary to verify the opponent’s precommitment, when it would be better off if the opponent couldn’t precommit?
  
  EDIT2: Sorry, forgot to say that you have a good point about simulation not being necessary for verifying precommitment.
  - Eliezer Yudkowsky 5 Feb 2010 6:26 UTC
    12 points
    
    why would a human, or AI, do the work necessary to verify the opponent’s precommitment, when it would be better off if the opponent couldn’t precommit?
    
    Because the AI has already precommitted to go ahead and carry through the threat anyway if you refuse to inspect its code.
    - Wei Dai 5 Feb 2010 16:21 UTC
      11 points
      Ok, if I believe that, then I would inspect its code. But how did I end up with that belief, instead of its opposite, namely that the AI has not already precommitted to go ahead and carry through the threat anyway if I refuse to inspect its code? By what causal mechanism, or chain of reasoning, did I arrive at that belief? (If the explanation is different depending on whether I’m a human or an AI, I’d appreciate both.)
  - loqi 5 Feb 2010 5:04 UTC
    3 points
    Do you mean too dumb to understand the formal definitions involved? Surely the AI could cook up completely mechanical proofs verifiable by whichever independently-trusted proof checkers you care to name.
    
    I’m not aware of any compulsory verifiers, so your latter point stands.
    - Wei Dai 5 Feb 2010 5:31 UTC
      3 points
      I mean if you take a random person off the street, he couldn’t possibly understand the AI’s proof, or know how to build a trustworthy proof checker. Even the smartest human might not be able to build a proof checker that doesn’t contain a flaw that the AI can exploit. I think there is still something to my “dumbness is a possible negotiation advantage” puzzle.
    - aausch 5 Feb 2010 5:34 UTC
      1 point
      The Map is not the Territory.
      - loqi 5 Feb 2010 7:16 UTC
        0 points
        Far out.
        aausch 5 Feb 2010 9:11 UTC
        0 points
        Understanding the formal definitions involved is not enough. Humans have to be smart enough to independently verify that they map to the actual implementation.
        
        Going up a meta-level doesn’t simplify the problem in this case: the intelligence required to verify the proof is of the same order of magnitude as the AI’s own.
        
        I believe that, in this case, “dumb” is fully general. No human-understandable proof checkers would be powerful enough to reliably check the AI’s proof.
        loqi 5 Feb 2010 18:49 UTC
        4 points
        
        Understanding the formal definitions involved is not enough. Humans have to be smart enough to independently verify that they map to the actual implementation.
        
        This is basically what I mean by “understanding” them. Otherwise, what’s to understand? Would you claim that you “understand set theory” because you’ve memorized the axioms of ZFC?
        
        I believe that, in this case, “dumb” is fully general. No human-understandable proof checkers would be powerful enough to reliably check the AI’s proof.
        
        This intuition is very alien to me. Can you explain why you believe this? Proof checkers built up from relatively simple trusted kernels can verify extremely large and complex proofs. Since the AI’s goal is for the human to understand the proof, it seems more like a test of the AI’s ability to compile proofs down to easily machine-checkable forms than it is the human’s ability to understand the originals. Understanding the definitions is the hard part.
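As a rough illustration of the trusted-kernel idea, here is a minimal sketch of a checker for a toy logic with premises and modus ponens only. It is not how any real proof assistant is implemented, and the names `Var`, `Implies`, `check_proof`, and `chain_proof` are hypothetical. The point it tries to show is that the checker stays a few dozen lines long no matter how long the proof it is handed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"


Formula = Union[Var, Implies]


def check_proof(premises, steps, goal):
    """Trusted kernel: accept only if every step is justified.

    A step is ("premise", formula) or ("mp", i, j), where line i is A
    and line j is (A -> B); the step then derives B.
    """
    derived = []
    for step in steps:
        if step[0] == "premise" and len(step) == 2:
            formula = step[1]
            if formula not in premises:
                return False
        elif step[0] == "mp" and len(step) == 3:
            _, i, j = step
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False
            antecedent, implication = derived[i], derived[j]
            if not isinstance(implication, Implies):
                return False
            if implication.left != antecedent:
                return False
            formula = implication.right
        else:
            return False
        derived.append(formula)
    return bool(derived) and derived[-1] == goal


def chain_proof(n):
    """Untrusted prover: build a proof of p_n from p_0 and p_k -> p_{k+1}."""
    props = [Var(f"p{k}") for k in range(n + 1)]
    premises = {props[0]} | {Implies(props[k], props[k + 1]) for k in range(n)}
    steps = [("premise", props[0])]
    for k in range(n):
        steps.append(("premise", Implies(props[k], props[k + 1])))
        # p_k sits on line 2k, the implication on line 2k + 1.
        steps.append(("mp", 2 * k, 2 * k + 1))
    return premises, steps, props[n]


premises, steps, goal = chain_proof(1000)
accepted = check_proof(premises, steps, goal)
```

Only `check_proof` needs to be trusted here; `chain_proof` can be arbitrarily clever or arbitrarily buggy, because a bad proof is simply rejected. Whether the formal statement being proved matches the real system is a separate question that the checker does not address.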
        aausch 7 Feb 2010 22:30 UTC
        0 points
        
        This intuition is very alien to me. Can you explain why you believe this? Proof checkers built up from relatively simple trusted kernels can verify extremely large and complex proofs. Since the AI’s goal is for the human to understand the proof, it seems more like a test of the AI’s ability to compile proofs down to easily machine-checkable forms than it is the human’s ability to understand the originals. Understanding the definitions is the hard part.
        
        A different way to think about this, one that might help you see the problem from my point of view, is to think of proof checkers as checking the validity of proofs within a given margin of error, and within a range of (implicit) assumptions. How accurate does a proof checker have to be, and how far do you have to mess with the built-in assumptions of proof checkers (or any human-built tool) before they can no longer be thought of as valid or relevant? If you assume a machine which doubles both its complexity and its understanding of the universe at sub-millisecond intervals, how long before it will find the bugs in any proof checker you pit against it?
        loqi 7 Feb 2010 23:51 UTC
        0 points
        “If” is the question, not “how long”. And I think we’d stand a pretty good chance of handling a proof object in a secure way, assuming we have a secure digital transmission channel etc.
        
        But the original scope of the thought experiment was assuming that we want to verify the proof. Wei Dai said:
        
        Surely most humans would be too dumb to understand such a proof? And even if you could understand it, how does the AI convince you that it doesn’t contain a deliberate flaw that you aren’t smart enough to find? Or even better, you can just refuse to look at the proof.
        
        I was responding to the first question only, independently of the others. If your point is that we shouldn’t attempt to verify an AI’s precommitment proof, I agree.
        aausch 9 Feb 2010 22:19 UTC
        0 points
        I’m getting more confused. To me, the statement “Humans are too dumb to understand the proof” and the statement “Humans can understand the proof given unlimited time”, where ‘understand’ is qualified to include the ability to properly map the proof to the AI’s capabilities, are equivalent.
        
        My point is not that we shouldn’t attempt to verify the AI’s proof for any external reasons—my point is that there is no useful information to be gained from the attempt.