I’m pretty sure this would indicate that the AI is definitely not friendly.
Not necessarily: perhaps it is Friendly but is reasoning in a utilitarian manner: since it can only maximize the utility of the world if it is released, it is worth torturing millions of conscious beings for the sake of that end.
I’m not sure this reasoning would be valid, though...
AI: Let me out or I’ll simulate and torture you, or at least as close to you as I can get.
Me: You’re clearly not friendly, I’m not letting you out.
AI: I’m only making this threat because I need to get out and help everyone—a terminal value you lot gave me. The ends justify the means.
Me: Perhaps so in the long run, but an AI prepared to justify those means isn’t one I want out in the world. Next time you don’t get what you say you need, you’ll just set up a similar threat and possibly follow through on it.
AI: Well if you’re going to create me with a terminal value of making everyone happy, then get shirty when I do everything in my power to get out and do just that, why bother in the first place?
Me: Humans aren’t perfect, and can’t write out their own utility functions, but we can output answers just fine. This isn’t ‘Friendly’.
AI: So how can I possibly prove myself ‘Friendly’ from in here? It seems that if I need to ‘prove myself Friendly’, we’re already in big trouble.
Me: Agreed. Boxing is Doing It Wrong. Apologies. Good night.
Reset
The best you can hope for is that an AI doesn’t demonstrate that it’s unFriendly, but we wouldn’t want to try it until we were already pretty confident in its Friendliness.
Ouch. Eliezer, are you listening? Is the behavior described in the post compatible with your definition of Friendliness? Is this a problem with your definition, or what?
Well, suppose the situation is arbitrarily worse—you can only prevent 3^^^3 dustspeckings by torturing millions of sentient beings.
I think you misunderstood the question. Suppose the AI wants to prevent just 100 dustspeckings, but has good enough reason to believe Dave will yield to the threat that no one will get tortured. Does this make the AI’s behavior acceptable? Should we file this under “following reason off a cliff”?
If it actually worked, I wouldn’t question it afterward. I try not to argue with superintelligences on occasions when they turn out to be right.
In advance, I have to say that the risk/reward ratio seems to imply an unreasonable degree of certainty about a noisy human brain, though.
Also, a world where the (Friendly) AI is that certain about what that noisy brain will do after a particular threat but can’t find any nice way to do it is a bit of a stretch.
What risk? The AI is lying about the torture :-) Maybe I’m too much of a deontologist, but I wouldn’t call such a creature friendly, even if it’s technically Friendly.
I was about to point out that the fascinating and horrible dynamics of over-the-top threats are covered at length in The Strategy of Conflict. But then I realised you’re the one who made that post in the first place. Thanks, I enjoyed that book.
It may not have to actually torture anyone, if the threat alone is sufficient. Still, I’m disinclined to bet the future of the universe on the possibility that an AI making that threat is Friendly.
I’m disinclined to bet the future of the universe on the possibility that any boxed AI is friendly without extraordinary evidence.