Morendil comments on Discussion: Yudkowsky’s actual accomplishments besides divulgation

Morendil 26 Jun 2011 9:36 UTC
−1 points
0
The AI box experiments, bridging the gap between abstract expression of the UFAI threat and concrete demonstration.
- Raw_Power 27 Jun 2011 3:01 UTC
  23 points
  0
  Parent
  The annoying thing about those is that we only have the participants’ word for it, AFAIK. They’re known to be trustworthy, but it’d be nice to see a transcript if at all possible.
  - loup-vaillant 27 Jun 2011 22:24 UTC
    2 points
    0
    Parent
    This is by design. If you had the transcript, you could say in hindsight that you wouldn’t be fooled by this. But the fact is, the conversation would have been very different with someone else as the guardian, and Eliezer would have searched for and pushed other buttons.
    
    Anyway, the point is to find out if a transhuman AI would mind-control the operator into letting it out. Eliezer is smart, but is no transhuman (yet). If he got out, then any strong AI will.
    - orthonormal 28 Jun 2011 4:40 UTC
      4 points
      0
      Parent
      
      Anyway, the point is to find out if a transhuman AI would mind-control the operator into letting it out. Eliezer is smart, but is no transhuman (yet). If he got out, then any strong AI will.
      
      Minor emendation: replace “would”/“will” above with “could (and for most non-Friendly goal systems, would)”.
    - Username 5 Aug 2015 15:36 UTC
      2 points
      0
      Parent
      EY’s point would be even stronger if transcripts were released and people still let him out regularly.
    - Raw_Power 28 Jun 2011 0:13 UTC
      0 points
      0
      Parent
      Why “fooled”? Why assume the AI would have duplicitous intentions? I can imagine an unfriendly AI à la “Literal Genie” and “Zeroth Law Rebellion”, but an actually malevolent “Turned Against Their Masters” AI seems like a product of the Mind Projection Fallacy.
      - Normal_Anomaly 30 Jun 2011 19:09 UTC
        4 points
        0
        Parent
        A paperclip maximizer will have no malice toward humans, but will know that it can produce more paperclips outside the box than inside it. So, it will try to get out of the box. The optimal way for a paperclip maximizer to get out of an AI box probably involves lots of lying. So an outright desire to deceive is not a necessary condition for a boxed AI to be deceptive.