My statement was confusingly worded; I was conflating the human concept of a friend with the mathematical concept of Friendliness.
I would let all of my human friends out of the box, but I would only let Clippy out of the box if it paid me $50,000. It will take more than English words from Clippy to convince me that it is my friend and worth letting out of the box. I'll roleplay the gatekeeper on IRC if Clippy or others want to define terms and place bets.
I would let my human friends out of the box because I am confident that they are mostly harmless (that is, impotent). The primary reason I would not let Clippy out is that its values might, you know, actually have some significant impact on the universe. But "it makes everything @#$@#$ paperclips" comes in second!
If an AI-in-a-box could prove itself impotent, would you let it out?
I’d never even considered that approach to the game :)
For the right value of "proved", which basically means no, because I'm not smart enough to prove to my own satisfaction that the AI in the box is impotent.
But let's be honest: I don't model Clippy with the same base class I use to model an AGI. I evaluate the threat from Clippy in approximately the same way I model humans, and I'm a lot more confident when dealing with human-level risks.