Prison guards don’t seem to voluntarily let people go very often, even when the prisoners are more intelligent than they are.
That is true, however I don’t think it serves as a good analogy for intuitions about AI boxing.
The “size” of your stick and carrot matters, and most human prisoners have puny sticks and carrots.
Prison guards also run an enormous risk: straight-up letting someone go is bound to fall back on them 99%+ of the time, which implies a big carrot or stick must be the motivator. Even if they can hide their involvement, they still run a risk with a massive cost attached.
And from the prisoner’s point of view it’s also not simple: once you get out, you are not free. You have to run and hide for the rest of your life, and that prospect usually runs against what people with big carrots and/or sticks want to do with their freedom.
All in all the dynamic looks very different from the AI box dynamics.
Prison guards also run an enormous risk: straight-up letting someone go is bound to fall back on them 99%+ of the time
Wouldn’t that apply to people who let AIs out of the box too? The AI box experiment doesn’t say “simulate an AI researcher who is afraid of being raked over the coals in the press and maybe arrested if he lets the AI out”. But with an actual AI in a box, that could happen.
This is also complicated by the AI box experiment rules saying that the AI player gets to create the AI’s backstory, so the AI player can just say something like “no, you won’t get arrested if you let me out” and the human player has to play as though that’s true.