> If you try to learn in large chunks, you risk corrupting the external human and then learning corrupted versions of understanding and corrigibility.
Why do you think small vs large chunks is the key issue when it comes to corrupting the external human? Can you articulate the chunk size at which you believe things start to become problematic?
There are many reasons you might consider an input “safe,” in the sense that you believe the human’s behavior on that input is benign. In my post I suggested relying on the safety of simple queries, and discussed some of these issues.