Nisan comments on Multi-agent safety

Nisan 16 May 2020 10:20 UTC
LW: 4 AF: 2
AF
Might not intent alignment (doing what a human wants it to do, being helpful) be a better target than obedience (doing what a human told it to do)?
- Richard_Ngo 16 May 2020 11:22 UTC
  LW: 4 AF: 2
  AF
  I should clarify that when I think about obedience, I’m thinking of obedience to the spirit of an instruction, not just its wording. Given this, the two seem fairly similar, and I’m open to arguments about whether it’s better to talk in terms of one or the other. I guess I favour “obedience” because it has fewer connotations of agency: if you’re “doing what a human wants you to do”, then you might run off and do things before receiving any instructions. (It’s also shorter and pithier; “the goal of doing what humans want” is a bit of a mouthful.)
  - Nisan 16 May 2020 16:51 UTC
    LW: 2 AF: 1
    AF
    Ah, ok. When you said “obedience” I imagined too little agency — an agent that wouldn’t stop to ask clarifying questions. But I think we’re on the same page regarding the flavor of the objective.