So I guess more specifically what I’m trying to ask is: how do we distinguish between interpreting the good thing as “human intentions for the agent” versus “human goals”?
In other words, we have at least four options here:
1. AI intends to do what the human wants it to do.
2. AI actually achieves what the human wants it to do.
3. AI intends to pursue the human’s true goals.
4. AI actually achieves the human’s true goals.
So right now intent alignment (as specified by Paul) describes 1, and outcome alignment (as I’m inferring from your description) describes 4. But it seems quite important to have a name for 3 in particular.
I would use ‘outcome alignment’ for 2 (and agree with ‘intent alignment’ for 1). In other words, I see the important distinction between ‘outcome’ and ‘intent’ as lying in the first half of each option (intends vs. actually achieves), not the second (what the human wants vs. the human’s true goals).
I’d be inclined to see 3 and 4 as variations on 1 and 2 where what the human wants is for the AI to figure out some notion of their true goals and pursue/achieve that.