Both [1] “mimic the human” and [2] “maximize according to the human’s ontology” will only work well if the human can actually develop a world model (and, in the case of [1], also plans) as well as the AI can. If the human can do this, then we are probably set on value learning (at least at the level of detail in this post). Moreover, if we can produce world models as good as the AI’s, then we can probably also produce plans as good as the AI’s, so we can probably just focus on [1]. I’m obviously much more optimistic about this than about approach [3].
Note: I think the only reason to be interested in approval-directed agents rather than straightforward imitation learners is that it may be harder to imitate behavior effectively than to solve the same task in a very different way. So it seems wrong to say that imitation is most useful as an input into approval-directed agents.