But it seems more natural to use the human's utterances as evidence, and either learn a model of the relationship between utterances and goals, or work in a setting where we can model the user as modeling the robot and making goal-directed utterances (which then provide evidence about their goals in the same way a trajectory would).
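Concretely, here's a minimal sketch of what I mean (the goals, utterances, and scores below are made-up illustrations, not a proposed implementation): the speaker is modeled as Boltzmann-rational over utterances, so an observed utterance updates a posterior over goals by Bayes' rule, exactly as an observed trajectory would.

```python
import numpy as np

GOALS = ["fetch_coffee", "fetch_tea"]
UTTERANCES = ["coffee please", "something hot"]

# Hypothetical "communicative value" of each utterance for each goal:
# how well utterance u serves a speaker whose goal is g.
VALUE = {
    ("fetch_coffee", "coffee please"): 2.0,
    ("fetch_coffee", "something hot"): 1.0,
    ("fetch_tea", "coffee please"): 0.0,
    ("fetch_tea", "something hot"): 1.0,
}

def utterance_likelihood(utterance, goal, beta=1.0):
    """P(utterance | goal) for a Boltzmann-rational speaker."""
    scores = np.array([VALUE[(goal, u)] for u in UTTERANCES])
    probs = np.exp(beta * scores)
    probs /= probs.sum()
    return probs[UTTERANCES.index(utterance)]

def posterior_over_goals(utterance, beta=1.0):
    """P(goal | utterance) by Bayes' rule -- the same update a trajectory gets."""
    prior = np.ones(len(GOALS)) / len(GOALS)
    likelihood = np.array([utterance_likelihood(utterance, g, beta) for g in GOALS])
    post = likelihood * prior
    return post / post.sum()

print(dict(zip(GOALS, posterior_over_goals("coffee please").round(3))))
# "coffee please" shifts the posterior sharply toward fetch_coffee.
```

(As beta grows the modeled speaker always picks the most goal-revealing utterance; lower beta models noisier, less deliberate speech.)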
With extreme computing power, that would be the right approach (assuming we've managed to ground evidence about goals in the right way). But for lesser agents, I want to see if we can learn anything by doing it this way (and thanks for those links, but I'd already encountered them ^_^).
So what is the actual model here? We have IRL algorithms that in principle solve the proposed problem. The given hint doesn't make the problem much easier information-theoretically, and if we can make collaborative IRL with language work, then it doesn't make the problem information-theoretically easier at all. Your concern doesn't seem to be that IRL won't work once the AI becomes powerful, but that it won't work while the AI is weak.
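To make the information-theoretic point concrete, here is a toy sketch (the likelihood numbers are invented for illustration): the hint enters the same Bayesian posterior as trajectory evidence, as one more likelihood factor. It can contribute extra bits of evidence, but it doesn't change the structure of the inference problem.

```python
import numpy as np

goals = ["A", "B", "C"]
prior = np.ones(3) / 3

# Invented likelihoods of the observed trajectory and the observed
# utterance (the "hint") under each candidate goal.
p_traj_given_goal = np.array([0.5, 0.3, 0.2])
p_utt_given_goal = np.array([0.7, 0.2, 0.1])

post_traj = p_traj_given_goal * prior
post_traj /= post_traj.sum()

# The hint is just one more likelihood factor in the same posterior.
post_both = p_utt_given_goal * p_traj_given_goal * prior
post_both /= post_both.sum()

print("trajectory only:", dict(zip(goals, post_traj.round(3))))
print("with utterance:", dict(zip(goals, post_both.round(3))))
```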
If you want to study a problem like this, it seems like you have to engage with the algorithms you claim won't work in practice, and then show that some modification improves their practical performance.
(Also, it seems weird to write a post about an elaboration of IRL, whose utility is founded on an empirical claim about the behavior of IRL algorithms, without mentioning it at all.
ETA: never mind, I didn't see the previous post where this is discussed.)