Do you know yet how your approach would differ from applying inverse RL (e.g. MaxCausalEnt IRL)?
If you don’t want to assume full-blown demonstrations (where you actually reach the goal), you can still combine a reward function learned from IRL with a specification of the goal. That’s effectively what we did in Preferences Implicit in the State of the World.
(The combination there isn’t very principled; a more principled version would use a CIRL-style setup, which is discussed in Section 8.6 of my thesis.)
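To make that combination concrete, here is a minimal toy sketch (my own illustration under assumed names and an ad hoc weighting, not the actual setup from the paper): take a reward over states learned by IRL and add a bonus for reaching the specified goal state.

```python
import numpy as np

def combine_reward(irl_reward, goal_state, goal_bonus=10.0):
    """Toy combination: an IRL-learned reward plus a bonus for reaching
    the specified goal state. (Illustrative only; the weighting is ad hoc.)"""
    def reward(state):
        r = irl_reward(state)
        if state == goal_state:
            r += goal_bonus
        return r
    return reward

# Example usage with a made-up tabular reward over 5 states
# (imagine it came out of something like MaxCausalEnt IRL).
irl_r = np.array([0.0, 0.1, -0.2, 0.05, 0.0])
combined = combine_reward(lambda s: irl_r[s], goal_state=4)
print([combined(s) for s in range(5)])  # goal state 4 gets the extra bonus
```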
Those are very relevant to this project, thanks. I want to see how far we can push these approaches; maybe some people you know would like to take part?
Hmm, you might want to reach out to CHAI folks, though I don’t have a specific person in mind at the moment. (I myself am working on different things now.)
Cool, thanks; already in contact with them.