paulfchristiano comments on (C)IRL is not solely a learning process

paulfchristiano 21 Sep 2016 16:50 UTC
0 points
What kind of object is $Q$? (I assume it's not a string.) Are you directly specifying a distribution over preferences conditioned on observations? Or are you specifying a distribution over observations conditioned on preferences and then using inference to recover the preferences?

I assume the second case. So given that $Q$ is a predictive model, why wouldn’t you also use $Q$ as your model for planning? What is the advantage of using two separate models? Has anyone proposed using separate models in this way?
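
For concreteness, here is a minimal sketch of what I mean by the second case: a likelihood over observations given preferences, inverted with Bayes' rule. Everything in it is an assumption for illustration. The hypothesis names, the observation names, the uniform prior, and the probabilities are all made up, and treating $Q$ as a finite likelihood table is my guess at the intended setup, not something taken from the post.

```python
def posterior(prior, likelihood, observation):
    """Bayes' rule over a finite set of preference hypotheses.

    prior:       dict mapping hypothesis -> prior probability (hypothetical)
    likelihood:  dict mapping hypothesis -> {observation: probability};
                 this table plays the role I am guessing Q plays
    observation: the observed behaviour
    """
    unnormalized = {
        theta: prior[theta] * likelihood[theta].get(observation, 0.0)
        for theta in prior
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {theta: weight / total for theta, weight in unnormalized.items()}


# Toy numbers, invented for illustration only.
prior = {"likes_tea": 0.5, "likes_coffee": 0.5}
likelihood = {
    "likes_tea": {"orders_tea": 0.9, "orders_coffee": 0.1},
    "likes_coffee": {"orders_tea": 0.2, "orders_coffee": 0.8},
}

post = posterior(prior, likelihood, "orders_tea")
print(post)
```

In this reading the same table that yields the posterior over preferences is also a predictive model of observations, which is why I would expect it to be reused for planning.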

To the extent that your model $Q$ is bad, it seems like you are just doomed to perform badly, and you then need to either abandon the model-based approach or come up with a better model. Adding a second model $P$ doesn’t sound promising at face value.

It may be interesting or useful to have two models in this way, but I think it’s an unusual architecture that requires some discussion.