The authors of the CIRL paper are in fact aware of them, and are considering them for future work. I’ve had fruitful conversations with Dylan Hadfield-Menell (one of the authors) about how a naive implementation goes wrong for irrational humans, and about what a tractable non-naive implementation might look like: one that tries to model the probability of a human’s action under joint hypotheses about the correct reward function and about the human’s psychology. He’s planning future work relevant to that question.
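To make the "joint hypotheses" idea concrete, here is a toy sketch of my own, not anything from the CIRL paper or from Dylan's planned work. It assumes the human's psychology can be summarized by a single Boltzmann rationality coefficient (a common modeling choice, swapped in here purely for illustration), and every name in it (`REWARDS`, `BETAS`, `joint_posterior`, and so on) is hypothetical. The point is only that the observer updates on (reward function, psychology) pairs together, rather than assuming a perfectly rational human.

```python
import math
from itertools import product

# Hypothetical toy setup: two actions and two candidate reward functions.
ACTIONS = ["a", "b"]
REWARDS = {
    "prefers_a": {"a": 1.0, "b": 0.0},
    "prefers_b": {"a": 0.0, "b": 1.0},
}

# Hypothetical "psychology" hypotheses, reduced to one number each:
# a Boltzmann rationality coefficient (higher means closer to optimal).
BETAS = {"noisy": 0.5, "near_rational": 5.0}


def action_likelihood(action, reward, beta):
    """P(action | reward, beta) under a softmax (Boltzmann) choice model."""
    weights = {a: math.exp(beta * reward[a]) for a in ACTIONS}
    return weights[action] / sum(weights.values())


def joint_posterior(observed_actions):
    """Posterior over (reward, psychology) pairs, starting from a uniform prior."""
    prior = 1.0 / (len(REWARDS) * len(BETAS))
    unnormalized = {}
    for (r_name, reward), (b_name, beta) in product(REWARDS.items(), BETAS.items()):
        likelihood = 1.0
        for act in observed_actions:
            likelihood *= action_likelihood(act, reward, beta)
        unnormalized[(r_name, b_name)] = prior * likelihood
    total = sum(unnormalized.values())
    return {pair: p / total for pair, p in unnormalized.items()}


def reward_marginal(posterior):
    """Sum out the psychology hypothesis to get beliefs about the reward alone."""
    marginal = {}
    for (r_name, _), p in posterior.items():
        marginal[r_name] = marginal.get(r_name, 0.0) + p
    return marginal


# The human mostly picks "a" but slips once.
posterior = joint_posterior(["a", "a", "b"])
marginal = reward_marginal(posterior)
```

In this toy case the single "b" counts heavily against the near-rational hypotheses, so most of the posterior mass lands on a noisy human who prefers "a". An observer that assumed near-perfect rationality from the start would have no such way to explain away the slip. Whether anything like this stays tractable with realistic hypothesis spaces is exactly the open question.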
Also note Dylan’s talk on CIRL, value of information, and the shutdown problem, which doesn’t solve the problem entirely but which significantly improved my opinion of the usefulness of approaches like CIRL. (The writeup of this result is forthcoming.)