zulupineapple comments on Biased reward-learning in CIRL

zulupineapple 18 Jan 2018 12:00 UTC
3 points
The CIRL model might simply not be flexible enough to represent manipulative actions. The state $s$ is known to both agents and is supposed to represent the world, but if $\theta$ isn't known to $R$, then the internal state of $H$ is not contained in $s$. Representing manipulation would then require some other state $s^{H}$, invisible to $R$, and an extended transition function that is able to affect this state.
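
One way to make this concrete, as a rough sketch rather than anything taken from the post: assume the usual CIRL transition function $T(s' \mid s, a^{H}, a^{R})$, and let $S^{H}$ (a hypothetical set of hidden human states, my notation) contain $s^{H}$. The extended transition function would then look something like

$$T'(s', s^{H\prime} \mid s, s^{H}, a^{H}, a^{R}),$$

where $R$ still observes only $s$. Under these assumptions, one could call a robot action manipulative if it shifts the distribution over the next hidden state, i.e. if for some $a^{R}_{1} \neq a^{R}_{2}$

$$\sum_{s'} T'(s', s^{H\prime} \mid s, s^{H}, a^{H}, a^{R}_{1}) \neq \sum_{s'} T'(s', s^{H\prime} \mid s, s^{H}, a^{H}, a^{R}_{2}).$$

In the unextended model there is no $s^{H}$ for $a^{R}$ to act on, which would be why such actions can't be written down there.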