As an observation, it seems like part of the problem in this example is that the agent has access to different actions than the supervisor. The supervisor cannot move to s2 (and therefore cannot provide any information about the reward difference, as noted), but the agent can easily do so. If this were not the case, it would not matter what the agent believed about s2.
What happens in scenarios where you restrict the set of actions available to the agent so that it matches those available to the supervisor?
That is a good question. I don’t think it is essential that the agent can move from s1 to s2, only that the agent is able to force a stay in s2 if it wants to.
The transition from s1 to s2 could instead happen randomly with some probability.
The important thing is that the supervisor's action in s1 does not reveal any information about s2.
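To make this concrete, here is a minimal sketch of the modified two-state dynamics described above: the agent never chooses to enter s2; instead the s1→s2 transition fires randomly, and in s2 the agent can force a stay. The state names, action names, and slip probability are all illustrative assumptions, not part of the original example.

```python
import random

P_SLIP = 0.1  # assumed probability of a random s1 -> s2 transition

def step(state, action, rng=random):
    """Return the next state given the current state and the agent's action.

    Hypothetical transition function: in s1 the agent's action is
    irrelevant to reaching s2 (the transition happens by chance, not by
    choice), so the supervisor's behavior in s1 carries no information
    about s2. In s2 the agent retains the ability to force a stay.
    """
    if state == "s1":
        return "s2" if rng.random() < P_SLIP else "s1"
    return "s2" if action == "stay" else "s1"
```

Under these dynamics the agent's and supervisor's action sets in s1 can be identical, yet the original concern still arises, because the agent can keep itself in s2 once it arrives there.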