ESRogs comments on Seeking Power is Often Convergently Instrumental in MDPs

ESRogs 5 Dec 2019 19:41 UTC
LW: 6 AF: 3
> We bake the opponent’s policy into the environment’s rules: when you choose a move, the game automatically replies.
And the opponent plays to win, with perfect play?
- TurnTrout 5 Dec 2019 21:17 UTC
  LW: 6 AF: 3
  Yes in this case, although note that this only tells us about the rules of the game, not about the reward function: most agents we’re considering don’t have the normal Tic-Tac-Toe reward function.
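
To make the exchange above concrete, here is a minimal sketch of one way the opponent's policy could be baked into the transition function, assuming a 9-character string board, the agent playing X, and a minimax opponent playing O. All names here (`step`, `opponent_reply`, `standard_reward`, `prefers_losing`) are hypothetical and not taken from the post, and treating illegal moves and terminal states as self-loops is an assumption of this sketch.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def is_terminal(board):
    return winner(board) is not None or ' ' not in board

def place(board, i, mark):
    return board[:i] + mark + board[i + 1:]

@lru_cache(maxsize=None)
def value_for_o(board, to_move):
    """Minimax value from O's perspective: +1 O wins, 0 draw, -1 X wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'O' else -1
    if ' ' not in board:
        return 0
    other = 'X' if to_move == 'O' else 'O'
    vals = [value_for_o(place(board, i, to_move), other)
            for i, c in enumerate(board) if c == ' ']
    return max(vals) if to_move == 'O' else min(vals)

def opponent_reply(board):
    """Perfect-play reply for O; ties are broken by lowest cell index."""
    moves = [i for i, c in enumerate(board) if c == ' ']
    return max(moves, key=lambda i: value_for_o(place(board, i, 'O'), 'X'))

def step(board, action):
    """Deterministic transition: the agent's X move, then O's automatic reply."""
    if is_terminal(board) or board[action] != ' ':
        return board  # assumption: terminal states and illegal moves self-loop
    board = place(board, action, 'X')
    if not is_terminal(board):
        board = place(board, opponent_reply(board), 'O')
    return board

# Reward functions live outside the rules; `step` never consults them.
def standard_reward(board):
    """The usual Tic-Tac-Toe objective: +1 for an X win, -1 for an O win."""
    return {'X': 1.0, 'O': -1.0}.get(winner(board), 0.0)

def prefers_losing(board):
    """One of many non-standard reward functions: rewards an O win."""
    return 1.0 if winner(board) == 'O' else 0.0
```

Calling `step(' ' * 9, 4)` places X in the centre and returns the board with O's reply already applied, so from the agent's point of view there is no opponent, only dynamics. The two reward functions score the same states differently while `step` stays fixed, which is the distinction drawn in the reply: perfect play by the opponent constrains the transitions, not what the agent is trying to achieve.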