We bake the opponent’s policy into the environment’s rules: when you choose a move, the game automatically replies.
And the opponent plays to win, with perfect play?
Yes in this case, although note that this only tells us about the rules of the game, not about the reward function: most agents we’re considering don’t have the normal Tic-Tac-Toe reward function.
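To make that concrete, here is a minimal sketch of what "baking the opponent into the rules" could look like. All names (`step`, `opponent_policy`, `standard_reward`, `corner_reward`) are hypothetical and chosen for illustration, and it assumes the agent plays X and moves first. The perfect-play opponent lives inside the transition function, while the reward is a separate function that can be swapped freely.

```python
from functools import lru_cache

# The eight winning lines on a board stored as a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']

def place(board, i, mark):
    return board[:i] + mark + board[i + 1:]

@lru_cache(maxsize=None)
def value_for_o(board, to_move):
    """Minimax value from O's point of view: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'O' else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    nxt = 'X' if to_move == 'O' else 'O'
    values = [value_for_o(place(board, m, to_move), nxt) for m in moves]
    return max(values) if to_move == 'O' else min(values)

def opponent_policy(board):
    """Perfect-play reply for O; ties go to the lowest square index."""
    return max(legal_moves(board),
               key=lambda m: value_for_o(place(board, m, 'O'), 'X'))

def step(board, agent_move):
    """Environment rule: apply the agent's X move, then O replies automatically."""
    board = place(board, agent_move, 'X')
    if winner(board) is None and legal_moves(board):
        board = place(board, opponent_policy(board), 'O')
    return board

# Rewards are defined separately from the rules above.
def standard_reward(board):
    """The normal Tic-Tac-Toe reward: +1 if X wins, -1 if O wins, 0 otherwise."""
    return {'X': 1, 'O': -1, None: 0}[winner(board)]

def corner_reward(board):
    """An illustrative non-standard reward: number of corners holding an X."""
    return sum(board[i] == 'X' for i in (0, 2, 6, 8))
```

From the agent's side, a call like `step(' ' * 9, 4)` looks like a single transition: it never sees the opponent as a separate player. Swapping `standard_reward` for `corner_reward` changes what the agent wants without touching `step` at all.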