But it really does seem that there is a difference between facing an environment and another player—the other player adapts to your strategy in a way the environment doesn’t. The environment only adapts to your actions.
I think for unbounded agents facing the environment, a deterministic policy is always optimal, but this might not be the case for bounded agents.
I always had the informal impression that the optimal policies were deterministic
So an impression that optimal memoryless policies were deterministic?
That seems even less likely to me. If the environment has state and you’re not allowed any, you’re playing at a disadvantage. Randomness is one way to counter state when you don’t have state.
But it really does seem that there is a difference between facing an environment and another player—the other player adapts to your strategy in a way the environment doesn’t. The environment only adapts to your actions.
I still don’t see a difference. Your strategy is only known from your actions by both another player and the environment, so they’re in the same boat.
Labeling something the environment or a player seems arbitrary and irrelevant. What capabilities are we talking about? Are these terms of art for which some standard specifying capability exists?
What formal distinctions have been made between players and environments?
Take a game with a mixed-strategy Nash equilibrium. If you and the other player both follow it, each using a source of randomness that remains random to the other, then it is never to your advantage to deviate from it. Now suppose you play this game, again and again, against another player or against the environment.
Consider an environment in which the opponent’s strategies are in an evolutionary arms race, trying to best beat you; this is an environmental model. Under this, you’d tend to follow the Nash equilibrium on average, but, at (almost) any given turn, there’s a deterministic choice that’s a bit better than being stochastic, and it’s determined by the current equilibrium of strategies of the opponent/environment.
However, if you’re facing another player, and you make deterministic choices, you’re vulnerable if ever they figure out your choice. This is because they can peer into your algorithm, not just track your previous actions. To avoid this, you have to be stochastic.
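A rough sketch of the two cases, using matching pennies as the game with a mixed equilibrium. The game, the payoffs (+1 if the coins match, -1 otherwise) and the function names are my assumptions for illustration, not something specified above. The "environment" is summarized by its current frequency of heads, and the "player" is modeled as someone who knows your probability of heads but not your random bits.

```python
def payoff_vs_reply(p_heads, reply):
    """Expected payoff to the matcher who plays heads with probability
    p_heads against a fixed reply ("H" or "T"). Assumed payoffs: +1 on a
    match, -1 on a mismatch."""
    p_match = p_heads if reply == "H" else 1.0 - p_heads
    return p_match * 1.0 + (1.0 - p_match) * -1.0


def best_fixed_action(opp_heads_freq):
    """Environment case: the opponent population currently shows heads
    with frequency opp_heads_freq and does not read your algorithm.
    Returns the deterministic action that is best right now."""
    value_h = 2.0 * opp_heads_freq - 1.0   # play H: win when they show H
    value_t = 1.0 - 2.0 * opp_heads_freq   # play T: win when they show T
    return "H" if value_h >= value_t else "T"


def exploited_value(p_heads):
    """Player case: the opponent knows p_heads (your algorithm) and picks
    the reply that is worst for you."""
    return min(payoff_vs_reply(p_heads, r) for r in ("H", "T"))


# Against a drifting environment, a deterministic choice beats mixing
# whenever the frequency is off 0.5.
print(best_fixed_action(0.6), best_fixed_action(0.4))
# Against a player who can read your algorithm, determinism is punished.
print(exploited_value(1.0), exploited_value(0.5))
```

Under these assumptions, mixing 50/50 earns exactly 0 in both cases, a deterministic choice earns slightly more than 0 against the drifting frequency, and a deterministic choice earns -1 against the opponent who can read it.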
This seems like a potentially relevant distinction.
Really? I wouldn’t have ever thought that at all. Why do you think you thought that?
Isn’t that kind of what a player is? Part of the environment with a strategy and only partially observable states?
Although for this player, don’t you have an optimal strategy, except for the first move? The Markov “Player” seems to like change.
Isn’t this strategy basically optimal? ABABABABABAB… Deterministic, just not the same every round. Am I missing something?
It’s deterministic, but not memoryless.
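To make the "deterministic but not memoryless" point concrete, here is a toy environment that I am assuming for illustration; it may not match the Markov player discussed above. It pays 1 whenever your action differs from your previous one and 0 otherwise, so the state it carries is your last action.

```python
def memoryless_reward(q):
    """Long-run expected reward per round for a memoryless policy that
    plays A with probability q. Consecutive actions are independent, so
    P(change) = q*(1-q) + (1-q)*q."""
    return 2.0 * q * (1.0 - q)


def alternating_reward(rounds):
    """Average reward of the deterministic sequence ABAB... over the
    given number of rounds, scoring from the second move onward."""
    actions = ["A" if t % 2 == 0 else "B" for t in range(rounds)]
    changes = sum(a != b for a, b in zip(actions, actions[1:]))
    return changes / (rounds - 1)


# Memoryless and deterministic: always A (q = 1) or always B (q = 0).
print(memoryless_reward(1.0), memoryless_reward(0.0))
# Memoryless and stochastic: a fair coin is the best such policy here.
print(memoryless_reward(0.5))
# Deterministic with one bit of memory: ABAB...
print(alternating_reward(10))
```

In this toy setup the memoryless deterministic policies get nothing, the memoryless coin flip gets half the reward, and ABAB gets all of it, but only because it remembers its last move.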
But it really does seem that there is a difference between facing an environment and another player—the other player adapts to your strategy in a way the environment doesn’t. The environment only adapts to your actions.
Is this how you define environment?
At least as an informal definition, it seems pretty good.