In the FHI’s indifference paper, they define policies as mapping observation-action histories to a distribution over actions instead of just actions (“π : H → ∆(A)”). Why is that? Is that common? Does it mean the agent is stochastic?
I didn’t look at that particular paper, but that definition sounds like a reasonable way of doing it, since that way your results apply to both stochastic and deterministic agents. A deterministic policy is a special case of a stochastic policy, where the distribution over actions assigns one action 100% probability of being taken and all other actions a 0% probability. So if you define policies as mapping from histories to distributions over actions, that allows for both deterministic and stochastic agents.
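To make the "special case" point concrete, here is a minimal Python sketch. Everything in it is made up for illustration and none of it comes from the paper: the two-element action set, the names `as_stochastic`, `always_left`, `uniform`, and `sample`, and the choice to represent a distribution as a dict. It wraps a deterministic policy as a point-mass distribution so it has the same type as a stochastic one.

```python
import random

# Hypothetical two-action set, chosen only for this example.
ACTIONS = ["left", "right"]


def as_stochastic(det_policy):
    """Lift a deterministic policy (history -> action) to history -> distribution."""
    def policy(history):
        chosen = det_policy(history)
        # Point mass: probability 1 on the chosen action, 0 on every other action.
        return {a: (1.0 if a == chosen else 0.0) for a in ACTIONS}
    return policy


def always_left(history):
    """A deterministic policy: ignores the history and always picks 'left'."""
    return "left"


def uniform(history):
    """A genuinely stochastic policy: every action is equally likely."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}


def sample(policy, history, rng):
    """Draw one action from the distribution the policy assigns to this history."""
    dist = policy(history)
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
lifted = as_stochastic(always_left)
print(lifted(()))             # point-mass distribution on "left"
print(sample(lifted, (), rng))
print(uniform(()))
```

Both `lifted` and `uniform` have the shape history → distribution, so anything proved about that shape covers the deterministic case as well.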
Yeah. I think I did notice it talking about a stochastic policy at one point, and on reflection I don’t see any other reasonable way to read that definition. This interpretation also accords with making the agent’s actions part of the history. If the actions were a pure function of the observations, they could be recomputed from the observations alone, and we wouldn’t need them to be there.
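Here is a small sketch of that last point, under assumptions that are mine and not the paper's: a two-action set, policies that look only at the observations so far, and the made-up names `alternate`, `uniform`, and `possible_action_sequences`. It enumerates which action sequences are possible for a fixed sequence of observations.

```python
from itertools import product

# Hypothetical two-action set, chosen only for this example.
ACTIONS = ["left", "right"]


def alternate(observations):
    """Deterministic policy written as a point mass; depends only on observations."""
    chosen = ACTIONS[len(observations) % 2]
    return {a: (1.0 if a == chosen else 0.0) for a in ACTIONS}


def uniform(observations):
    """Stochastic policy: every action is equally likely at every step."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}


def possible_action_sequences(policy, observations):
    """All action sequences with nonzero probability, given the observations."""
    per_step = []
    for t in range(len(observations)):
        dist = policy(tuple(observations[: t + 1]))
        per_step.append([a for a, p in dist.items() if p > 0])
    return set(product(*per_step))


obs = ("o1", "o2")
print(possible_action_sequences(alternate, obs))  # exactly one sequence
print(possible_action_sequences(uniform, obs))    # several sequences
```

For the deterministic policy the observations pin down a single action sequence, so storing the actions adds nothing. For the stochastic one, several action sequences fit the same observations, so the history has to record which actions were actually taken.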