Gram_Stone comments on In partially observable environments, stochastic policies can be optimal

Gram_Stone 19 Jul 2016 14:19 UTC
8 points
Is the Absent-minded Driver an example of a single-player decision problem whose optimal policy is stochastic? Isn’t the optimal policy to condition your decision on an unbiased coin?

I ask because it seems like it might make a good intuitive example, as opposed to the POMDP in the OP. But I’m not sure who your intended audience is.
- Stuart_Armstrong 19 Jul 2016 17:16 UTC
  5 points
  Yes, you can see this POMDP as a variant of the absent-minded driver, and get that result.
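
To make the absent-minded driver point concrete, here is a minimal sketch of the usual textbook version. The payoffs (0 for exiting at the first intersection, 4 for exiting at the second, 1 for driving past both) are assumed for illustration and are not stated in this thread. The names `expected_payoff` and `best_on_grid` are hypothetical helpers. Because the driver cannot tell the intersections apart, a policy is a single probability `p` of continuing, used at both.

```python
from fractions import Fraction

# Assumed payoffs for illustration (standard textbook numbers, not from this thread).
EXIT_FIRST = 0      # driver exits at the first intersection
EXIT_SECOND = 4     # driver exits at the second intersection
CONTINUE_BOTH = 1   # driver drives past both intersections


def expected_payoff(p):
    """Expected payoff when the driver continues with probability p at every intersection."""
    return (
        (1 - p) * EXIT_FIRST
        + p * (1 - p) * EXIT_SECOND
        + p * p * CONTINUE_BOTH
    )


def best_on_grid(steps=300):
    """Return the best continue-probability among the fractions i/steps (a coarse grid search)."""
    candidates = [Fraction(i, steps) for i in range(steps + 1)]
    return max(candidates, key=expected_payoff)


if __name__ == "__main__":
    p_star = best_on_grid()
    print("best p on grid:", p_star, "payoff:", expected_payoff(p_star))
    print("always exit:", expected_payoff(Fraction(0)))
    print("always continue:", expected_payoff(Fraction(1)))
    print("fair coin:", expected_payoff(Fraction(1, 2)))
```

Under these assumed payoffs the expected payoff is 4p − 3p², so both deterministic policies (p = 0 and p = 1) do worse than a randomized one. The best coin is biased toward continuing rather than fair, although a fair coin still beats either deterministic choice.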