Is the Absent-minded Driver an example of a single-player decision problem whose optimal policy is stochastic? Isn’t the optimal policy to condition your decision on an unbiased coin?
I ask because it seems like it might make a good intuitive example, as opposed to the POMDP in the OP. But I’m not sure who your intended audience is.
Is the Absent-minded Driver an example of a single-player decision problem whose optimal policy is stochastic? Isn’t the optimal policy to condition your decision on an unbiased coin?
I ask because it seems like it might make a good intuitive example, as opposed to the POMDP in the OP. But I’m not sure who your intended audience is.
Yes, you can see this POMDP as a variant of the absent minded-driver, and get that result.