I am interpreting that formula as “compute this quantity in the sum and find the a′ from the set of all possible actions A that maximizes it, then do that”. Am I wrong? If that’s the interpretation, then that policy will always produce a pure strategy in the case of the absent-minded driver. I could actually write down all the functions for it since they are essentially simple lookup tables for such a small case.
The policy assigns non-zero probability to both exit and advance. But only one of the two has higher expected utility. Or is your point that the only self-consistent policy is the one where both have equal expected utility, and thus I can in fact choose either? Though then I have to choose according to the probabilities specified in the policy.
Think of it as a predicate on policies. The predicate (local optimality) is true when, for each action the policy assigns non-zero probability to, that action maximizes expected utility relative to the policy.
I am interpreting that formula as “compute this quantity in the sum and find the a’ from the set of all possible actions \mathcal{A} that maximizes it, then do that”. Am I wrong?
Yes. It’s a predicate on policies. If two different actions (given an observation) maximize expected utility, then either action can be taken. Your description doesn’t allow that, because it assumes there is a single a’ that maximizes expected utility. Whereas, with a predicate on policies, we could potentially allow multiple actions.
Or is your point that the only self-consistent policy is the one where both have equal expected utility, and thus I can in fact choose either? Though then I have to choose according to the probabilities specified in the policy.
Yes, exactly. Look up Nash equilibrium in matching pennies. It’s pretty similar. (Except your expected utilities as a function of your action depend on the opponent’s actions in matching pennies, and your own action in absent minded driver.)
I am interpreting that formula as “compute this quantity in the sum and find the a′ from the set of all possible actions A that maximizes it, then do that”. Am I wrong? If that’s the interpretation, then that policy will always produce a pure strategy in the case of the absent-minded driver. I could actually write down all the functions for it since they are essentially simple lookup tables for such a small case.
The policy assigns non-zero probability to both exit and advance. But only one of the two has higher expected utility. Or is your point that the only self-consistent policy is the one where both have equal expected utility, and thus I can in fact choose either? Though then I have to choose according to the probabilities specified in the policy.
Think of it as a predicate on policies. The predicate (local optimality) is true when, for each action the policy assigns non-zero probability to, that action maximizes expected utility relative to the policy.
Yes. It’s a predicate on policies. If two different actions (given an observation) maximize expected utility, then either action can be taken. Your description doesn’t allow that, because it assumes there is a single a’ that maximizes expected utility. Whereas, with a predicate on policies, we could potentially allow multiple actions.
Yes, exactly. Look up Nash equilibrium in matching pennies. It’s pretty similar. (Except your expected utilities as a function of your action depend on the opponent’s actions in matching pennies, and your own action in absent minded driver.)