dr_s comments on The absent-minded variations

dr_s 20 May 2025 15:23 UTC
2 points
0
I am interpreting that formula as “compute this quantity in the sum and find the $a^{'}$ from the set of all possible actions $A$ that maximizes it, then do that”. Am I wrong? If that’s the interpretation, then that policy will always produce a pure strategy in the case of the absent-minded driver. I could actually write down all the functions for it since they are essentially simple lookup tables for such a small case.

The policy assigns non-zero probability to both exit and advance. But only one of the two has higher expected utility. Or is your point that the only self-consistent policy is the one where both have equal expected utility, and thus I can in fact choose either? Though then I have to choose according to the probabilities specified in the policy.
- jessicata 20 May 2025 16:07 UTC
  2 points
  0
  Parent
  Think of it as a predicate on policies. The predicate (local optimality) is true when, for each action the policy assigns non-zero probability to, that action maximizes expected utility relative to the policy.
  
  I am interpreting that formula as “compute this quantity in the sum and find the a’ from the set of all possible actions \mathcal{A} that maximizes it, then do that”. Am I wrong?
  
  Yes. It’s a predicate on policies. If two different actions (given an observation) maximize expected utility, then either action can be taken. Your description doesn’t allow that, because it assumes there is a single a’ that maximizes expected utility. Whereas, with a predicate on policies, we could potentially allow multiple actions.
  
  Or is your point that the only self-consistent policy is the one where both have equal expected utility, and thus I can in fact choose either? Though then I have to choose according to the probabilities specified in the policy.
  
  Yes, exactly. Look up Nash equilibrium in matching pennies. It’s pretty similar. (Except your expected utilities as a function of your action depend on the opponent’s actions in matching pennies, and your own action in absent minded driver.)