Another problem is that this is too expressive. For any policy π, you can encode that policy into this sort of gradient: if π takes action a in state s, say that your gradient points toward a (or toward the state s′ that results from taking action a), and away from every other action / state.
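Here is a minimal sketch of that construction in Python, assuming a deterministic policy over a finite action set (all names and types below are hypothetical stand-ins, not from the original). The "gradient" assigns +1 to the action the policy takes in a state and −1 to every other action, so greedily following it reproduces the policy exactly:

```python
from typing import Callable, Dict, Hashable, List

State = Hashable
Action = Hashable


def encode_policy_as_gradient(
    policy: Callable[[State], Action],
    actions: List[Action],
) -> Callable[[State], Dict[Action, float]]:
    """Turn an arbitrary deterministic policy into a per-state 'gradient'
    that points toward the policy's chosen action and away from the rest."""
    def gradient(state: State) -> Dict[Action, float]:
        chosen = policy(state)
        # +1 toward the action π takes in this state, -1 away from all others.
        return {a: (1.0 if a == chosen else -1.0) for a in actions}
    return gradient


def follow_gradient(
    gradient: Callable[[State], Dict[Action, float]],
    state: State,
) -> Action:
    """Act greedily with respect to the encoded gradient."""
    g = gradient(state)
    return max(g, key=g.get)


# Any policy at all can be recovered this way, which is the sense in
# which the representation is "too expressive": it imposes no structure.
actions = ["left", "right"]
policy = lambda s: "left" if s % 2 == 0 else "right"
grad = encode_policy_as_gradient(policy, actions)
assert all(follow_gradient(grad, s) == policy(s) for s in range(10))
```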