Clarifying anti-tldr edit time! If you got the above, no need to read on. (I wanted this to be an edit, but apparently I fail at clicking buttons)
The simple algorithm is the greedy decision-finding method “Choose that action which leads to one-time-tick-into-future self having the best possible range of outcomes available via further actions”, which you think could handle this problem if only the utility function employed exponential discounting (whether it actually could is irrelevant, since I adress another point).
But your utility function is part of the territory, and the utility function that you use for calculating your actions is part of the map; it is rather suspicious that you want to tweak your map towards a version that is more convenient to your calculations.
Finding a problem with the simple algorithm that usually gives you a good outcome doesn’t mean you get to choose a new utility function.
Clarifying anti-tldr edit time! If you got the above, no need to read on. (I wanted this to be an edit, but apparently I fail at clicking buttons)
The simple algorithm is the greedy decision-finding method “Choose that action which leads to one-time-tick-into-future self having the best possible range of outcomes available via further actions”, which you think could handle this problem if only the utility function employed exponential discounting (whether it actually could is irrelevant, since I adress another point).
But your utility function is part of the territory, and the utility function that you use for calculating your actions is part of the map; it is rather suspicious that you want to tweak your map towards a version that is more convenient to your calculations.