In higher-dimensional spaces, assuming something like a continuous preference function, it should become increasingly likely that desired outcomes are achievable through a series of myopic decisions: the more dimensions there are, the more directions a purely local step has available to make progress, so greedy improvement is less likely to get stuck. This is similar to how gradient descent works (and, to a much lesser extent, evolution, which is less strictly myopic because it statistically tolerates mildly negative traits).
Given that the real world is immensely high-dimensional, I would expect the most efficient optimisation function for an ML agent to learn to be somewhat myopic, and so to avoid this issue.
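As a toy sketch of the intuition (the landscape, parameters, and names here are all invented for illustration, not a claim about any particular agent): a purely myopic climber that samples a handful of random perturbations and keeps whichever single step looks best right now, with no lookahead, still closes most of the distance to a target in a smooth 100-dimensional preference landscape.

```python
import numpy as np

rng = np.random.default_rng(0)

def preference(x, target):
    """A smooth, continuous preference function (higher is better).
    Toy choice: negative squared distance plus mild sinusoidal ripples."""
    return -np.sum((x - target) ** 2) + 0.1 * np.sum(np.cos(3 * (x - target)))

def myopic_climb(dim=100, steps=2000, step_size=0.05, candidates=8):
    """Purely myopic optimisation: at each step, sample a few random
    perturbations and take whichever single step scores best right now.
    No lookahead, no planning -- every decision is greedy and local."""
    target = rng.normal(size=dim)            # the 'desired outcome'
    x = np.zeros(dim)
    start = np.linalg.norm(x - target)
    best = preference(x, target)
    for _ in range(steps):
        # Candidate next states: small random perturbations of x.
        moves = x + step_size * rng.normal(size=(candidates, dim))
        scores = np.array([preference(m, target) for m in moves])
        if scores.max() > best:               # accept only myopic improvements
            best = scores.max()
            x = moves[scores.argmax()]
    return start, np.linalg.norm(x - target)

start, end = myopic_climb()
print(f"distance to desired outcome: {start:.2f} -> {end:.2f}")
```

With many dimensions, almost every sampled batch contains some direction of improvement, so the greedy rule rarely stalls; the same loop in two or three dimensions is far more prone to getting trapped in the ripples.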