I don’t really go into the potential costs of a finite-state-Markov assumption here. The point of this post is mostly to claim that it’s not a hugely useful framework for thinking about RL.
The short answer for why I think it has costs is that the world is not finite-state Markov, and certainly not fully observable finite-state Markov. So yes, it could “remove information” by oversimplifying.
That section of the textbook seems to describe the alternative I mentioned: treating the whole interaction history as the state. It’s not finite-state anymore, but you can still treat the environment as fully observable without losing any generality, so that’s good. So if I were to take issue more strongly here, my issue would not be with the Markov property, but with the finite-stateness.
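To make the history-as-state idea concrete, here is a minimal sketch. The environment `toy_env`, the flat tuple layout of the history, and the helper names `step` and `rollout` are all hypothetical, made up for illustration and not taken from the textbook. The reward depends on the previous action, so the latest observation alone is not a Markov state, but the full history trivially is.

```python
from typing import List, Tuple

# Assumed layout (illustrative only): a flat tuple (o0, a0, o1, a1, o2, ...).
History = Tuple[int, ...]


def toy_env(history: History, action: int) -> Tuple[int, float]:
    """Hypothetical non-Markov environment.

    The observation is always 0, so the latest observation carries no
    information. The reward is 1.0 when the action repeats the previous
    action and 0.0 otherwise, so it depends on the past.
    """
    previous_action = history[-2] if len(history) >= 3 else None
    reward = 1.0 if action == previous_action else 0.0
    return 0, reward


def step(history: History, action: int) -> Tuple[History, float]:
    """One transition of the history-as-state process.

    The next state is the old history extended by (action, observation),
    so the transition depends only on the current state and the action.
    """
    observation, reward = toy_env(history, action)
    return history + (action, observation), reward


def rollout(actions: List[int]) -> Tuple[History, List[float]]:
    """Run a fixed action sequence from the initial history (o0,) = (0,)."""
    history: History = (0,)
    rewards = []
    for action in actions:
        history, reward = step(history, action)
        rewards.append(reward)
    return history, rewards


if __name__ == "__main__":
    final_history, rewards = rollout([1, 1, 0])
    print(final_history, rewards)
```

Note that the state grows by two entries every step, so the set of reachable states is unbounded. That is the sense in which this construction keeps the Markov property and full observability but gives up finite-stateness.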
The point of this post is mostly to claim that it’s not a hugely useful framework for thinking about RL.
Even though I agree it’s unrealistic, MDPs are still easier to prove things in, and I still think they can give us important insights. For example, if I had started with more complex environments when I was investigating instrumental convergence, I would’ve spent a ton of extra time grappling with the theorems for little perceived benefit. That is, the MDP framework let me more easily cut to the core insights. Sometimes it’s worth thinking about more general computable environments, but probably not always.