If I understand correctly, your issue is with the Markov property of MDPs? It simplifies the computation of the policy by not requiring knowledge of the path by which the agent arrived at a given state; but it also removes any information about the history that is not written into the state itself.
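To illustrate what I mean, here is a minimal sketch in Python (made-up toy policies, just to show the type distinction): a Markov policy only sees the current state, so two paths that end in the same state must get the same action, whereas a history-dependent policy is free to distinguish them.

```python
from typing import Hashable, List

State = Hashable
Action = str

def markov_policy(state: State) -> Action:
    # A Markov policy depends only on the current state.
    return "left" if hash(state) % 2 == 0 else "right"

def history_policy(history: List[State]) -> Action:
    # A history-dependent policy may use the entire path; strictly
    # more general, but its input space grows with episode length.
    return "left" if len(history) % 2 == 0 else "right"

# Two different paths ending in the same state:
path_a = ["s0", "s1", "s3"]
path_b = ["s0", "s1", "s2", "s3"]

# The Markov policy must act identically in both cases...
assert markov_policy(path_a[-1]) == markov_policy(path_b[-1])
# ...while the history policy can tell them apart.
assert history_policy(path_a) != history_policy(path_b)
```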
Not sure if you know it or whether it’s useful to you, but this section of “Reinforcement Learning: An Introduction” discusses ways to go beyond MDPs and the Markov property.
I don’t really go into the potential costs of a finite-state-Markov assumption here. The point of this post is mostly to claim that it’s not a hugely useful framework for thinking about RL.
The short answer for why I think there are costs is that the world is not finite-state Markov, and certainly not fully observable finite-state Markov. So yes, it could “remove information” by oversimplifying.
That section of the textbook seems to describe the alternative I mentioned: treating the whole interaction history as the state. It’s no longer finite-state, but you can still treat the environment as fully observable without losing any generality, so that’s good. So if I were to take issue more strongly here, my issue would not be with the Markov property, but with the finite-state-ness.
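Concretely, that construction looks something like the sketch below (a made-up toy environment and a simplified gym-flavored interface, not the textbook’s code): the wrapper makes the process Markov by fiat, because the state it exposes is the entire interaction history.

```python
import random

class ToyEnv:
    # A made-up two-observation environment, just so the wrapper runs.
    def reset(self):
        self.t = 0
        return "s0"

    def step(self, action):
        self.t += 1
        obs = random.choice(["s0", "s1"])
        return obs, 1.0, self.t >= 3  # obs, reward, done

class HistoryStateWrapper:
    """Treat the entire interaction history as the state.

    The wrapped process is Markov by construction: the state contains
    everything the agent has ever observed and done, so no extra
    history information exists to be lost. The price is that the
    state space is no longer finite (histories grow without bound).
    """
    def __init__(self, env):
        self.env = env
        self.history = ()

    def reset(self):
        self.history = (self.env.reset(),)
        return self.history

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.history = self.history + ((action, obs),)
        return self.history, reward, done

env = HistoryStateWrapper(ToyEnv())
state = env.reset()
done = False
while not done:
    state, reward, done = env.step("noop")
print(state)  # the full history, which *is* the state
```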
Even though I agree it’s unrealistic, MDPs are still easier to prove things in, and I still think they can give us important insights. For example, if I had started with more complex environments when I was investigating instrumental convergence, I would’ve spent a ton of extra time grappling with the theorems for little perceived benefit. That is, the MDP framework let me more easily cut to the core insights. Sometimes it’s worth thinking about more general computable environments, but probably not always.