The point of this post is mostly to claim that it’s not a hugely useful framework for thinking about RL.
Even though I agree it’s unrealistic, MDPs are still easier to prove things in, and I still think they can give us important insights. For example, if I had started with more complex environments when I was investigating instrumental convergence, I would’ve spent a ton of extra time grappling with the theorems for little perceived benefit. That is, the MDP framework let me cut to the core insights more easily. Sometimes it’s worth thinking about more general computable environments, but probably not always.