I think that the “state” in this case is just the full observation history, and is therefore observable by definition. The state space is enormous, of course, but they deal with it by using value and policy networks, both RNNs (probably sharing some parameters), that process the history as a temporal sequence.
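To make that concrete, here is a rough sketch of what I mean, not their actual architecture. It assumes a plain tanh RNN whose recurrent body is shared, with separate policy and value heads on top. The class name `RecurrentActorCritic`, the layer sizes, and the random untrained weights are all made up for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Small random weights; a real network would be trained, this one is not.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class RecurrentActorCritic:
    """Hypothetical policy/value pair sharing one recurrent body."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Shared parameters: the recurrent encoder of the observation history.
        self.w_in = make_matrix(hidden_dim, obs_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        # Separate heads on top of the shared hidden vector.
        self.w_policy = make_matrix(n_actions, hidden_dim, rng)
        self.w_value = make_matrix(1, hidden_dim, rng)

    def encode(self, history):
        # Fold the whole history (the "state") into a fixed-size vector,
        # one observation at a time.
        h = [0.0] * self.hidden_dim
        for obs in history:
            a = matvec(self.w_in, obs)
            b = matvec(self.w_rec, h)
            h = [math.tanh(x + y) for x, y in zip(a, b)]
        return h

    def forward(self, history):
        h = self.encode(history)
        logits = matvec(self.w_policy, h)
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        value = matvec(self.w_value, h)[0]
        return probs, value


net = RecurrentActorCritic(obs_dim=3, hidden_dim=8, n_actions=4)
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs, value = net.forward(history)
```

The point is only that the history can be arbitrarily long while the network input stays one observation per step; the hidden vector `h` stands in for the enormous "state".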