What are the main differences from the formalism in this paper?
This formalism uses rewards and a POMDP rather than utility and general environments.
This formalism adds nothing new (it’s tailored to its intended audience, but all these formalisms are pretty similar); it’s posted here only because the next posts will use it.