Stuart_Armstrong comments on Reward function learning: the learning process

Stuart_Armstrong 26 Apr 2018 8:59 UTC
2 points
Do you mean “onto” rather than “one-to-one”? (If the function is not one-to-one, which two inputs map to the same output?)
The observation function is onto, and not one-to-one. For most states $s \in S$, the states $s \times {cook}$ and $s \times {wash}$ will map to the same observation.
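As a rough illustration of the onto versus one-to-one distinction, here is a minimal Python sketch. The state names, the `observe` function, and the choice to drop the task label for every state are assumptions made for the example (the comment only says this holds for most states); it is not the setup from the post itself.

```python
from itertools import product

# Hypothetical toy state space: a base state paired with a task label.
base_states = ["s0", "s1"]
tasks = ["cook", "wash"]
states = list(product(base_states, tasks))

# Assumed set of possible observations: just the base states.
observations = set(base_states)


def observe(state):
    """Toy observation function: reveals the base state, hides the task."""
    base, _task = state
    return base


# The set of observations actually produced by some state.
image = {observe(s) for s in states}

# Onto: every possible observation is produced by at least one state.
is_onto = image == observations

# One-to-one would require a distinct observation for every state.
is_one_to_one = len(image) == len(states)

print(is_onto, is_one_to_one)
```

Here `("s0", "cook")` and `("s0", "wash")` are distinct states with the same observation, which is why the map fails to be one-to-one while still covering every observation.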
Do you mean “then” instead of “when”?
Thanks, I’ve now corrected that.