One motivation for reinforcement learning as a framework for artificial intelligence is that many (perhaps all) "intelligence-requiring" tasks can be formulated in terms of an agent that prefers some observations over others and takes actions to bring about its preferred observations. This preference is captured numerically with a loss or reward function.
I think you should explicitly mention that here, your loss has to be determined only by your action-observation history. This condition probably excludes most humans, who, for example, don't stop caring about their children when blindfolded.
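To make that distinction concrete (the notation below is mine, not the post's), one way to write it is that the framing requires the loss to be a function of the action-observation history alone:

$$\ell_{\mathrm{RL}} : (\mathcal{A} \times \mathcal{O})^{*} \to \mathbb{R},$$

where $\mathcal{A}$ is the action set and $\mathcal{O}$ the observation set. A parent's preferences are more naturally a function of the underlying world state $s \in \mathcal{S}$, as in $\ell_{\mathrm{human}} : \mathcal{S} \to \mathbb{R}$, and the history constrains $s$ without determining it, so such preferences need not factor through the history at all.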