the system can feel rewarded now by knowing that it will in future dominate in the universe
I do not see any mapping from these concepts to its action-selection-criterion of maximizing the rewards for its current episode.
I do not see any mapping from these concepts to its action-selection-criterion of maximizing the rewards for its current episode.