No, that’s not what I was saying. When I said “reward acquisition,” I meant reward as measured by the actual reward function (that is, the base objective).
Wait, then how is “improving across-episode exploration” different from “preventing the agent from making an accidental mistake”? (What’s a situation that counts as one but not the other?)
Like I said in the post, I’m skeptical that “preventing the agent from making an accidental mistake” is actually a meaningful concept (or at least, it’s a concept with many possible conflicting definitions), so I’m not sure how to give an example of it.