Rohin Shah comments on Exploring safe exploration

Rohin Shah 7 Jan 2020 8:14 UTC
LW: 2 AF: 2
> No, that’s not what I was saying. When I said “reward acquisition” I meant the actual reward function (that is, the base objective).
Wait, then how is “improving across-episode exploration” different from “preventing the agent from making an accidental mistake”? (What’s a situation that counts as one but not the other?)
- evhub 7 Jan 2020 8:41 UTC
  LW: 2 AF: 1
  Like I said in the post, I’m skeptical that “preventing the agent from making an accidental mistake” is actually a meaningful concept (or at least, it’s a concept with many possible conflicting definitions), so I’m not sure how to give an example of it.