Rohin Shah comments on Exploring safe exploration

Rohin Shah 6 Jan 2020 22:20 UTC
LW: 8 AF: 5
> In a previous comment thread, Rohin argued that safe exploration is best defined as being about the agent not making “an accidental mistake.”
I definitely was not arguing that. I was arguing that safe exploration is currently defined in ML as the agent making an accidental mistake, and that we should really not be having terminology collisions with ML. (I may have left that second part implicit.)
Like you, I do not think this definition makes sense in the context of powerful AI systems, because it is evaluated from the perspective of an engineer outside the whole system. However, it makes a lot of sense for current ML systems, where the concern is e.g. training self-driving cars without ever having a single collision. You can solve that problem by using the engineer’s knowledge to guide the training process. (See e.g. Parenting: Safe Reinforcement Learning from Human Input, Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, Safe Reinforcement Learning via Shielding, Formal Language Constraints for Markov Decision Processes (specifically the hard constraints).)
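As a rough illustration of what "using the engineer's knowledge to guide the training process" can look like, here is a minimal sketch in the spirit of shielding. It is not taken from any of the papers cited above: the toy 1-D track, the `COLLISION` position, and the function names `is_unsafe`, `shielded_action`, and `explore` are all hypothetical. The point it shows is that the safety predicate is written by the engineer, outside the agent, and simply vetoes proposed actions during exploration.

```python
import random

COLLISION = 4          # hypothetical unsafe position on a toy 1-D track
ACTIONS = (-1, 1)      # step left or step right


def step(state, action):
    """Toy dynamics: move along the track, never going below 0."""
    return max(0, state + action)


def is_unsafe(state, action):
    """Engineer-supplied knowledge: this action would cause a collision."""
    return step(state, action) == COLLISION


def shielded_action(state, proposed, actions=ACTIONS, unsafe=is_unsafe):
    """Pass the agent's proposed action through unless the shield vetoes it."""
    if not unsafe(state, proposed):
        return proposed
    allowed = [a for a in actions if not unsafe(state, a)]
    if not allowed:
        raise RuntimeError(f"no safe action available in state {state}")
    return allowed[0]  # deterministic fallback to the first safe action


def explore(num_steps, seed=0):
    """Random exploration with every proposed action filtered by the shield."""
    rng = random.Random(seed)
    state, visited = 0, [0]
    for _ in range(num_steps):
        proposed = rng.choice(ACTIONS)
        state = step(state, shielded_action(state, proposed))
        visited.append(state)
    return visited
```

Nothing in this sketch changes what the agent itself believes or wants. The mistake is prevented only because `is_unsafe` encodes what the engineer already knows, which is the sense of "from the engineer's perspective" used in this comment.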
> Fundamentally, I think current safe exploration research is about trying to fix that problem—that is, it’s about trying to make across-episode exploration less detrimental to reward acquisition.
Note that this also describes “prevent the agent from making accidental mistakes”. I assume that the difference you see is that you could try to make across-episode exploration less detrimental from the agent’s perspective, rather than from the engineer’s perspective, but I think literally none of the algorithms in the four papers I cited above, or the ones in Safety Gym, could reasonably be said to be improving exploration from the agent’s perspective and not the engineer’s perspective. I’d be interested in an example of an algorithm that improves across-episode exploration from the agent’s perspective, along with an explanation of why the improvements are from the agent’s perspective rather than the engineer’s.
- evhub 6 Jan 2020 23:56 UTC
  LW: 6 AF: 3
  > I definitely was not arguing that. I was arguing that safe exploration is currently defined in ML as the agent making an accidental mistake, and that we should really not be having terminology collisions with ML. (I may have left that second part implicit.)
  
  Ah, I see—thanks for the correction. I changed “best” to “current.”
  
  > I assume that the difference you see is that you could try to make across-episode exploration less detrimental from the agent’s perspective
  
  No, that’s not what I was saying. When I said “reward acquisition” I meant the actual reward function (that is, the base objective).
  
  EDIT:
  
  That being said, it’s a little bit tricky in some of these safe exploration setups to draw the line between what’s part of the base objective and what’s not. For example, I would generally include the constraints in constrained optimization setups as just being part of the base objective, only specified slightly differently. In that context, constrained optimization is less of a safe exploration technique and more of a reward-engineering-y/outer alignment sort of thing, though it also has a safe exploration component to the extent that it constrains across-episode exploration.
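
  One way to see why the constraints can be read as part of the base objective is the standard constrained-MDP formulation and its Lagrangian relaxation. This is a generic textbook sketch, not something taken from the post or from Safety Gym; the symbols r (reward), c (cost), d (cost budget), and λ (multiplier) are assumed notation.

  ```latex
  % Constrained form: maximize return subject to an expected-cost budget d
  \max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_t r(s_t, a_t)\Big]
  \quad \text{s.t.} \quad
  \mathbb{E}_{\pi}\Big[\sum_t c(s_t, a_t)\Big] \le d

  % Lagrangian relaxation: the constraint becomes a penalty term
  \max_{\pi} \min_{\lambda \ge 0} \;
  \mathbb{E}_{\pi}\Big[\sum_t r(s_t, a_t)\Big]
  - \lambda \Big( \mathbb{E}_{\pi}\Big[\sum_t c(s_t, a_t)\Big] - d \Big)
  ```

  For a fixed λ, the inner problem is ordinary reward maximization with per-step reward r − λc (plus the constant λd), so the constraint behaves like one more term of the objective, specified differently. Whether the two problems have the same solutions depends on the usual duality conditions, which this sketch does not check.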
  
  Note that when across-episode exploration is learned, the distinction between safe exploration and outer alignment becomes even more muddled, since then all the other terms in the loss will implicitly serve to check the across-episode exploration term, as the agent has to figure out how to trade off between them.^[1]
  [1] This is another one of the points I was trying to make in “Safe exploration and corrigibility” but didn’t do a great job of conveying properly.
  - Rohin Shah 7 Jan 2020 8:14 UTC
    LW: 2 AF: 2
    > No, that’s not what I was saying. When I said “reward acquisition” I meant the actual reward function (that is, the base objective).
    Wait, then how is “improving across-episode exploration” different from “preventing the agent from making an accidental mistake”? (What’s a situation that counts as one but not the other?)
    - evhub 7 Jan 2020 8:41 UTC
      LW: 2 AF: 1
      Like I said in the post, I’m skeptical that “preventing the agent from making an accidental mistake” is actually a meaningful concept (or at least, it’s a concept with many possible conflicting definitions), so I’m not sure how to give an example of it.