Do you know yet how your approach would differ from applying inverse RL (e.g. MaxCausalEnt IRL)?
If you don’t want to assume full-blown demonstrations (where you actually reach the goal), you can still combine a reward function learned from IRL with a specification of the goal. That’s effectively what we did in Preferences Implicit in the State of the World.
(The combination there isn’t very principled; a more principled version would use a CIRL-style setup, which is discussed in Section 8.6 of my thesis.)
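To make that combination concrete, here is a minimal toy sketch (my own illustration under assumed names and an ad hoc weighting, not the actual setup from the paper): take a reward over states learned by IRL and add a bonus for reaching the specified goal state.

```python
import numpy as np

def combine_reward(irl_reward, goal_state, goal_bonus=10.0):
    """Toy combination: an IRL-learned reward plus a bonus for reaching
    the specified goal state. (Illustrative only; the weighting is ad hoc.)"""
    def reward(state):
        r = irl_reward(state)
        if state == goal_state:
            r += goal_bonus
        return r
    return reward

# Example usage with a made-up tabular reward over 5 states
# (imagine it came out of something like MaxCausalEnt IRL).
irl_r = np.array([0.0, 0.1, -0.2, 0.05, 0.0])
combined = combine_reward(lambda s: irl_r[s], goal_state=4)
print([combined(s) for s in range(5)])  # goal state 4 gets the extra bonus
```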
Those are very relevant to this project, thanks. I want to see how far we can push these approaches; maybe some people you know would like to take part?
Hmm, you might want to reach out to CHAI folks, though I don’t have a specific person in mind at the moment. (I myself am working on different things now.)
Cool, thanks; already in contact with them.