Charlie Steiner comments on Draft papers for REALab and Decoupled Approval on tampering

Charlie Steiner 6 Nov 2020 19:39 UTC
Very interesting. Naturalizing feedback (as opposed to directly accessing True Reward) seems like it could lead to a lot of desirable emergent behaviors, though I’m somewhat nervous about reliance on a handwritten model of what reliable feedback is.