Rohin Shah comments on Communication Prior as Alignment Strategy

Rohin Shah 14 Nov 2020 17:11 UTC
LW: 4 AF: 3
> If it were, then one of our first messages would be (a mathematical version of) “the behavior I want is approximately reward-maximizing”.
Yeah, I agree that if we had a space of messages that was expressive enough to encode this, then it would be fine to work in behavior space.