If it were, then one of our first messages would be (a mathematical version of) “the behavior I want is approximately reward-maximizing”.
Yeah, I agree that if we had a space of messages that was expressive enough to encode this, then it would be fine to work in behavior space.
Yeah, I agree that if we had a space of messages that was expressive enough to encode this, then it would be fine to work in behavior space.