I don’t know the answer to this, but strong upvoted because I think this question, and variants like “is anyone working on ensuring AI labs don’t sign-flip parts of the reward function” and equally silly things, are important.