Yeah, it’s hard to say whether the weights would be negative. As an extreme case, if there was someone who wanted to cause as much suffering as possible, then if that person was really smart, we might gain insight into how to reduce suffering by flipping around the policies he advocated. If someone wants you to get a perfect zero score on a binary multiple-choice test, you can get a perfect score by flipping the answers. These cases are rare, though. Even the hypothetical suffering maximizer still has many correct beliefs, e.g., that you need to breathe air to stay alive.
Probably not.
Yeah, it’s hard to say whether the weights would be negative. As an extreme case, if there was someone who wanted to cause as much suffering as possible, then if that person was really smart, we might gain insight into how to reduce suffering by flipping around the policies he advocated. If someone wants you to get a perfect zero score on a binary multiple-choice test, you can get a perfect score by flipping the answers. These cases are rare, though. Even the hypothetical suffering maximizer still has many correct beliefs, e.g., that you need to breathe air to stay alive.