People, including alignment researchers, just seem more confident about their own preferred solution to metaethics, and comfortable assuming that their preferred solution is correct as part of solving other problems, like AI alignment or strategy. (E.g., moral anti-realism is true, therefore empowering humans in straightforward ways is fine, since the alignment target can't be wrong about their own values.)
Obviously, committed anti-realists would be right not to worry—if they're correct! But I agree with you that we shouldn't be overconfident in our metaethics... which makes me wonder: do you really think metaethics can be "solved"?
Secondly, even if it were solved (and to avoid anti-realist apathy, let's assume moral realism is true), how do you think that would help with alignment? Couldn't the alignment target simply say, "This is true, but I don't care, as it doesn't help me achieve my goals"? Saying "1+1=2, but I'm going to act as if it equals 3" might keep you from achieving your goal. Saying "stealing is wrong, but I would really like to have X" might actually help you achieve your goal.