Basically, we should use the assumption that is most robust to being wrong. It would be easier if there were objective, mind-independent rules of morality (the position known as moral realism), but if that assumption is wrong, your solution can get manipulated.
So in practice, we shouldn't base alignment plans on whether moral realism is correct. In other words, I'd simply go with what values you have and solve the edge cases according to your values.
I feel like we’re talking past each other. I’m trying to point out the difficulty of “simply go with what values you have and solve the edge cases according to your values” as a learning problem: it is too high-dimensional, and you need too many case labels. Part of the idea of the OP is to reduce the number of training cases required, and my question/suspicion is that it doesn’t really help outside of the “easy” stuff.
Yeah, I think this might be a case where we misunderstood each other.