It seems that our morality consists of two elements. The first is a bias, shaped by the game-theoretic environment of our ancestors. Humans developed complex feelings around activities that promoted inclusive genetic fitness, and now we are intrinsically and authentically motivated to pursue those activities for their own sake.
The second is a limited capacity for moral updates. It is what we use to resolve contradictions in our moral intuitions, and it is also what allows us to persuade ourselves that some status-promoting activity is actually moral. On the one hand, the fact that our ethics shift so easily for status reasons is rather scary. On the other hand, the whole ability to morally update probably evolved exactly for this purpose, so it is something of a miracle that we can use it for anything else at all.
I don’t think we can update into genuinely feeling that maximizing paperclips for its own sake is the right thing to do: all possible human minds occupy only a small region of the space of all possible minds. We could consider alignment somewhat solved if TAI guaranteed us optimization toward some neighbourhood of our moral bias. However, I think it is possible to do better, and we do not need all humans to be moral philosophers for that. It would be enough for TAI itself to be a perfect moral philosopher, able to deduce our coherent extrapolated volition and become an optimization process in that direction.