I don’t see any strong reason to think that reflection alone will naturally lead all or most humans to the same place, especially given that the reflection process is underspecified.
I think there’s more or less a ‘best way’ to extrapolate a human’s preferences (like, the way or meta-way we would and should endorse the most, after considering tons of different ways to extrapolate). This will get different answers depending on who you extrapolate from, but for most people (partly because almost everyone cares a lot about everyone else’s preferences), you get the same answer on all the high-stakes easy questions.
Where by ‘easy questions’ I mean the kinds of things we care about today—very simple, close-to-the-joints-of-nature questions like ‘shall we avoid causing serious physical damage to chickens?’ that aren’t about entities that have been pushed into weird extreme states by superintelligent optimization. :)
I think ethics is totally arbitrary in the sense that it’s just ‘what people happened to evolve’, but I don’t think it’s that complex or heterogeneous from the perspective of a superintelligence. There’s a limit to how much load-bearing complexity a human brain can even fit.
In retrospect, a big implicit source of disagreement on stuff like CEV is that I don’t really believe this at all. For my money, I put a lot of probability on extrapolations from different people not getting the same answer on all the high-stakes easy questions, and I think the crux might be that I disagree with this statement:
“but for most people (partly because almost everyone cares a lot about everyone else’s preferences)”,
and instead think that most people have relatively selfish preferences. Lots of people do care about others, but a lot of them don’t, and the extrapolated values are likely to diverge, not converge.
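To make the crux concrete, here’s a minimal toy sketch (my own construction, purely illustrative; the weight w and the averaging rule are assumptions, not anything from the CEV literature). Each person’s values are a point in a value space, and one step of ‘extrapolation’ moves them a fraction w toward the population mean, where w stands in for how much they care about everyone else’s preferences:

```python
import numpy as np

def extrapolate(values, w, steps=50):
    """Toy 'extrapolation': each step, everyone moves a fraction w
    of the way toward the population's mean values."""
    values = values.copy()
    for _ in range(steps):
        mean = values.mean(axis=0)            # current population consensus
        values = (1 - w) * values + w * mean  # each person shifts toward it
    return values

rng = np.random.default_rng(0)
initial = rng.normal(size=(100, 2))  # 100 people, 2 value dimensions

# Strong other-regard: everyone converges to nearly the same point.
print(f"spread with w=0.3:   {extrapolate(initial, w=0.3).std():.2e}")
# Near-zero other-regard: the initial disagreements persist.
print(f"spread with w=0.001: {extrapolate(initial, w=0.001).std():.2e}")
```

The model is far too crude to settle anything; its only use is to localize the disagreement in one parameter: whether the effective w for most humans is large (the ‘almost everyone cares a lot about everyone else’s preferences’ view) or close to zero (the ‘relatively selfish preferences’ view).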