I think there’s more or less a ‘best way’ to extrapolate a human’s preferences (like, a way or meta-way we would and should endorse the most, after considering tons of different ways to extrapolate), and this will get different answers depending on who you extrapolate from, but for most people (partly because almost everyone cares a lot about everyone else’s preferences), you get the same answer on all the high-stakes easy questions.
Where by ‘easy questions’ I mean the kinds of things we care about today—very simple, close-to-the-joints-of-nature questions like ‘shall we avoid causing serious physical damage to chickens?’ that aren’t about entities that have been pushed into weird extreme states by superintelligent optimization. :)
In retrospect, a big implicit source of disagreement on stuff like CEV is that I don't really believe this at all. For my money, I put a lot of probability on different people's extrapolated preferences *not* getting the same answer on all the high-stakes easy questions, and I think the crux might be that I disagree with this statement:
“but for most people (partly because almost everyone cares a lot about everyone else’s preferences)”,
and instead think most people have relatively selfish preferences. While lots of people do care about others, a lot of them don't, and so the extrapolated values are likely to diverge, not converge.