This seems to be missing what I see as the strongest argument for “utopia”: most of what we think of as “bad values” in humans comes from objective mistakes in reasoning about the world and about moral philosophy, rather than from a part of us that is orthogonal to such reasoning in a paperclip-maximizer-like way, and future reflection can be expected to correct those mistakes.
I’m pretty worried that this won’t happen, because these aren’t “innocent” mistakes. Copying from a comment elsewhere:
Why did the Malagasy people have such a silly belief? Why do many people have very silly beliefs today? (Among the least politically risky ones to cite: someone I’ve known for years, who is otherwise intelligent and successful, currently believes, or at least believed in the recent past, that 2/3 of everyone will die as a result of taking the COVID vaccines.) I think the unfortunate answer is that people are motivated to, or are reliably caused to, have certain false beliefs as part of the status games that they’re playing. I wrote about one such dynamic, but that’s probably not a complete account.
From another comment on why reflection might not fix the mistakes:
many people are not motivated to do “rational reflection on morality” or examine their value systems to see if they would “survive full logical and empirical information”. In fact they’re motivated to do the opposite, to protect their value systems against such reflection/examination. I’m worried that alignment researchers are not worried enough that if an alignment scheme causes the AI to just “do what the user wants”, that could cause a lock-in of crazy value systems that wouldn’t survive full logical and empirical information.
One crucial question is, assuming AI will enable value lock-in when humans want it, will they use that as part of their signaling/status games? In other words, will they try to obtain higher status within their group by asking their AIs to lock in their morally relevant empirical or philosophical beliefs? A lot of people in the past used visible attempts at value lock-in (constantly going to church to reinforce their beliefs, avoiding talking with any skeptics/heretics, etc.) for signaling. Will that change when real lock-in becomes available?
Yeah, I’m particularly worried about the second comment/last paragraph—people not actually wanting to improve their values, or only wanting to improve them in ways we think are not actually an improvement (e.g. wanting to have purer faith).
Is this making a claim about moral realism? If so, why wouldn’t it apply to a paperclip maximiser? If not, how do we distinguish between objective mistakes and value disagreements?
I interpreted steven0461 to be saying that many apparent “value disagreements” between humans turn out, upon reflection, to be disagreements about facts rather than values. It’s a classic illustration of the conflict theory vs. mistake theory distinction: people are interpreted as having different values because they favor different strategies, even when everyone actually shares the same values.
ah yeah, so the claim is something like ‘if we think other humans have “bad values”, maybe in fact our values are the same and one of us is mistaken, and we’ll get less mistaken over time’?
I guess I was kind of subsuming this into ‘benevolent values have become more common’
I tend to want to split “value drift” into “change in the mapping from (possible beliefs about logical and empirical questions) to (implied values)” and “change in beliefs about logical and empirical questions”, instead of lumping both into “change in values”.
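To make that split concrete (my own toy notation, not anything from the above): write an agent’s effective values as V = f(B), where B is the agent’s logical and empirical beliefs and f is the mapping from beliefs to implied values. Then one kind of drift is a change in f while B stays fixed, and the other is a change in B while f stays fixed; calling both “change in values” obscures which of the two actually moved.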
Could the same also be true of most “good values”? Maybe people just make mistakes about almost everything.