“You will find yourself saying, ‘If I wanted to kill someone—even if I thought it was right to kill someone—that wouldn’t make it right.’ Why? Because what is right is a huge computational property—an abstract computation—not tied to the state of anyone’s brain, including your own brain.”
Coherent Extrapolated Volition (or any roughly similar system) protects against this failure for any specific human, but not in general. E.g., suppose you use various lawmaking processes to approximate Right(x), and then one person independently decides that Right(Murder) > 0. You can detect the mismatch between that person’s judgment and Right(x) by checking against the approximation (the legal code) and finding that murder is wrong. In the limit of the approximation, you can detect even mismatches that people at the time wouldn’t notice (e.g., slavery). CEV also protects against certain kinds of group failures: convince everybody that the Christian God exists and that the Bible is literally accurate, and CEV will correct for it by replacing the false belief “God is real” with the true belief “God is imaginary”, and then extrapolating the consequences.
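The mismatch-detection idea above can be illustrated with a toy sketch. This is only an analogy, not an implementation of CEV: the legal code is modeled as a crude lookup table, and all the names and values here are hypothetical.

```python
# Toy analogy: the true function Right(x) is inaccessible, so we work
# with a lossy approximation built by lawmaking processes (a "legal
# code"). We can flag a person's judgment only where the approximation
# already has an entry; its blind spots (e.g., slavery, before anyone
# noticed) go undetected until the approximation improves.

legal_code = {
    "murder": -1.0,   # the approximation agrees with Right(x) here
    "charity": +1.0,
    # "slavery" is absent: a blind spot of the current approximation
}

def mismatch(action: str, personal_judgment: float) -> bool:
    """True if a personal judgment visibly conflicts with the legal code.

    Actions missing from the code cannot be checked at all.
    """
    approx = legal_code.get(action)
    if approx is None:
        return False  # undetectable with the current approximation
    # A mismatch: the judgment and the approximation disagree in sign.
    return (personal_judgment > 0) != (approx > 0)

print(mismatch("murder", +0.5))   # someone deciding Right(Murder) > 0 is flagged
print(mismatch("slavery", +0.5))  # not flagged: the code hasn't caught up yet
```

The point of the sketch is the limitation, not the check: the only mismatches you can detect are the ones your current approximation encodes, which is why "in the limit of the approximation" matters.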
However, CEV can’t protect against features of human cognitive architecture that remain consistent under reflection, factual correction, etc. Suppose that, tomorrow, you used magical powers to rewrite large portions of everyone’s brain. You would expect people to now take actions with lower values of Right(x) than they previously did. But now there’s no way to determine the value of anything under Right(x) as we currently understand it. You can’t use previous records (these have all been changed, by act of magic), and you can’t use human intuition (as it too has been changed). So while the external Right(x) still exists somewhere out in thingspace, it’s a moot point, as nobody can access it. This wouldn’t work for, say, arithmetic, as people would rapidly discover that assuming 2 + 2 = 5 in engineering calculations makes bridges fall down.
This looks correct to me. CEV(my_morality) = CEV(your_morality) = CEV(yudkowsky_morality), because the moralities of psychologically normal humans are all different partial extrapolations of the same basic moral fundamentals. We’ve all been handed the same moral foundation by evolution, unless we are mentally damaged in certain very specific ways.
However, CEV(human_morality) ≠ CEV(klingon_morality) ≠ CEV(idiran_morality). There’s no reason to expect morality to generalize beyond psychologically normal humans, since any other species would have been handed at least moderately different moral foundations, even if there happened to be some convergent evolution.