The problem here is that, as evidenced by SL4 list posts, Phil is serious.
So basically, there is some super-morality or super-goal or something that is “better” by some standard than what humans have. Let’s call it woogah. Phil is worried because we’re going to make FAI that can’t possibly learn/reach/achieve/understand woogah because it’s based on human values.
As far as I can see, there are three options here:
Phil values woogah, which means it’s included in the space of human values, which means there’s no problem.
Phil does not value woogah, in which case we wouldn’t be having this discussion because he wouldn’t be worried about it.
Phil thinks that there’s some sort of fundamental/universal morality that makes woogah better than human values, even though woogah can’t be reached from a human perspective, at all, ever. This is perhaps the most interesting option, except that there’s no evidence, anywhere, that such a thing might exist. The is-ought problem does not appear to be solvable; we have only our own preferences out of which to make the future, because that’s all we have and all we can have. We could create a mind that doesn’t have our values, but the important question is: what would it have instead?
-Robin
There is a fourth option: the “safe” set of values can be misaligned with humans’ actual values. Some values that humans hold may not be listed in the “safe” set at all, or an entry in the safe set may fail to capture what it was trying to represent.
As a specific example, consider how a human might have defined values a few centuries ago: “Hmm, what value system should we build our society on? Aha! The seven heavenly virtues! Every utopian society must encourage chastity, temperance, charity, diligence, patience, kindness, and humility!” Then, later, someone tries to put happiness somewhere on the list. But since happiness was never put into the constrained optimization function, the system has no way to optimize for it.
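The point about the fixed value list can be made concrete with a toy sketch in Python (all names, weights, and world-states here are invented for illustration): an optimizer that scores outcomes only on an enumerated list of values is simply indifferent to anything left off that list.

```python
# Toy sketch (hypothetical values and weights): an optimizer whose objective
# only sums over a fixed list of values cannot prefer an outcome that is
# better on a value missing from that list.

LISTED_VALUES = ["chastity", "temperance", "charity", "diligence",
                 "patience", "kindness", "humility"]

def score(world_state):
    # Only values on the fixed list contribute to the objective;
    # anything else effectively has weight zero.
    return sum(world_state.get(v, 0.0) for v in LISTED_VALUES)

# Two candidate futures: identical on the listed virtues,
# but one also has high happiness.
virtuous_only = {v: 1.0 for v in LISTED_VALUES}
virtuous_and_happy = dict(virtuous_only, happiness=1.0)

# The optimizer cannot tell them apart: happiness never enters the score.
assert score(virtuous_only) == score(virtuous_and_happy)
```

The failure is not that happiness gets a low weight, but that it gets no weight at all, so no amount of search pressure inside this objective will ever recover it.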
This is NOT something that could only have happened in the past. If an AI based its values today on what the majority agrees is a good idea, things like marijuana would be banned, and survival would be replaced by “security” or something else slightly wrong.