I agree that “values” are a useful way to describe human behaviour, but to think that they actually exist inside the human brain is a sort of mind projection fallacy.
Most people will come to a near halt if we directly ask them about their final goals: they don’t know.
Psychologists deduce “human values” from (a) a person’s actions and (b) that person’s claims about their preferences. Actions and claims are often misaligned.
All of this surely makes value alignment with AI more difficult: if humans don’t have exact values, what should AI be aligned with?