What “human values” gestures at is a distinction from values-in-general, while “preferences” might be about arbitrary values.
I don’t understand what this means.
Taking current wishes/wants/beliefs as the meaning of “preferences” or “values” (denying further development of values/preferences as part of the concept) is misleading in the same way as taking “moral goodness” to mean anything in particular that’s currently legible, because the things that are currently legible are not where potential development of values/preferences would end up in the limit.
Is your point here that “values” and “preferences” are based on what you would decide to prefer after some amount of thinking/reflection? If yes, my point is that this should be stated explicitly in discussions, for example: “here I am discussing the preferences you, the reader, would have after thinking for many hours.”
If you want to additionally claim that these preferences are tied to moral obligation, this should also be stated explicitly.
Stating things explicitly is a tradeoff that must be decided by success or failure in conveying the intended point, not by stricture of form.
By “human values” being distinct from arbitrary values I simply mean that anything called “human values” is less likely to be literal paperclipping than values-in-general; the term suggests a distribution over values that’s human-specific in some way. By “preferences” also gesturing at their further development on reflection, I’m pointing out that this is a strong possibility for what the term might mean, so unless a clarification rules it out, it remains a possible intended meaning. (More specifically, I meant the whole process of potential ways of developing values/preferences, not some good-enough end-point: so not just thinking for many hours, but also not disregarding current wishes/wants/beliefs, as they too are part of this process.)