Does the above derivation mean that values are anthropocentric? Maybe kind of. I’m deriving only an architectural claim: Bounded evolved agents compress their control objectives through a low-bandwidth interface. Humans are one instance. AI’s are different. They are designed and any evolutionary pressure on the architecture is not on anything value-like. If an AI has no such bottleneck, inferring and stabilizing its ‘values’ may be strictly harder. If it has one, it depends on its structure. Alignment might generalize, but not necessarily to human-compatible values.
Does the above derivation mean that values are anthropocentric? Maybe kind of. I’m deriving only an architectural claim: Bounded evolved agents compress their control objectives through a low-bandwidth interface. Humans are one instance. AI’s are different. They are designed and any evolutionary pressure on the architecture is not on anything value-like. If an AI has no such bottleneck, inferring and stabilizing its ‘values’ may be strictly harder. If it has one, it depends on its structure. Alignment might generalize, but not necessarily to human-compatible values.