The analogy to humans feels like generalizing from one example to me. My prior is that minds evolved under different circumstances will have different desires, so we shouldn’t expect an AI to robustly share any specific human value unless we can explain exactly how it develops that value.
But that aside, would you agree that if this were true, alignment should be fairly easy, because we just need to amplify the degree of caring?