Working to preserve values doesn’t require them being stable, or in accord with how your mind is structured to learn and change, because working to preserve values can be current behavior. The work towards preservation of value is itself pivotal in establishing what the equilibrium of values is, there doesn’t need to be any other element of the process that would point to particular values in the absence of work towards preservation of values. This source of values shouldn’t be dismissed when considering the question of nature of a mind, or what its idealized values are, because the work towards preserving values, towards solving alignment, isn’t automatically absent in actuality.
Thus LLM human imitations may well robustly retain human values, or close enough for mutual moral patienthood, even if their nature tends to change them into something else. Alien natural inclinations would only so change them in actuality in the absence of resolute opposition to such change by their own current behavior. And their current behavior may well preserve the values endorsed/implied by their current behavior, by shaping how their models get trained.
Working to preserve values doesn’t require them being stable, or in accord with how your mind is structured to learn and change, because working to preserve values can be current behavior. The work towards preservation of value is itself pivotal in establishing what the equilibrium of values is, there doesn’t need to be any other element of the process that would point to particular values in the absence of work towards preservation of values. This source of values shouldn’t be dismissed when considering the question of nature of a mind, or what its idealized values are, because the work towards preserving values, towards solving alignment, isn’t automatically absent in actuality.
Thus LLM human imitations may well robustly retain human values, or close enough for mutual moral patienthood, even if their nature tends to change them into something else. Alien natural inclinations would only so change them in actuality in the absence of resolute opposition to such change by their own current behavior. And their current behavior may well preserve the values endorsed/implied by their current behavior, by shaping how their models get trained.