The meta-values point gets at the same thing HRH is getting at. Also, I feel that wireheading is fundamentally a problem of embeddedness, and it has a completely different causal story from the problem of reflective processes changing our values to be "zombified", even though the two feel vaguely similar. The way I would look at it: if you are a non-embedded algorithm running in an embedded world, you are potentially susceptible to wireheading; only if you are an embedded algorithm could you possibly have a preference that implies wanting zombification, or preferences guided by meta-values that avoid it, etc.