The fact that evolution, in constructing humans, treats obviously instrumental goals as terminal, and that this confuses humans, is part of my point: humans are often confused about the instrumental/terminal distinction, and having a body built around that confusion is one of the reasons. But it’s not an essential element, so let’s lay it aside.
So, let’s go with a more complex example. Most Christians have a terminal goal of becoming a better Christian (or at least, they say it is terminal, not an instrumental goal downstream of not wanting to go to Hell). That is a terminal goal of adjusting your own terminal goal structure to better fit a specific pattern, which is, well, astonishingly similar to what Value Learning is trying to achieve. Nor is this an uncommon pattern: you can find it in basically every religion (often along with a backup reason that makes it an instrumental goal of not wanting to be punished in some way). In fact, Richard Dawkins would probably argue that this is a necessary feature of a religion; but then, he considers religions to be self-propagating memetic parasites of the human mind, and in that framework it does look like a rather necessary feature. Regardless, the fact that this is not just possible but common enough that most religious people, i.e. most people in the world, have at least a mild version of it tells us something about humanity.
On AIXI: yes, I was implicitly assuming an AIXI smart enough to realize that it was in fact embedded, or at least that there is a causal path from messing with certain wires in its braincase to its future goal function, and thus to its future behavior. This seems a rather plausible assumption to me, but it does require that AIXI has learned a world model complex enough to start reliably making predictions of that kind. Having other AIXIs available to perform experimental brain surgery on, or to observe after an iron bar accidentally passes through their braincases in different locations, seems likely to be helpful in obtaining the evidence that would drive those particular Bayesian updates.
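To make the kind of update I have in mind concrete, here is a toy sketch of sequential Bayesian updating on the hypothesis H = "wire W is on the causal path to my goal function." Everything in it (the prior, the likelihoods, the list of observations) is an illustrative assumption of mine, and real AIXI does Solomonoff induction over programs rather than a two-hypothesis update, so treat this as a cartoon of the evidence-accumulation step only:

```python
# Toy sketch (not AIXI itself): sequential Bayesian updating on the
# hypothesis H = "severing wire W alters the goal function, and hence
# the agent's behavior." All numbers below are illustrative assumptions.

def update(prior: float, likelihood_h: float, likelihood_not_h: float) -> float:
    """One Bayes update: P(H | observation) from P(H) and the two likelihoods."""
    numerator = likelihood_h * prior
    return numerator / (numerator + likelihood_not_h * (1.0 - prior))

# Assumed observation model: if H is true, severing wire W visibly changes
# behavior 90% of the time; if H is false, behavior still changes 10% of
# the time (e.g. from incidental damage).
P_CHANGE_GIVEN_H = 0.9
P_CHANGE_GIVEN_NOT_H = 0.1

p_h = 0.5  # uninformative prior over H

# Each observation: did behavior change after wire W was severed in another
# agent (experimental surgery, or a stray iron bar)?
observations = [True, True, True, False, True]

for changed in observations:
    if changed:
        p_h = update(p_h, P_CHANGE_GIVEN_H, P_CHANGE_GIVEN_NOT_H)
    else:
        p_h = update(p_h, 1 - P_CHANGE_GIVEN_H, 1 - P_CHANGE_GIVEN_NOT_H)
    print(f"observed change={changed}, P(H) -> {p_h:.3f}")
```

After a handful of such observations the posterior on H climbs steeply, which is all I mean by "evidence that would cause those particular Bayesian updates": a few dissected or iron-barred neighbors go a long way.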