I understand the difference between a terminal and an instrumental goal. Logically, it makes a lot of sense to me. However, not all humans are Rationalists. Many of them have never really thought about this, or at least are a bit vague on it. Humans are also evolved, and evolution is clearly confused about this, or perhaps basically doesn’t trust instrumental goals, because they actually require an organism to think and reach correct conclusions. So it has wired in everything even slightly important (that wasn’t very conditional on stuff more complex than it can build instincts for) as a terminal goal, because it already had a circuit design for doing that which it could just reuse. So we ended up with a lot of terminal goals wired into us, some for less good reasons than others, many of which most Rationalists would classify as obviously good instrumental goals. That tends to confuse people about the distinction even more.
Ozempic is a very popular and profitable drug that down-regulates the hunger reflex, and it is currently making Novo Nordisk a great deal of money. People on it feel full sooner, so they get pleasure from a fine meal for a shorter time. They are paying all that money to alter their goal structure, down-weighting a goal that their body treats as terminal, and they are all doing it instrumentally: toward an actual goal of living longer and healthier and so being able to get more done, and/or of being slimmer and thus more socially successful, in search of a mate or a job or the approval of their peers or whatever. (Novo Nordisk even markets the same medication under two different brand names for these two different groups of terminal goals that one might take it instrumentally towards.) So I think it’s rather clear that people do sometimes want to alter their goal structure, and indeed many are willing to pay hundreds of dollars a month to do so, if they can afford it.
If it were possible for me to get rid of feeling thirsty, by just taking a drug that suppressed the feeling, then I generally wouldn’t do it. But the reason I wouldn’t isn’t that I inherently value the feeling of thirst — I don’t, it’s rather unpleasant. Drinking while thirsty feels good, but I still wouldn’t miss it: I’d rather not be thirsty in the first place. The reason I wouldn’t take such a drug is that being aware I need to drink when I’m thirsty is an instrumental goal of not dying of thirst, which would rather crimp all my other plans, so it is an instrumental goal of just about everything. So I value thirst instrumentally, but not terminally. If there were a treatment that suppressed my thirst reflex and didn’t endanger or harm me, sure, I’d take it — who wouldn’t? But that is rather hard to do without giving me a built-in saline drip, or something.
Consider an equally unpleasant sensation, say hiccups, or yawning, which don’t fulfill any significant survival role (as far as I know: I’m happy to be corrected if anyone knows why evolution inflicted hiccups and yawning on us). Suppose someone offered me a cheap genetic treatment that would permanently remove my hiccup reflex or my desire to yawn, with no other risks or deleterious side effects; then I’d probably take it. (Yawning is closer to a goal than hiccups, which are pretty much involuntary, but I hope my point is clear here.) I don’t terminally value all the things my reflexes are treating as terminal goals. However, since evolution isn’t actually crazy, I do instrumentally value most of them.
Humans are not like AIXI. We do not automatically protect all our terminal goals. We appear if anything to be predisposed to what is generally called reflection: thinking rather carefully about our goals, whether they’re actually a good idea in the long term, and then attempting to change them if we come to the conclusion that they are not. There are limits to what’s feasible here, currently, but technology will change these. Turning down your hunger reflex used not to be possible, and attempting to overcome it by sheer willpower, by dieting, is notoriously hard (I know, I’ve tried). Then Novo Nordisk changed that (and made a lot of money). What happens when technology allows all of our current goals to be edited, or even permanently and heritably rewritten? I don’t know — I wish I did — but I’m very sure that “we don’t change any of them, the same way AIXI wouldn’t change its goals” is not an accurate statement.
Again, I’d prefer if you didn’t bring up reflexes, or evolution treating things as terminal goals, because I don’t think it’s relevant, and it causes confusion.
Wrt the Ozempic thing: it’s a bit complicated, but to a first approximation I’d treat all of those as instrumental goals, with people just doing an EV calculation. E.g. taking the drug seemed to get them higher utility long term, at the cost of some utility now.
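That EV calculation can be sketched as a toy comparison. All the utility numbers below are invented purely for illustration, not a claim about anyone’s actual preferences:

```python
# Toy expected-value comparison for down-regulating the hunger reflex.
# All utilities are made-up illustrative numbers in arbitrary units.

pleasure_lost_now = -5    # less enjoyment from meals in the near term
health_gain_later = 20    # expected long-term health/longevity benefit
social_gain = 10          # expected social/career benefit from being slimmer
money_cost = -3           # hundreds of dollars a month, converted to utility

ev_take_drug = pleasure_lost_now + health_gain_later + social_gain + money_cost
ev_status_quo = 0         # baseline: keep current goal weights unchanged

# On these (hypothetical) numbers, down-weighting the "terminal" goal wins.
print(ev_take_drug, ">", ev_status_quo)  # 22 > 0
```

The point is just that the decision is an ordinary trade-off across instrumental goals, not a special act of goal surgery.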
Side note: it’s unclear whether AIXI would protect its own values, due to embeddedness problems.
That evolution treats obviously instrumental goals as terminal in the construction of humans, and that this confuses humans, is part of my point: humans are often confused about this distinction, and having a body that is confused about it is one of the reasons. But it’s not an essential element, so let’s lay it aside.
So, let’s go with a more complex example. Most Christians have a terminal goal of becoming a better Christian (or at least, they say that it isn’t an instrumental goal of not wanting to go to Hell). That’s a terminal goal of adjusting your terminal goal structure to better fit a specific pattern. That’s, well, astonishingly similar to what Value Learning is trying to achieve. This is not an uncommon pattern; you can find it in basically every religion (often along with a backup reason that makes it an instrumental goal of not wanting to be punished in some way). In fact, Richard Dawkins would probably argue that this is a necessary feature of a religion — but then he considers religions to be self-propagating memetic parasites of the human mind, and in that framework, it does look like a rather necessary feature. Regardless, the fact that this is not just possible, but common enough that most religious people, i.e. most people in the world, have at least a mild version of it, tells us something about humanity.
On AIXI: yes, I was implicitly assuming an AIXI smart enough to realize that it was in fact embedded, or at least that there exists a causal path from messing with certain wires in its braincase to its future goal function and thus its behavior. This seems a rather plausible assumption to me, but it does require that AIXI has learned a world model complex enough to start reliably making predictions like that. Having other AIXIs available to do experimental brain surgery on, or observing the effects of an iron bar accidentally passing through their braincases in different locations, seems likely to be helpful in obtaining evidence that would cause those particular Bayesian updates.
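The kind of update described can be illustrated with a toy Bayes-rule calculation. All the probabilities here are invented for illustration; the point is just that a few observations of other agents’ behavior changing after wire damage can drive the credence in "those wires determine my goal function" close to 1:

```python
# Toy Bayesian update. Hypothesis H: "these wires causally determine the
# goal function". Evidence: an observed agent's behavior changes after its
# wires are damaged. All numbers below are illustrative assumptions.

prior = 0.5                 # initial credence in H
p_obs_given_h = 0.9         # P(behavior change | H)
p_obs_given_not_h = 0.1     # P(behavior change | not H)

posterior = prior
for _ in range(3):          # three observed "brain surgeries", each confirming
    num = p_obs_given_h * posterior
    posterior = num / (num + p_obs_given_not_h * (1 - posterior))

print(round(posterior, 4))  # → 0.9986
```

Under these assumptions, three confirming observations take the agent from a coin-flip prior to near-certainty, which is why access to other AIXIs (or unlucky iron bars) matters for learning embeddedness.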