When I say “human values” I mean the values of all individual humans mixed together into some aggregate utility function. And when I talk about the values of an individual human I mean something similar, but not quite the same as when you say:
When I say “human values”, I mean a very large amount of information that, combined, would let someone (such as an ASI) predict the preference ordering of an individual human over possible world-state outcomes, so as to be able to predict what they want (and then potentially aggregate this across many people, presumably all humans)
I think the values of an individual human should be thought of either as a preference ordering over world-histories (including the future), or maybe more intuitively as a preference ordering over world-states, if that human knew all the relevant facts (or just all the facts, if ‘relevant’ makes the characterization problematic in your mind).
I think this is an important distinction, because it makes the separation between terminal and instrumental values clearer. It’s kind of similar to value-functions and reward-functions in reinforcement learning.
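The reinforcement-learning analogy can be made concrete with a toy example. This is a minimal sketch; the chain of states, the reward numbers, and the discount factor are all invented for illustration. The reward function plays the role of terminal preferences, while the value function assigns worth to intermediate states purely because they lead somewhere rewarded, i.e. instrumentally.

```python
# Toy 3-state chain MDP: state 0 -> 1 -> 2 (terminal).
# Only reaching state 2 is rewarded, yet states 0 and 1 acquire
# nonzero *value* purely instrumentally, via discounted future reward.

GAMMA = 0.9  # discount factor (invented for the example)

rewards = {0: 0.0, 1: 0.0, 2: 1.0}   # "terminal preferences"
next_state = {0: 1, 1: 2}            # deterministic transitions

def value(s):
    """Discounted return from state s under the fixed transitions."""
    if s == 2:                       # terminal state: value is its own reward
        return rewards[2]
    return rewards[s] + GAMMA * value(next_state[s])

print(value(1))  # 0.9  -- zero immediate reward, high instrumental value
print(value(0))  # 0.81
```

The point of the sketch is just that instrumental worth falls out of the terminal reward plus the dynamics; nothing about states 0 and 1 is valued in itself.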
However, I’m deeply unclear on what, out of human values as I define them above, is actually a terminal goal or an instrumental goal, or a mix of the two. To give a rather trivial example, we have homeostatic circuits in us that try to maintain the correct level of blood glucose, salt, blood volume, etc. by giving us appropriate cravings. Our brain is wired to treat these as effectively terminal goals — they’re not conditional on anything, and we continue caring about them even on our deathbed: asking for a final drink of water is not uncommon. Evolution, which is an optimizer but not a sapient agent, would (if it were a sapient agent) classify these as instrumental goals serving our evolutionary fitness.
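A homeostatic circuit of this sort can be sketched as a simple setpoint controller. The setpoint, gain, and units below are invented for illustration; the only point is that the circuit emits a craving unconditionally whenever the monitored level drops, which is what makes it feel like a terminal goal from the inside.

```python
# Hypothetical setpoint controller for a blood-glucose-style homeostat.
# Craving is zero at the setpoint and grows with the deficit, regardless
# of any larger plan -- i.e. it is not conditional on anything.

SETPOINT = 90.0   # desired level (arbitrary units, invented)
GAIN = 0.1        # how strongly the deficit drives the craving

def craving(level):
    """Craving intensity as a function of the current level."""
    deficit = SETPOINT - level
    return max(0.0, GAIN * deficit)

print(craving(90.0))  # 0.0 -- no craving at the setpoint
print(craving(60.0))  # 3.0 -- strong craving when depleted
```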
It seems clear to me that these are terminal goals. Or rather: drinking water is instrumental to not feeling thirsty. It is unpleasant to be thirsty, so I don’t want to be thirsty. The fact that they play an instrumental role in what evolution is optimizing for doesn’t matter here. We’re talking about an individual human and what that human wants.
To me the distinction between instrumental and terminal goals is very clear.
If you have a goal, ask yourself why you want to achieve that goal. And then:
If the answer is “There’s no reason.” or “I just want it.” ⇒ it’s a terminal goal
If the answer is some other object-level goal ⇒ it’s an instrumental goal
So for me the task of separating terminal and instrumental goals shouldn’t be that difficult. (In the sense that building a rocket is “not difficult”, that is.)
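The ask-why test above can be sketched as a tiny recursive procedure. The goal names and the `why` mapping are invented for illustration; each goal points at the goal it serves, or at nothing if there is no further reason.

```python
# Hypothetical goal graph for the ask-why test: each goal maps to the
# goal it serves, or None if the answer is "I just want it".
why = {
    "drink water": "not be thirsty",
    "not be thirsty": None,          # terminal: unpleasant in itself
    "earn money": "buy food",
    "buy food": "not be hungry",
    "not be hungry": None,
}

def classify(goal):
    """Apply the ask-why test: no further reason => terminal."""
    return "terminal" if why[goal] is None else "instrumental"

def terminal_root(goal):
    """Follow the chain of reasons until a terminal goal is reached."""
    while why[goal] is not None:
        goal = why[goal]
    return goal

print(classify("drink water"))       # instrumental
print(terminal_root("earn money"))   # not be hungry
```

The “building a rocket” caveat shows up here as the difficulty of filling in the `why` table honestly, not in running the procedure itself.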
For the a/b and 1/2/3, I don’t think those matter very much. They’re explanations for why people have goals, and how they might change them.
But I think 1) we don’t care why we have goals, we just care about the goals themselves, and 2) we won’t want to change goals. If any of 1/2/3 lead to goals changing rapidly, that will be bad, and people will recognize that it is bad, and will not want to do them.
I understand the difference between a terminal and an instrumental goal. Logically, it makes a lot of sense to me. However, not all humans are Rationalists. Many of them have never really thought about this, or at least are a bit vague on it. Humans are also evolved, and evolution is clearly confused about this, or perhaps basically doesn’t trust instrumental goals, because they require an organism to think and reach correct conclusions. So it has wired in everything even slightly important (that wasn’t very conditional on stuff more complex than it can build instincts for) as a terminal goal, because it already had a circuit design for that which it could just reuse. So we ended up with a lot of terminal goals wired into us, some for less good reasons than others, many of which most Rationalists would classify as obviously good instrumental goals. That tends to confuse people about this even more.
Ozempic is a very popular and profitable drug that down-regulates the hunger reflex, and it is currently making Novo Nordisk a great deal of money. People on it feel full sooner, so they get pleasure out of a fine meal for a shorter time. People are paying all that money to alter their goal structure: to reduce a goal that their body treats as terminal. And they are all doing it instrumentally, towards an actual goal of living longer and healthier and so being able to get more done, and/or of being slimmer and thus more socially successful, in search of a mate or a job or the approval of their peers or whatever. (Novo Nordisk even markets the same medication under two different brand names for these two different groups of terminal goals that one might take it instrumentally towards.) So I think it’s rather clear that people do sometimes want to alter their goal structure, and indeed many are willing to pay hundreds of dollars a month to do so, if they can afford that.
If it were possible for me to get rid of feeling thirsty, by just taking a drug that suppressed the feeling, then I generally wouldn’t do it. But the reason isn’t that I inherently value the feeling of thirst — I don’t, it’s rather unpleasant. Drinking while thirsty feels good, but I still wouldn’t miss it: I’d rather not be thirsty in the first place. The reason I wouldn’t take such a drug is that being aware that I need to drink is an instrumental goal of not dying of thirst, which would rather crimp all my other plans, so it is an instrumental goal of just about everything. So I value thirst instrumentally, but not terminally. If there were a treatment that suppressed my thirst reflex and didn’t endanger or harm me, sure, I’d take that — who wouldn’t? But that is rather hard to do without giving me a built-in saline drip, or something.
Consider an equally unpleasant sensation, say hiccups, or yawning, which doesn’t fulfill any significant survival role (as far as I know; I’m happy to be corrected if anyone knows why evolution inflicted hiccups and yawning on us). Suppose someone offered me a cheap genetic treatment that would permanently remove my hiccup reflex or my desire to yawn, with no other risks or deleterious side effects; then I’d probably take it. (Yawning is closer to a goal than hiccups, which are kind of involuntary, but I hope my point is clear here.) I don’t terminally value all the things my reflexes are treating as terminal goals. However, since evolution isn’t actually crazy, I do instrumentally value most of them.
Humans are not like AIXI. We do not automatically protect all our terminal goals. We appear, if anything, to be predisposed to what is generally called reflection: thinking rather carefully about our goals, whether they’re actually a good idea in the long term, and then attempting to change them if we come to the conclusion that they are not. There are limits to what’s feasible here, currently, but technology will change these. Turning down your hunger reflex didn’t use to be possible, and attempting to overcome it by sheer willpower by dieting is notoriously hard (I know, I’ve tried). Then Novo Nordisk changed that (and made a lot of money). What happens when technology allows all of our current goals to be edited, and even permanently and inheritably rewritten? I don’t know – I wish I did – but I’m very sure that “we don’t change any of them, the same way AIXI wouldn’t change its goals” is not an accurate statement.
Again, I’d prefer it if you didn’t bring up reflexes, or evolution treating things as terminal goals, because I don’t think it’s relevant, and it causes confusion.
Wrt the Ozempic thing: it’s a bit complicated, but to a first approximation I treat all of those as instrumental goals, with people just doing an EV calculation. E.g. taking the drug seemed to get them higher utility long term at the cost of some utility now.
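That first-approximation EV calculation can be sketched numerically. Every utility number, the discount factor, and the time horizon below are invented purely for illustration; the structure is just “trade a small per-year loss now for a larger per-year gain later”.

```python
# Hypothetical EV comparison for taking a hunger-suppressing drug.
# All utility figures are made up for illustration.

DISCOUNT = 0.97  # annual discount factor (invented)
YEARS = 40       # horizon (invented)

def discounted_sum(per_year, years=YEARS):
    """Present value of a constant per-year utility stream."""
    return sum(per_year * DISCOUNT**t for t in range(years))

# Option A: keep full enjoyment of meals, worse long-term health outcomes.
keep_reflex = discounted_sum(10.0)

# Option B: give up some meal pleasure and pay for the drug (9.0 < 10.0),
# but gain long-term health and its downstream benefits (+2.5 per year).
take_drug = discounted_sum(9.0) + discounted_sum(2.5)

print(take_drug > keep_reflex)  # True under these made-up numbers
```

Since both streams share the same discounting, the comparison reduces to 11.5 vs 10.0 utility per year, so option B wins regardless of the horizon; real cases differ precisely because the gains and losses are not constant over time.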
Side note: It’s unclear if AIXI would protect its own values, due to embeddedness problems.
That evolution, in constructing humans, treats obviously instrumental goals as terminal, and that this confuses humans, is part of my point: humans are often confused about this distinction, and having a body that is confused about it is one of the reasons. But it’s not an essential element, so let’s lay it aside.
So, let’s go with a more complex example. Most Christians have a terminal goal of becoming a better Christian (or at least, they say that it isn’t an instrumental goal of not wanting to go to Hell). That’s a terminal goal of adjusting your terminal goal structure to better fit a specific pattern. That’s, well, astonishingly similar to what Value Learning is trying to achieve. This is not an uncommon pattern; you can find it in basically every religion (often along with a backup reason that makes it an instrumental goal of not wanting to be punished in some way). In fact, Richard Dawkins would probably argue that this is a necessary feature of a religion — but then, he considers religions to be self-propagating memetic parasites of the human mind, and in that framework it does look like a rather necessary feature. Regardless, the fact that this is not just possible, but common enough that most religious people, i.e. most people in the world, have at least a mild version of it, tells us something about humanity.
On AIXI: yes, I was implicitly assuming an AIXI smart enough to realize that it was in fact embedded, or at least that there exists a causal path from messing with certain wires in its braincase to its future goal function and thus behavior. This seems a rather plausible assumption to me, but it does require that AIXI has learned a world model complex enough to start reliably making predictions like that. Having other AIXIs available to do experimental brain surgery on, or to observe the effects of an iron bar accidentally passing through their braincases in different locations, would likely help it obtain evidence that would cause those particular Bayesian updates.
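The kind of Bayesian update described can be sketched with a toy two-hypothesis model. All the probabilities below are invented: H is the hypothesis “damaging wire region X alters future behavior”, and each observed surgery outcome that matches H’s prediction shifts credence toward H.

```python
# Toy sequential Bayesian update on brain-surgery observations.
# All probabilities are invented for illustration.

P_OBS_GIVEN_H = 0.9      # chance of seeing a behavior change if H is true
P_OBS_GIVEN_NOT_H = 0.2  # chance of seeing one anyway if H is false

def update(p_h, observations):
    """Update credence in H on a series of yes/no behavior-change observations."""
    for changed in observations:
        like_h = P_OBS_GIVEN_H if changed else 1 - P_OBS_GIVEN_H
        like_not = P_OBS_GIVEN_NOT_H if changed else 1 - P_OBS_GIVEN_NOT_H
        p_h = like_h * p_h / (like_h * p_h + like_not * (1 - p_h))
    return p_h

# Three observed surgeries, all followed by behavior changes:
posterior = update(0.5, [True, True, True])
print(round(posterior, 3))  # 0.989
```

Equivalently in odds form: each confirming observation multiplies the odds on H by the likelihood ratio 0.9/0.2 = 4.5, so three observations take even odds to 4.5³ ≈ 91:1.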