The theorem shows that if one adopts a simple utility function—or let’s say if an Artificial Intelligence has as its goal maximizing the computing power in existence, even if that means killing us and using us for parts—this yields a consistent set of preferences. It doesn’t seem like we could argue the AI into adopting a different goal unless that (implausibly) served the original goal better than just working at it directly. We could picture the AI as a physical process that first calculates the expected value of various actions in terms of computing power (this would have to be approximate, but we’ve found approximations very useful in practical contexts) and then automatically takes the action with the highest calculated expected value.
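That expected-value loop can be sketched in a few lines of Python. Everything here (the action set, the outcome probabilities, the utility numbers) is invented purely for illustration:

```python
# Toy sketch of an expected-utility maximizer.
# All actions, probabilities, and utilities are made up for illustration.

# Each action maps to a list of (probability, utility) outcome pairs,
# where "utility" stands in for units of computing power gained.
actions = {
    "build_datacenter": [(0.9, 100.0), (0.1, -20.0)],
    "do_nothing":       [(1.0, 0.0)],
    "risky_takeover":   [(0.5, 500.0), (0.5, -400.0)],
}

def expected_utility(outcomes):
    """Probability-weighted average utility over possible outcomes."""
    return sum(p * u for p, u in outcomes)

# The agent simply takes the action with the highest expected value.
best_action = max(actions, key=lambda a: expected_utility(actions[a]))
print(best_action)  # -> build_datacenter (EU 88.0, vs 0.0 and 50.0)
```

In a real system the probabilities and utilities would themselves be approximations produced by the agent's world-model, which is where all the difficulty lives; the argmax at the end is the trivial part.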
Now in a sense, this shows your problem has no solution. We have no apparent way to argue morality into an agent that doesn’t already have it, on some level. In fact this appears mathematically impossible. (Also, the Universe does not love you and will kill you if the math of physics happens to work out that way.)
But if you already have moral preferences, there shouldn’t be any way to argue you out of them by showing the non-existence of Vishnu. Any desires that correspond to a utility function would yield consistent preferences. If you follow them then nobody can raise any logical objection. God would have to do the same, if he existed. He would just have more strength and knowledge with which to impose his will (to the point of creating a logical contradiction—but we can charitably assume theologians meant something else). When it comes to consistent moral foundations, the theorem gives no special place to his imaginary desires relative to yours.
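For context, the result being invoked here is the von Neumann–Morgenstern theorem, which can be stated roughly as follows:

```latex
% Von Neumann–Morgenstern: if an agent's preferences over lotteries
% satisfy completeness, transitivity, continuity, and independence,
% then there exists a utility function u, unique up to positive affine
% transformation, such that for any lotteries A and B:
\[
  A \succeq B \iff \mathbb{E}[u(A)] \ge \mathbb{E}[u(B)]
\]
```

Note the direction of the result: it says consistent preferences are representable by *some* utility function; it says nothing about *which* utility function an agent ought to have.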
I mentioned above that a simple utility function does not seem to capture my moral preferences, though it could be a good rule of thumb. There’s probably no simple way to find out what you value if you don’t already know. CFAR does not address the abstract problem; possibly they could help you figure out what you actually value, if you want practical guidance.
Now I’m curious about Crowley too. I almost never really get offended, so even if he is abrasive, I’m sure I can focus on the facts and pick out a few things to share, even if I wouldn’t share him directly.
Note that he doesn’t believe in making anything easy for the reader. The second half of this essay might perhaps have what you want, starting with section XI. Crowley wrote it under a pseudonym and at least once refers to himself in the third person; be warned.
Thanks a lot for explaining the utility theorem. So just to be sure, if moral preferences for my personal values (I’ll check CFAR for help on this, eventually) are the basis of morality, is morality necessarily subjective?
I’ll get to Crowley eventually too, thanks for the link. I’ve just started the Rationality e-book and I feel like it will give me a lot of the background knowledge to understand other articles and stuff people talk about here.
If “subjective” means “a completely different alien species would likely care about different things than humans”, then yes. You also can’t expect that a rock would have the same morality as you.
If “subjective” means “a different human would care about completely different things than me” then probably not much. It should be possible to define a morality of an “average human” that most humans would consider correct. The reason it appears otherwise is that for tribal reasons we are prone to assume that our enemies are psychologically nonhuman, and our reasoning is often based on factual errors, and we are actually not good enough at consistently following our own values. (Thus the definition of CEV as “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”; it refers to the assumption of having correct beliefs, being more consistent, and not being divided by factional conflicts.)
Of course, both of these answers are disputed by many people.
There is a set of reasonably objective facts about what values people have, and how your actions would impact them. That leads to reasonably objective answers about what you should and shouldn’t do in a specific situation. However, they are only locally objective. What value-based ethics removes is globally objective answers, in the sense that you should always do X or refrain from Y irrespective of the context.

It’s a bit like the difference between small g and big G in physics.
There is a set of reasonably objective facts about what values people have, and how your actions would impact them. That leads to reasonably objective answers about what you should and shouldn’t do in a specific situation.
Nope. It leads to reasonably objective descriptive answers about what the consequences of your actions will be. It does not lead to normative answers about what you should or should not do.
Okay, I guess I’m still confused. So far I’ve loved everything I’ve read on this site and have been able to understand; I’ve appreciated/agreed with the first 110 pages of the Rationality ebook, felt a little skeptical for liking it so completely, and then reassured myself with the Aumann’s agreement theorem it mentions. So I feel like if this utility theorem which bases morality on preferences is commonly accepted around here, I’ll probably like it once I fully understand it. So bear with me as I ask more questions...
Whose preferences am I valuing? Only my own? Everyone’s equally? Those of an “average human”? What about future humans?
Yeah, by subjective, I meant that different humans would care about different things. I’m not really worried about basic morality, like not beating people up and stuff, but...
I have a feeling the hardest part of morality will now be determining where to strike a balance between individual human freedom and concern for the future of humanity.
Like, to what extent is it permissible to harm the environment? If something, like eating sugar for example, makes people dumber, should it be limited? Is population control like China’s a good thing?
Can you really say that most humans agree on where this line between individual freedom and concern for the future of humanity should be drawn? It seems unlikely...
I’m the wrong person to ask about “this utility theorem which bases morality on preferences” since I don’t really subscribe to this point of view.
I use the word “morality” as a synonym for “system of values” and I think that these values are multiple, somewhat hierarchical, and are NOT coherent. Moral decisions are generally taken on the basis of a weighted balance between several conflicting values.
By definition, you can only care about your own preferences. That being said, it’s certainly possible for you to have a preference for other people’s preferences to be satisfied, in which case you would be (indirectly) caring about the preferences of others.
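One way to picture that “indirectly caring”: your own utility function can include a weighted term for how satisfied other people’s preferences are. The function and numbers below are purely illustrative:

```python
# Illustrative only: "my" utility includes others' preference satisfaction.

def my_utility(own_satisfaction, others_satisfaction, altruism_weight=0.5):
    """Total utility = my own satisfaction plus a weighted term for others'.

    A weight of 0 means I care only about my own outcomes; higher weights
    mean others' satisfaction counts directly toward my own utility.
    """
    return own_satisfaction + altruism_weight * sum(others_satisfaction)

# Comparing two actions: one favors me, one favors everyone else a little.
selfish  = my_utility(10.0, [0.0, 0.0])   # -> 10.0
generous = my_utility(4.0,  [7.0, 7.0])   # -> 11.0
print("generous" if generous > selfish else "selfish")  # -> generous
```

The point is that the agent is still maximizing its *own* utility function in both cases; altruism just means that function has terms referring to other people.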
The question of whether humans all value the same thing is a controversial one. Most Friendly AI theorists believe, however, that the answer is “yes”, at least if you extrapolate their preferences far enough. For more details, take a look at Coherent Extrapolated Volition.
Okay, that makes sense, but does this mean you can’t say someone else did something wrong, unless he was acting inconsistently with his personal preferences?
Ah, okay, I’ve been reading most hyperlinks here, but that one looks pretty long, so I will come back to it after I finish Rationality (or maybe my question will even be answered later on in the book...)
That is definitely not the idea behind CEV, though it may reflect the idea that a sizable majority will mostly share the same values under extrapolation.
This is an impressive failure to respond to what I said, which again was that you asked for an explanation of false data. “Most Friendly AI theorists” do not appear to think that extrapolation will bring all human values into agreement, so I don’t know what “arguments” you refer to or even what you think they seek to establish. Certainly the link above has Eliezer assuming the opposite (at least for the purpose of safety-conscious engineering).
ETA: This is the link to the full sub-thread. Note my response to dxu.
Is that a fact? It’s true that the theories often discussed here, utilitarianism and so on, don’t solve the motivation problem, but that doesn’t mean no theory does.
Not necessarily subjective, if by “subjective” you mean that “what should I do in situation X” necessarily lacks an objective answer.
Even if you treat all value as morally relevant, and you certainly don’t have to, there is a set of reasonably objective facts about what values people have, and how your actions would impact them. That leads to reasonably objective answers about what you should and shouldn’t do in a specific situation. However, they are only locally objective.
Do they have any arguments for this besides wishful thinking?

I told him “they” assume no such thing—his own link is full of talk about how to deal with disagreements.

Yes, I’ve read most of the arguments; they strike me as highly speculative and hand-wavy.