To those who say they don’t want to be wireheaded, how do you really know that, when you haven’t tried wireheading?
But it’s not because I think there’s some downside to the experience that I don’t want it. The experience is as good as can possibly be. I want to continue to be someone who thinks things and does stuff, even at a cost in happiness.
You don’t know how good “as good as can possibly be” is yet.
I want to continue to be someone who thinks things and does stuff, even at a cost in happiness.
But surely the cost in happiness that you’re willing to accept isn’t infinite. For example, presumably you’re not willing to be tortured for a year in exchange for a year of thinking and doing stuff. Someone who has never experienced much pain might think that torture is no big deal, and accept this exchange, but he would be mistaken, right?
How do you know you’re not similarly mistaken about wireheading?
How do you know you’re not similarly mistaken about wireheading?
I’m a bit skeptical of how well you can use the term “mistaken” when talking about technology that would allow us to modify our minds to an arbitrary degree. One could easily imagine a mind that (say) wants to be wireheaded for as long as the wireheading goes on, but ceases to want it the moment the wireheading stops. (I.e., both the wireheaded and the non-wireheaded versions of that mind prefer their current state and wouldn’t want to change it.) Can we really say that one of them is “mistaken”, or wouldn’t it be more accurate to say that they simply have different preferences?
Perhaps I have a maximum utility to happiness, which increasing happiness approaches asymptotically?
Yes, I think that’s quite possible, but I don’t know whether it’s actually the case or not. A big question I have is whether any of our values scales up to the size of the universe; in other words, whether it avoids asymptotically approaching an upper bound well before we have used up the resources in the universe. See also my latest post http://lesswrong.com/lw/1oj/complexity_of_value_complexity_of_outcome/ where I talk about some related ideas.
I want to continue to be someone who thinks things and does stuff, even at a cost in happiness.
The FAI can make you feel as though you “think things and do stuff”, just by changing your preferences. I don’t think any reason beginning with “I want” is going to work, because your preferences aren’t fixed or immutable in this hypothetical.
Anyway, can you explain why you are attached to your preferences? The claim that “it’s better to value this than value that” is incoherent, and the FAI will see that. The FAI will have no objective, logical reason to distinguish between values you currently have and are attached to and values that you could have and be attached to, and might as well modify you as modify the universe. (Because the universe has exactly the same value either way.)
If any possible goal is considered to have the same value (by what standard?), then the “FAI” is not friendly. If preferences don’t matter, then why does them not mattering matter? Why change one’s utility function at all, if anything is as good as anything else?
The FAI can make you feel as though you “think things and do stuff”, just by changing your preferences.
I can’t see how a true FAI can change my preferences if I prefer them not being changed.
Anyway, can you explain why you are attached to your preferences? The claim that “it’s better to value this than value that” is incoherent, and the FAI will see that. The FAI will have no objective, logical reason to distinguish between values you currently have and are attached to and values that you could have and be attached to, and might as well modify you as modify the universe. (Because the universe has exactly the same value either way.)
It does not work this way. We want to do what is right, not what would conform to our utility function if we were petunias or paperclip AIs or randomly chosen expected utility maximizers; the whole point of Friendliness is to find out and implement what we care about and not anything else.
I’m not only attached to my preferences; I am in great part my preferences. I even have a preference that my preferences not be forcibly changed. Thinking about changing meta-preferences quickly leads to a strange loop, but if I look at a specific outcome (like me being turned into orgasmium) I can still make a moral judgement and reject that outcome.
The FAI will have no objective, logical reason to distinguish between values you currently have and are attached to and values that you could have and be attached to, and might as well modify you as modify the universe. (Because the universe has exactly the same value either way.)
The FAI has a perfectly objective, logical reason to do what’s right and nothing else; its existence and utility function are causally traceable to the humans that designed it. An AI that verges on nihilism and contemplates switching humanity’s utility function to something else, partly because the universe has “exactly the same value” either way, is definitely NOT a Friendly AI.
OK, I agree with this comment and this one that if you program an FAI to satisfy our actual preferences with no compromise, then that is what it is going to do. If people have a preference for their values being satisfied in reality, rather than them just being satisfied virtually, then no wire-heading for them.
However, if you do allow compromise so that the FAI should modify preferences that contradict each other, then we might be on our way to wire-heading. Eliezer observes there is a significant ‘objective component to human moral intuition’. We also value truth and meaning. (This comment strikes me as relevant.) If the FAI finds that these three are incompatible, which preference should it modify?
(Background for this comment, in case you’re not familiar with my obsession -- how could you have missed it? -- is that objective meaning, from any kind of subjective/objective angle, is incoherent.)
if you do allow compromise so that the FAI should modify preferences that contradict each other, then we might be on our way to wire-heading.
First, I’ll just note that this is full-blown speculation about Friendliness content, which should only be done while wearing a gas mask or a clown suit, or after donating to SIAI.
Quoting CEV:
“In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”
Also:
“Do we want our coherent extrapolated volition to satisfice, or maximize? My guess is that we want our coherent extrapolated volition to satisfice—to apply emergency first aid to human civilization, but not do humanity’s work on our behalf, or decide our futures for us. If so, rather than trying to guess the optimal decision of a specific individual, the CEV would pick a solution that satisficed the spread of possibilities for the extrapolated statistical aggregate of humankind.”
This should address your question. CEV would not typically modify humans on contradictions. But I repeat, this is all speculation.
It’s not clear to me from your recent posts whether you’ve read the metaethics sequence and/or CEV; if you haven’t, I recommend it whole-heartedly as it’s the most detailed discussion of morality available. Regarding your obsession, I’m aware of it and I think I’m able to understand your history and vantage point that enable such distress to arise, although my current self finds the topic utterly trivial and essentially a non-problem.
“Reason” here: a normal, unexceptional instance of cause and effect. It should be understood in a prosaic way, e.g. reason in a causal sense.
As for “objective”, I borrowed it from the parent post to illustrate my point. To expand on “objective” a bit: everything that exists in physical reality is objective, and our morality is as physical and extant as a brick (via our physical brains), so what sense does it make to distinguish between “subjective” and “objective,” or to refer to any phenomenon as “objective” when in reality it is not a salient distinguishing feature?
If anything is “objective”, then I see no reason why human morality is not; that’s why I included the word in my post. But probably the best course would be to simply refrain from generating further confusion with the objective/subjective distinction.
Reason is not the same as cause. Cause is whatever brings something about in the physical world. Reason is a special kind of cause for intentional actions. Specifically, a reason for an action is a thought which convinces the actor that the action is good. So an objective reason would need an objective basis for something being called good. I don’t know of such a basis, and a bit more than a week ago half of the LW readers were beating up on Byrnema because she kept talking about objective reasons.
But it’s not because I think there’s some downside to the experience that I don’t want it. The experience is as good as can possibly be. I want to continue to be someone who thinks things and does stuff, even at a cost in happiness.
You don’t know how good “as good as can possibly be” is yet.
But surely the cost in happiness that you’re willing to accept isn’t infinite. For example, presumably you’re not willing to be tortured for a year in exchange for a year of thinking and doing stuff. Someone who has never experienced much pain might think that torture is no big deal, and accept this exchange, but he would be mistaken, right?
How do you know you’re not similarly mistaken about wireheading?
I’m a bit skeptical of how well you can use the term “mistaken” when talking about technology that would allow us to modify our minds to an arbitrary degree. One could easily imagine a mind that (say) wants to be wireheaded for as long as the wireheading goes on, but ceases to want it the moment the wireheading stops. (I.e., both the wireheaded and the non-wireheaded versions of that mind prefer their current state and wouldn’t want to change it.) Can we really say that one of them is “mistaken”, or wouldn’t it be more accurate to say that they simply have different preferences?
EDIT: Expanded this to a top-level post.
Interesting problem! Perhaps I have a maximum utility to happiness, which increasing happiness approaches asymptotically?
Yes, I think that’s quite possible, but I don’t know whether it’s actually the case or not. A big question I have is whether any of our values scales up to the size of the universe; in other words, whether it avoids asymptotically approaching an upper bound well before we have used up the resources in the universe. See also my latest post http://lesswrong.com/lw/1oj/complexity_of_value_complexity_of_outcome/ where I talk about some related ideas.
The maximum amount of pleasure is finite too.
The FAI can make you feel as though you “think things and do stuff”, just by changing your preferences. I don’t think any reason beginning with “I want” is going to work, because your preferences aren’t fixed or immutable in this hypothetical.
Anyway, can you explain why you are attached to your preferences? The claim that “it’s better to value this than value that” is incoherent, and the FAI will see that. The FAI will have no objective, logical reason to distinguish between values you currently have and are attached to and values that you could have and be attached to, and might as well modify you as modify the universe. (Because the universe has exactly the same value either way.)
If any possible goal is considered to have the same value (by what standard?), then the “FAI” is not friendly. If preferences don’t matter, then why does them not mattering matter? Why change one’s utility function at all, if anything is as good as anything else?
Well, I understand I owe money to the Singularity Institute now for speculating on what the output of the CEV would be. (Dire Warnings #3)
That page said:
“None may argue on the SL4 mailing list about the output of CEV”.
A different place, with different rules.
I can’t see how a true FAI can change my preferences if I prefer them not being changed.
It does not work this way. We want to do what is right, not what would conform to our utility function if we were petunias or paperclip AIs or randomly chosen expected utility maximizers; the whole point of Friendliness is to find out and implement what we care about and not anything else.
I’m not only attached to my preferences; I am in great part my preferences. I even have a preference that my preferences not be forcibly changed. Thinking about changing meta-preferences quickly leads to a strange loop, but if I look at a specific outcome (like me being turned into orgasmium) I can still make a moral judgement and reject that outcome.
The FAI has a perfectly objective, logical reason to do what’s right and nothing else; its existence and utility function are causally traceable to the humans that designed it. An AI that verges on nihilism and contemplates switching humanity’s utility function to something else, partly because the universe has “exactly the same value” either way, is definitely NOT a Friendly AI.
OK, I agree with this comment and this one that if you program an FAI to satisfy our actual preferences with no compromise, then that is what it is going to do. If people have a preference for their values being satisfied in reality, rather than them just being satisfied virtually, then no wire-heading for them.
However, if you do allow compromise so that the FAI should modify preferences that contradict each other, then we might be on our way to wire-heading. Eliezer observes there is a significant ‘objective component to human moral intuition’. We also value truth and meaning. (This comment strikes me as relevant.) If the FAI finds that these three are incompatible, which preference should it modify?
(Background for this comment, in case you’re not familiar with my obsession -- how could you have missed it? -- is that objective meaning, from any kind of subjective/objective angle, is incoherent.)
First, I’ll just note that this is full-blown speculation about Friendliness content, which should only be done while wearing a gas mask or a clown suit, or after donating to SIAI.
Quoting CEV:
“In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”
Also:
“Do we want our coherent extrapolated volition to satisfice, or maximize? My guess is that we want our coherent extrapolated volition to satisfice—to apply emergency first aid to human civilization, but not do humanity’s work on our behalf, or decide our futures for us. If so, rather than trying to guess the optimal decision of a specific individual, the CEV would pick a solution that satisficed the spread of possibilities for the extrapolated statistical aggregate of humankind.”
This should address your question. CEV would not typically modify humans on contradictions. But I repeat, this is all speculation.
It’s not clear to me from your recent posts whether you’ve read the metaethics sequence and/or CEV; if you haven’t, I recommend it whole-heartedly as it’s the most detailed discussion of morality available. Regarding your obsession, I’m aware of it and I think I’m able to understand your history and vantage point that enable such distress to arise, although my current self finds the topic utterly trivial and essentially a non-problem.
How do you define this term?
“Reason” here: a normal, unexceptional instance of cause and effect. It should be understood in a prosaic way, e.g. reason in a causal sense.
As for “objective”, I borrowed it from the parent post to illustrate my point. To expand on “objective” a bit: everything that exists in physical reality is objective, and our morality is as physical and extant as a brick (via our physical brains), so what sense does it make to distinguish between “subjective” and “objective,” or to refer to any phenomenon as “objective” when in reality it is not a salient distinguishing feature?
If anything is “objective”, then I see no reason why human morality is not; that’s why I included the word in my post. But probably the best course would be to simply refrain from generating further confusion with the objective/subjective distinction.
Reason is not the same as cause. Cause is whatever brings something about in the physical world. Reason is a special kind of cause for intentional actions. Specifically, a reason for an action is a thought which convinces the actor that the action is good. So an objective reason would need an objective basis for something being called good. I don’t know of such a basis, and a bit more than a week ago half of the LW readers were beating up on Byrnema because she kept talking about objective reasons.
OK then, it was a misuse of the word on my part. Anyway, I never intended a teleological meaning, for reasons discussed here before.
Please read Not for the Sake of Happiness (Alone), which addresses this point.