You cannot be mistaken about (not) wanting to wirehead

Kaj_Sotala 26 Jan 2010 12:06 UTC

49 points

In the comments of Welcome to Heaven, Wei Dai brings up the argument that even though we may not want to be wireheaded now, our wireheaded selves would probably prefer to be wireheaded. Therefore we might be mistaken about what we really want. (Correction: what Wei actually said was that an FAI might tell us that we would prefer to be wireheaded if we knew what it felt like, not that our wireheaded selves would prefer to be wireheaded.)

This is an argument I’ve heard frequently, one which I’ve even used myself. But I don’t think it holds up. More generally, I don’t think any argument that says one is wrong about what they want holds up.

Take the example of wireheading. It is not an inherent property of minds that they’ll become desperately addicted to anything that feels sufficiently good. Even from our own experience, we know that there are plenty of things that feel really good without leaving us immediately craving more. Sex might be great, but you can still get fatigued enough afterwards that you want to rest; eating good food might be enjoyable, but at some point you get full. The classic counter-example is that of the rats who could pull a lever stimulating a part of their brain, and ended up compulsively pulling it, to the exclusion of all else. People took this to mean they were caught in a loop of stimulating their “pleasure center”, but it later turned out that wasn’t the case. Instead, the rats were stimulating their “wants to seek out things” center.

The systems for experiencing pleasure and for wanting to seek out pleasure are separate ones. One can find something pleasurable, but still not develop a desire to seek it out. I’m sure all of you have had times when you haven’t felt the urge to participate in a particular activity, even though you knew you’d enjoy the activity in question if you just got around to doing it. Conversely, one can also have a desire to seek out something, but still not find it pleasurable when it’s achieved.

Therefore, it is not an inherent property of wireheading that we’d automatically end up wanting it. Sure, you could wirehead someone in such a way that the person stopped wanting anything else, but you could also wirehead them in such a way that they were indifferent to whether or not it continued. You could even wirehead them in such a way that they enjoyed every minute of it, but at the same time wanted it to stop.

“Am I mistaken about wanting to be wireheaded?” is a wrong question. You might afterwards think you actually prefer to be wireheaded, or think you prefer not to be wireheaded, but that is purely a question of how you define the term “wireheading”. Is it a procedure that makes you want it, or is it not? Furthermore, even if we define wireheading so that you’d prefer it afterwards, that says nothing about the moral worth of wireheading somebody.

If you’re not convinced about that last bit, consider the case of “anti-wireheading”: we rewire somebody so that they experience terrible, horrible, excruciating pain. We also rewire them so that regardless, they seek to maintain their current state. In fact, if they somehow stop feeling pain, they’ll compulsively seek a return to their previous hellish state. Would you say it was okay to anti-wirehead them, since an anti-wirehead will realize they were mistaken about not wanting to be an anti-wirehead? Probably not.

In fact, “I thought I wouldn’t want to do/experience X, but upon trying it out I realized I was wrong” doesn’t make sense. Previously the person didn’t want X, but after trying it out they did want X. X has caused a change in their preferences by altering their brain. This doesn’t mean that the pre-X person was wrong, it just means the post-X person has been changed. With the correct technology, anyone can be changed to prefer anything.

You can still be mistaken about whether or not you’ll like something, of course. But that’s distinct from whether or not you want it.

Note that this makes any thoughts along the lines of “an FAI might extrapolate the desires you would have if you were more intelligent” tricky. It could just as well extrapolate the desires we would have if our brains had been altered in some other way. What makes one method of mind alteration more acceptable than another? “Whether we’d consent to it now” is one obvious-seeming answer, but that too is filled with pitfalls. (For instance, what about our anti-wirehead?)


Wireheading Modest Epistemology

Wei Dai 26 Jan 2010 15:12 UTC
38 points
0
I’m really surprised that on a site called “Less Wrong”, there isn’t more skepticism about an argument that one can’t be wrong about X, especially when X isn’t just one statement but a large category of statements. That doesn’t scream out “hold on a second!” to anyone?
- Eliezer Yudkowsky 26 Jan 2010 21:20 UTC
  26 points
  0
  Parent
  Eyup. Humans can be wrong about anything. It’s like our superpower.
  - Jack 26 Jan 2010 21:31 UTC
    13 points
    0
    Parent
    You could be wrong about that.
    - Eliezer Yudkowsky 26 Jan 2010 21:35 UTC
      7 points
      0
      Parent
      What if I couldn’t be wrong about that?
      - thomblake 26 Jan 2010 21:39 UTC
        20 points
        0
        Parent
        Then you would clearly be immune to hemlock, and therefore weigh the same as a duck.
      - timtyler 26 Jan 2010 21:47 UTC
        −1 points
        0
        Parent
        Then you would be 100% certain—and 0 and 1 are not probabilities.
        Rob Bensinger 21 Jan 2013 21:02 UTC
        3 points
        0
        Parent
        It might be that he can’t be wrong about that, even though he doesn’t know for sure that he can’t be wrong about it. Infallibility and certainty are distinct concepts.
        timtyler 22 Jan 2013 2:12 UTC
        0 points
        0
        Parent
        Fallibility is in the mind.
        Rob Bensinger 22 Jan 2013 3:11 UTC
        4 points
        0
        Parent
        Certainty (confidence, etc.) is in the mind. Fallibility isn’t; you can be prone (or immune) to error even if no one thinks you are.
        
        The point is that ‘What if I couldn’t be wrong about it?’ does not express ‘What if I could be certain that I couldn’t be wrong about it?’; the latter requires that 1 be a probability, but the former does not, since I might be unable to be wrong about X and yet only assign, say, a .8 probability to X’s being true (because I don’t assign probability 1 to my own infallibility).
        timtyler 22 Jan 2013 23:49 UTC
        1 point
        0
        Parent
        
        Certainty (confidence, etc.) is in the mind. Fallibility isn’t; you can be prone (or immune) to error even if no one thinks you are.
        
        Though no one could ever possibly know. Seriously: fallibility is in the mind. It’s a measure of how likely something is to fail; likelihoods are probabilities—and probabilities are (best thought of as being) in the mind.
- Stuart_Armstrong 26 Jan 2010 16:03 UTC
  11 points
  0
  Parent
  Rigorously, I think the argument doesn’t stand up in its ultimate form. But it’s tiptoeing in the direction of a very interesting point on how to deal with changing utility functions, especially in circumstances where the changes might be predictable.
  
  The simple answer is “judge everything in your future by your current utility function”, but that doesn’t seem satisfactory. Nor is “judge everything that occurs in your future by your utility function at the time”, because of lobotomies, addictive wireheading, and so on. Some people have utility functions that they expect will change; and the degree of change allowable may vary from person to person and subject to subject (e.g., people opposed to polygamy may have a wide range of reactions to the announcement “in fifty years’ time, you will approve of polygamy”). Some people trust their own CEV; I never would, but I might trust it one level removed.
  
  It’s a difficult subject, and my upvote was in thanks for bringing it up. Subsequent posts on the subject I’ll judge more harshly.
  - Nick_Tarleton 26 Jan 2010 22:10 UTC
    14 points
    0
    Parent
    
    The simple answer is “judge everything in your future by your current utility function”, but that doesn’t seem satisfactory.
    
    It sounds satisfactory for agents that have utility functions. Humans don’t (unless you mean implicit utility functions under reflection, to the extent that different possible reflections converge), and I think it’s really misleading to talk as if we do.
    
    Also, while this is just me, I strongly doubt our notional-utility-functions-upon-reflection contain anything as specific as preferences about polygamy.
    - Stuart_Armstrong 27 Jan 2010 10:35 UTC
      0 points
      0
      Parent
      
      Also, while this is just me, I strongly doubt our notional-utility-functions-upon-reflection contain anything as specific as preferences about polygamy.
      
      That was just an example; people react differently to the idea that their values may change in the future, depending on the person and depending on the value.
  - CannibalSmith 28 Jan 2010 7:47 UTC
    1 point
    0
    Parent
    How about “judge by both utility functions and use the most pessimistic result”?
    - Paul Crowley 28 Jan 2010 9:07 UTC
      6 points
      0
      Parent
      If you take a utility function and multiply all the utilities by 0.01, is it the same utility function? In one sense it is, but by your measure it will always win a “most pessimistic” contest.
      
      Update: thinking about this further, if the only allowable operations on utilities are comparison and weighted sum, then you can multiply by any positive constant or add and subtract any constant and preserve isomorphism. Is there a name for this mathematical object?
      - Richard_Kennaway 28 Jan 2010 15:00 UTC
        9 points
        0
        Parent
        Affine transformations. Utility functions are defined up to affine transformation.
        
        In particular, this means that nothing has “positive utility” or “negative utility”, only greater or lesser utility compared to something else.
        
        ETA: If you want to compare two different people’s utilities, it can’t be done without introducing further structure to enable that comparison. This is required for any sort of felicific calculus.
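As a minimal sketch of the point made in this exchange (an illustration added here, not something from the original comments): the snippet below checks that a positive affine transformation `scale * u + shift` leaves the expected-utility ranking of two lotteries unchanged, and that rescaling one utility function by 0.01 changes the verdict of a “most pessimistic” rule even though the rescaled function represents the same preferences. The outcomes, lotteries, utility values, and function names are all hypothetical.

```python
from fractions import Fraction

# Hypothetical outcomes with two hypothetical utility functions over them.
U1 = {"apple": Fraction(1), "banana": Fraction(4), "cherry": Fraction(9)}
U2 = {"apple": Fraction(5), "banana": Fraction(3), "cherry": Fraction(2)}

# Two hypothetical lotteries: probability distributions over outcomes.
LOTTERY_A = {"apple": Fraction(1, 2), "cherry": Fraction(1, 2)}
LOTTERY_B = {"banana": Fraction(1)}


def expected_utility(utility, lottery):
    """Probability-weighted sum of the utilities of a lottery's outcomes."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def affine(utility, scale, shift):
    """Apply u -> scale * u + shift to every outcome; scale must be positive."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve the ordering")
    return {outcome: scale * u + shift for outcome, u in utility.items()}


def prefers_a(utility):
    """True if lottery A has higher expected utility than lottery B."""
    return expected_utility(utility, LOTTERY_A) > expected_utility(utility, LOTTERY_B)


def most_pessimistic(first, second, lottery):
    """Name whichever utility function assigns the lottery the lower value."""
    lower_is_first = expected_utility(first, lottery) <= expected_utility(second, lottery)
    return "first" if lower_is_first else "second"


# Same preferences, rescaled by 0.01 and shifted by -7.
U1_RESCALED = affine(U1, Fraction(1, 100), Fraction(-7))

# The ranking of the two lotteries survives the transformation...
same_ranking = prefers_a(U1) == prefers_a(U1_RESCALED)

# ...but the "most pessimistic" rule depends on the arbitrary scale.
before = most_pessimistic(U1, U2, LOTTERY_A)
after = most_pessimistic(affine(U1, Fraction(1, 100), Fraction(0)), U2, LOTTERY_A)
```

Exact fractions are used instead of floats so that the comparisons are not affected by rounding. The sketch only covers a single agent; comparing `U1` against `U2` at all already assumes the extra structure that interpersonal comparison would need.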
        Paul Crowley 29 Jan 2010 17:52 UTC
        1 point
        0
        Parent
        There’s a name I can’t remember for the “number line with no zero” where you’re only able to refer to relative positions, not absolute ones. I’m looking for a name for the “number line with no zero and no scale”, which is invariant not just under translation but under any affine transformation with positive determinant.
        kpreid 29 Jan 2010 18:32 UTC
        1 point
        0
        Parent
        I’m in an elementary statistics class right now and we just heard about “levels of measurement” which seem to make these distinctions: your first is the interval scale, and second the ordinal scale.
        pengvado 29 Jan 2010 19:02 UTC
        1 point
        0
        Parent
        The “number line with no zero, but a uniquely preferred scale” isn’t in that list of measurement types; and it says the “number line with no zero and no scale” is the interval scale.
      - thomblake 28 Jan 2010 13:51 UTC
        0 points
        0
        Parent
        A utility function is just a representation of preference ordering. Presumably those properties would hold for anything that is merely an ordering making use of numbers.
        Richard_Kennaway 28 Jan 2010 15:03 UTC
        3 points
        0
        Parent
        You also need the conditions of the utility theorem to hold. A preference ordering only gives you conditions 1 and 2 of the theorem as stated in the link.
        thomblake 28 Jan 2010 15:42 UTC
        0 points
        0
        Parent
        Good point. I was effectively entirely leaving out the “mathematical” in “mathematical representation of preference ordering”. As I stated it, you couldn’t expect to aggregate utiles.
        Paul Crowley 29 Jan 2010 17:53 UTC
        0 points
        0
        Parent
        You can’t aggregate utils; you can only take their weighted sums. You can aggregate changes in utils though.
  - SarahNibs 26 Jan 2010 20:10 UTC
    1 point
    0
    Parent
    I completely agree. The argument may be wrong, but the point it raises is important: we sloppily assume things about which possible causal continuations of self we care about.
    
    My initial reaction: we can still use our current utility function, but make sure the CEV analysis or whatever doesn’t say “what would you want if you were more intelligent, etc.?” but instead “what would you want if you were changed in a way you currently want to be changed?”
    
    This includes “what would you want if we found fixed points of iterated changes based on previous preferences”: if I currently want to value paperclips more and don’t care whether I value factories differently, but upon being modified to value paperclips more I would want to value factories more, then changing my preferences to value factories more is acceptable.
    
    The part where I’m getting confused right now (rather, the part where I notice I’m getting confused :)) is that calculating fixed points almost certainly depends on the order of alteration, so that there are lots of different future-mes that I prefer to current-me that are at local maximums.
    
    Also I have no idea how much we need to apply our current preferences to the fixed-point-mes. Not at all? 100%? Somehow something in-between? Or to the intermediate-state-mes.
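The order-dependence worry in this comment can be illustrated with a toy model (a sketch under stated assumptions, not the commenter’s actual proposal). Here a “self” is just a set of held values, and each candidate change is endorsed or rejected by the current self. The rule names, the `leisure` value, and the endorsement conditions are hypothetical; only the paperclips and factories example comes from the comment.

```python
def run_to_fixed_point(start, rules):
    """Apply endorsed changes, scanning rules in the given order, until none apply.

    `rules` is a sequence of (value, endorsed) pairs, where `endorsed` takes the
    current set of values and says whether that self wants to add `value`.
    """
    state = frozenset(start)
    while True:
        for value, endorsed in rules:
            if value not in state and endorsed(state):
                state = state | {value}
                break  # re-evaluate every rule from the modified self
        else:
            return state  # no rule is endorsed any more: a fixed point


# Hypothetical endorsement conditions.
PAPERCLIPS = ("paperclips", lambda s: True)
FACTORIES = ("factories", lambda s: "paperclips" in s and "leisure" not in s)
LEISURE = ("leisure", lambda s: "factories" not in s)

# The same three candidate changes, considered in two different orders.
paperclips_first = run_to_fixed_point(set(), [PAPERCLIPS, FACTORIES, LEISURE])
leisure_first = run_to_fixed_point(set(), [LEISURE, PAPERCLIPS, FACTORIES])
```

In this toy model both results are stable under every rule, yet they differ: one self ends up valuing paperclips and factories, the other paperclips and leisure. The loop terminates because values are only ever added and the set of candidate values is finite.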
    - Stuart_Armstrong 27 Jan 2010 10:38 UTC
      1 point
      0
      Parent
      I don’t think the order issue is a big problem—there is not One Glowing Solution, we just need to find something nice and tolerable.
      
      Also I have no idea how much we need to apply our current preferences to the fixed-point-mes. Not at all? 100%? Somehow something in-between? Or to the intermediate-state-mes.
      
      That is the question.
- RobinZ 26 Jan 2010 15:50 UTC
  3 points
  0
  Parent
  I think your heuristic is sound—that seemed screamingly wrong to me as well.
- Paul Crowley 27 Jan 2010 8:13 UTC
  1 point
  0
  Parent
  Incorrigibility is way too strong an assertion, but there’s a sense in which I cannot be completely wrong about my values, since I’m the only source of information about them; except perhaps to the extent that you can infer them from my fellow human beings, and to that extent humanity as a whole cannot be completely mistaken about its values.
  
  I suspect there may be an analogy with Donaldson’s observation that if you think penguins are tiny burrowing insects that live in the Sahara, you’re not so much mistaken about penguins as not talking about them at all. However, I can’t completely make this analogy work.
- timtyler 26 Jan 2010 21:21 UTC
  −1 points
  0
  Parent
  How about if X is a set of assertions that logical tautologies are true:
  
  http://en.wikipedia.org/wiki/Tautology_(logic)
  
  http://en.wikipedia.org/wiki/Tautology_(logic)#Definition_and_examples
  
  An example along similar lines to this post would be: you can’t be wrong about thinking you are thinking about X—if you are thinking about X.
  - Eliezer Yudkowsky 26 Jan 2010 21:37 UTC
    9 points
    0
    Parent
    http://www.spaceandgames.com/?p=27
    - wedrifid 28 Jan 2010 2:43 UTC
      4 points
      0
      Parent
      Now that is an overconfidence/independent statements anecdote I’ll remember. The ‘7 is prime probability 1’ part too.
    - timtyler 26 Jan 2010 21:45 UTC
      −1 points
      0
      Parent
      Nah, these are not “independent” statements, they are all much the same:
      
      They are “I want X” statements.
  - Jack 26 Jan 2010 22:02 UTC
    1 point
    0
    Parent
    P v -p is disputed, so someone is wrong there. Also, if you have ever done a 10+ line proof or 10+ place truth table you know it is trivially (pun intended) easy to get those wrong.
    
    I think the concept of a thought and what it is for a thought to be about something needs to be refined before we can say more about the second example. To begin with, if I see a dragonfly and mistake it for a fairy and then start to think about the fairy I saw, it isn’t clear that I really am thinking about a fairy.
Psychohistorian 26 Jan 2010 19:26 UTC
28 points
0
This conclusion is too strong, because there’s a clear distinction that we (or at least I) make intuitively that is incompatible with this reasoning.

Consider the following:

I don’t want to try sushi. A friend convinces/bribes/coerces me to try sushi. It turns out I really like sushi, and eat it all the time afterward.

I don’t want to try wireheading. I am convinced/bribed/coerced to try wireheading. I really like wireheading, and don’t want to stop doing it.

These sequences are superficially identical. Kaj’s construction of want suggests I could not have been mistaken about my desire for sushi. However, intuitively and in common language, it makes sense to say that I was mistaken about my desire for sushi. There is, however, something different about saying I was mistaken in not wanting to wirehead. It’s an issue of values.

Consider the ardent vegetarian who is coercively fed beef, and likes beef so much that he lacks the willpower to avoid eating it, even though it causes him tremendous psychic distress to do so. It seems reasonable to say he was correct in not wanting to eat beef, and have this judgement be entirely consistent with my being incorrect about not wanting to eat sushi. The issue is whether my action has a non-hedonic value. Eating sushi (for me) does not. Eating beef for him does. His hedonic values get in the way of his utilitarian values.

This dilemma actually integrates a number of rather complex problems. I’m hereby precommitting to making a top-level post about this before Friday. Let’s hope it works.
- Normal_Anomaly 12 Jun 2011 14:39 UTC
  27 points
  0
  Parent
  A possible solution to this: The person who does not want to try sushi thinks he will dislike it and say “Yuck!” He actually enjoys it. He is wrong in that he anticipated something different from what happened. A person who does not want to wirehead will anticipate enjoying it immensely, and this will be accurate. The first person’s decision to try to avoid sushi is based on a mistaken anticipation, but the second person’s decision to avoid wireheading takes into account a correct anticipation.
- Cyan 3 Feb 2010 2:56 UTC
  8 points
  0
  Parent
  No top level post? I has a sad.
  - Psychohistorian 3 Feb 2010 20:06 UTC
    7 points
    0
    Parent
    And commitment devices work, if belatedly.
    - Cyan 3 Feb 2010 20:08 UTC
      1 point
      0
      Parent
      Yay!
- Kaj_Sotala 27 Jan 2010 7:54 UTC
  1 point
  0
  Parent
  See my reply to zero_call below. Yes, in baseline humans and with current technology, it does make sense to use the expression “true desire”. As technology improves, however, you’ll need to define it more and more rigorously. Defining it by reference to your current values is one way.
knb 27 Jan 2010 2:16 UTC
14 points
0
The Onion on informing people their values are wrong:

http://www.theonion.com/content/news_briefs/man_who_enjoys_thing
- Peterdjones 22 Jan 2013 3:32 UTC
  0 points
  0
  Parent
  Yikes. Shades of Dennett
- timtyler 27 Jan 2010 13:50 UTC
  −1 points
  0
  Parent
  Though it is The Onion, that link seems pretty relevant!
Wei Dai 26 Jan 2010 13:03 UTC
11 points
0
What makes one method of mind alteration more acceptable than another?

It so happens that there are people working on this problem right now. See for example the current discussion taking place on Vladimir Nesov’s blog.

As a preliminary step we can categorize the ways that our “wants” can change as follows (these are mostly taken from a comment by Andreas):
1. resolving a logical uncertainty
2. updating in light of new evidence
3. correcting a past computational error
4. forgetting information
5. committing a new computational error
6. unintentional physical modification (i.e., brain damage)
7. intentional physical modification
8. other
Can we agree that categories 1, 2, and 3 are acceptable, 5 and 6 are unacceptable, and 4, 7, and 8 are “it depends”?

The change that I suggested in my argument belongs to category 2, updating in light of new evidence. I wrote that the FAI would “try to extrapolate what your preferences would be if you knew what it felt like to be wireheaded.” Does that seem more reasonable now?

For instance, what about our anti-wirehead?

If the FAI tries to extrapolate whether you’d want to be anti-wireheaded if you knew what it felt like to be anti-wireheaded, the obvious answer is no. You seem to assume that the FAI would instead try to predict whether you’d prefer to be anti-wireheaded after you were actually anti-wireheaded, but that change would be more like category 6.
- rwallace 26 Jan 2010 13:55 UTC
  7 points
  0
  Parent
  
  Can we agree that categories 1, 2, and 3 are acceptable, 5 and 6 are unacceptable, and 4, 7, and 8 are “it depends”?
  
  No. If someone—my next-door neighbor, my doctor, the government, a fictional genie, whoever—is proposing to rewire my brain, my informed consent beforehand is the only thing that can make it acceptable.
  - Kazuo_Thow 27 Jan 2010 1:23 UTC
    0 points
    0
    Parent
    Are you making this as a statement of personal preference, or general policy? What if it becomes practically impossible for a person to give informed consent, as in cases of extreme mental disability?
    - rwallace 27 Jan 2010 6:31 UTC
      0 points
      0
      Parent
      General policy. For example, if Wei Dai chooses the wirehead route, I might think he’s missing out on a lot of other things life has to offer, but that doesn’t give me the right to forcibly unwirehead him, any more than he has the right to do the reverse to me.
      
      In other words, he and I have two separate disagreements: of value axioms, whether there should be more to life than wireheading (which is a matter of personal preference), and of moral axioms, whether it’s okay to initiate the use of armed force (whether in person or by proxy) to impose one’s preferred lifestyle on another (which is a matter of general policy). (And this serves as a nice pair of counterexamples to the theory I have seen floating around that there is a universal set of human values.)
      
      In cases of extreme mental disability, we don’t have an entity that is inherently capable of giving informed consent, so indeed it’s not possible to apply that criterion. In that case (given the technology to do so) it would be necessary to intervene to repair the disability before the criterion can begin to apply.
      - Wei Dai 27 Jan 2010 7:18 UTC
        2 points
        0
        Parent
        rwallace, I’m not sure there is any actual disagreement between us. All I’m saying is that those who have not actually tried wireheading (or do not otherwise have knowledge of what it feels like to be wireheaded) perhaps shouldn’t be so sure that they really prefer not to be wireheaded. And I never mentioned anything about forcibly wireheading people. (Maybe you confused my position with denisbider’s?)
        rwallace 27 Jan 2010 8:49 UTC
        0 points
        0
        Parent
        
        The change that I suggested in my argument belongs to category 2, updating in light of new evidence. I wrote that the FAI would “try to extrapolate what your preferences would be if you knew what it felt like to be wireheaded.”
        
        I took this to mean that you agreed with denisbider’s position of licensing the initiation of force and justifying it based on what the altered version of the victim would prefer after the event—was that not your intent? If not, then you’re right, we don’t disagree to anywhere near the extent I had thought.
- Kaj_Sotala 26 Jan 2010 14:14 UTC
  6 points
  0
  Parent
  I’m not entirely sure if it’s alright to alter someone’s mind to update in light of new evidence if they didn’t want to update. The same goes for categories 1 and 3.
  
  But let’s assume, for the sake of argument, that we accept your categorization. Or let’s at least assume that the person in question doesn’t mind the updating. It seems to me that there are two possible kinds of knowledge about what wireheading feels like, and we must distinguish between which one we mean.
  
  The first kind is abstract, declarative knowledge. This may affect our (instrumental?) preferences, depending on our existing preferences. For instance, I know that people choosing where to live underestimate the effect travel times have on their happiness and overestimate the effect that the amount of space has on their happiness. Knowing this, and preferring to be happy, I might choose a different home than I otherwise would have. I presume you don’t mean this kind of knowledge, as we already know in the abstract that wireheading would be the best feeling we could ever possibly experience.
  
  The second kind is a more visceral, experienced kind of knowledge, the knowledge of what it really feels like. Knowing what it feels like to be a bat, to use Nagel’s classic example. Here it becomes tricky. It’s an open question to what degree you can really add this kind of knowledge to someone’s mind, as the recollection of the experience is necessarily incomplete. We might remember being happy or wireheaded, but just the act of recalling it doesn’t return us to a state of mind where we are just as happy as we were back then. Instead we have an abstract memory of having been happy, which possibly activates other emotions in our mind, depending on what sorts of associations have built up around the memory. We might feel an uplifting echo of that happiness, a longing to experience it again, bitterness or sorrow about being unable to relive it, or just a blank indifference.
  
  If an FAI simply simulates a state of mind where knowledge of the experience of wireheadedness has been added, I don’t think that will change the person’s preferences at all. The recollection of the wirehead state has just become an abstractly recalled piece of knowledge, without any emotional or motivational triggers that would affect one’s preferences in any way.
  - Wei Dai 26 Jan 2010 14:49 UTC
    5 points
    0
    Parent
    Let me try a different tack here. Suppose you have in front of you two flavors of ice cream. You don’t know what they taste like, but you prefer the red one because you like red and that’s the only thing you have to go on. Now an FAI comes along and tells you that it predicts if you knew what the flavors taste like, you’d choose the blue one instead. Do you not switch to the blue one?
    
    I presume you don’t mean this kind of knowledge, as we already know in the abstract that wireheading would be the best feeling we could ever possibly experience.
    
    Knowing that it’s the “best” is hardly having full declarative knowledge, when we don’t know how good “best” is.
    
    If an FAI simply simulates a state of mind where knowledge of the experience of wireheadedness has been added, I don’t think that will change the person’s preferences at all. The recollection of the wirehead state has just become an abstractly recalled piece of knowledge, without any emotional or motivational triggers that would affect one’s preferences in any way.
    
    I don’t see how that makes any sense, given my ice cream example.
    - Kaj_Sotala 26 Jan 2010 15:23 UTC
      6 points
      0
      Parent
      In the ice cream example, yes, I’ll switch to the blue one. But that one is like my previous example of choosing where to live: I switched because I gained information that allowed me to better fulfill my intrinsic preferences. It’s not that my actual preferences changed. If my preference had been “I want to eat the best ice cream I can have, as long as the taste doesn’t come from a blue ice cream” (analogous to “I want to experience the best life there is, as long as the enjoyment doesn’t come from wireheading”), I wouldn’t have switched.
      
      Knowing that it’s the “best” is hardly having full declarative knowledge, when we don’t know how good “best” is.
      
      Fair enough. But even if a person declining to be wireheaded was provided information of exactly how much better “best” would be, I doubt that would sway very many of them. (Though it may sway some, and in that case yes, an FAI telling them this could make them switch.)
      
      I don’t see how that makes any sense, given my ice cream example.
      
      Sorry, poor wording on my part. Let me reword it:
      
      “If an FAI simply simulates a state of mind where a memory of the experience of wireheadedness has been added, I don’t think that will change the person’s preferences at all. The recollection of the wirehead state is just the previously known ‘wireheading is a thousand times better than any other pleasure I could have’ knowledge, stored in a different format. But if no emotional or motivational associations are added, having the same information in a different format shouldn’t change any preferences.”
      - Wei Dai 26 Jan 2010 16:41 UTC
        4 points
        0
        Parent
        I think that resolves most of our disagreement, and I’ll think a bit more about your current position. (Have to go to sleep now.) In the meantime, can you please make a correction to your post? As you can see, my argument isn’t “our wireheaded selves would probably prefer to be wireheaded” but rather “an FAI might tell us that we would prefer to be wireheaded if we knew what it felt like.” I guess you had in your mind the previous argument you heard from others, and conflated mine with theirs.
        Kaj_Sotala 26 Jan 2010 17:12 UTC
        2 points
        0
        Parent
        Correction added.
      - denisbider 26 Jan 2010 15:31 UTC
        0 points
        0
        Parent
        
        If my preference had been “I want to eat the best ice cream I can have, as long as the taste doesn’t come from a blue ice cream” (analogous to “I want to experience the best life there is, as long as the enjoyment doesn’t come from wireheading”), I wouldn’t have switched.
        
        But such a preference is neurotic. Wire-heading isn’t a discrete, easily distinguishable category. Any number of improvements to your mind are possible. If we start at the very lowest end, chances are that you would welcome most of the improvements. Once you have been given those improvements, you would find the next level of improvement desirable. Eventually, you are at the level just below a total wire-head, and you can clearly see that wire-heading is the way to be.
        
        Yet, if you’re given the choice upfront, you will refuse to be a wire-head. This is essentially due to pre-conceived (probably wrong) notions of what matters and what wire-heading is. And the FAI would be correct in fixing you, just like it would be correct in fixing a depressed patient.
        Kaj_Sotala 26 Jan 2010 15:58 UTC
        1 point
        0
        Parent
        The main problem I have with wireheading is the notion of me simply being and not doing anything else. If I could just alter my mind to be maximally or close to maximally happy nearly all the time, while still being able to do all kinds of different things and still being motivated to do them, I’d have a much smaller problem.
        tut 26 Jan 2010 16:18 UTC
        6 points
        0
        Parent
        Good news for you then: Humans are not understimulated rats. There was an experiment where some psychologists gave some subjects electrodes and a device which stimulated their “reward center” (this was back when it was believed that dopamine was the happiness chemical and desire-wireheading was the same as happiness-wireheading) whenever they pushed a button. They also recorded every time the button was pushed. The subjects carried the electrodes for a while (I believe it was a week) and then returned them. All the subjects went about their lives, doing normal things with about their normal amount of motivation. All of them used the button at least a few times and reported that they liked it. But only one guy used it more than ten times per day, and he was intentionally (but unsuccessfully) using it for classical conditioning.
        Morendil 26 Jan 2010 16:21 UTC
        6 points
        0
        Parent
        A reference would be nice—please. :)
        tut 26 Jan 2010 17:13 UTC
        5 points
        0
        Parent
        This is the best I can find right now, and I need to go to bed. They retell the same anecdote that I referred to at the end of that piece.
        
        Here is the relevant part:
        
        Heath tells us some of his patients were given “self-stimulators” similar to the ones used by Old’s rats. Whenever he felt the urge, the patient could push any of 3 or 4 buttons on the self-stimulator hooked to his belt. Each button was connected to an electrode implanted in a different part of his brain, and the device kept track of the number of times he stimulated each site. … We ask Heath if human beings are as compulsive about pleasure as the rats of Old’s laboratory that self-stimulated until they passed out. “No,” he tells us. “People don’t self-stimulate constantly—as long as they’re feeling good. Only when they’re depressed does the stimulation trigger a big response. There are so many factors that play into a human being’s pleasure response: your experience, your memory system, sensory cues...” he muses.
        
        Though in the version I read several years ago the events were in a different order. And they were actually talking about this as a means to reach the happy equilibrium that Kaj is talking about, so they talked much more about the other subjects in the experiment. I had forgotten that Heath interfered with the gay guy after, because that was kind of downplayed.
        denisbider 26 Jan 2010 15:59 UTC
        −1 points
        0
        Parent
        I imagine the ultimate wireheading would involve complete happiness and interfacing with the FAI’s consciousness, experiencing much more than is possible by a solitary mind.
    - Psychohistorian 27 Jan 2010 20:04 UTC
      1 point
      0
      Parent
      
      Now an FAI comes along and tells you that it predicts if you knew what the flavors taste like, you’d choose the blue one instead. Do you not switch to the blue one?
      
      There’s a rather enormous leap between the FAI saying, “Y’know, I think you’d like that one more,” and the FAI altering your brain so you select that one. Providing new information simply isn’t altering someone’s mind in this context.
Unknowns 26 Jan 2010 16:49 UTC
7 points
0
If this argument is correct, then CEV is very, very bad, since it will produce something that nobody in the world wants.
Stuart_Armstrong 26 Jan 2010 13:06 UTC
7 points
0
Thanks, this has clarified some of my thinking on this domain. It also touches on one of my main objections to CEV: I would not trust the opinions of the man that the man I want to be would want to be. And it gets worse the further that it goes.

We are some messily programmed machines.
- pdf23ds 26 Jan 2010 19:46 UTC
  13 points
  0
  Parent
  My problem with CEV is that who you would be if you were smarter and better-informed is extremely path-dependent. Intelligence isn’t a single number, so one can increase different parts of it in different orders. The order people learn things in, and how fully they integrate that knowledge, and what incidental declarative/affective associations they form with the knowledge, can all send the extrapolated person off in different directions. Assuming a CEV-executor would be taking all that into account, and summing over all possible orders (and assuming that this could be somehow made computationally tractable) the extrapolation would get almost nowhere before fanning out uselessly.
  
  OTOH, I suppose that there would be a few well-defined areas of agreement. At the very least, the AI could see current areas of agreement between people. And if implemented correctly, it at least wouldn’t do any harm.
  - Stuart_Armstrong 27 Jan 2010 10:29 UTC
    1 point
    0
    Parent
    Good point, though I’m not too worried about the path dependency myself; I’m more preoccupied with getting somewhere “nice and tolerable” than somewhere “perfect”.
LauraABJ 26 Jan 2010 16:44 UTC
6 points
0
Your examples of getting tired after sex or satisfied after eating are based on current human physiology and neurochemistry, which I think most people here are assuming will no longer confine our drives after AI/uploading. How can you be sure what you would do if you didn’t get tired?

I also disagree with the idea that ‘pleasure’ is what is central to ‘wireheading.’ (I acknowledge that I may need a new term.) I take the broader view that wireheading is getting stuck in a positive feedback loop that excludes all other activity, and for this to occur, anything positively-reinforcing will do.* For example, let’s say Jane Doe wants to want to exercise, and so modifies her preferences. Now let’s say this modification is not calibrated correctly, and so she ends up on the treadmill 24/7, never wanting to get off of it. Though the activity is not pleasurable, she is still stuck in the loop. Even if we would not make a mistake quite this mundane, it is not difficult to imagine similar problems occurring after a few rounds of ‘preference modification’ by free transhumans. If someone has a drive to be satisfied, then satisfied he shall be, one way or another. Simple solutions, like putting in a preference for complexity, may not be sufficient safeguards either. Imagine an entity that spends all of its time computing and tracing infinite fractals. Pinnacle of human evolution or wirehead?

*Disclaimer: I haven’t yet defined the time parameters. For example, if the loop takes 24 hours to complete as opposed to a few seconds, is it still wireheading? What about 100 years? But I think the general idea is important to consider.
- Kaj_Sotala 26 Jan 2010 17:21 UTC
  5 points
  0
  Parent
  The relevant part of those examples was the fact that it is possible to disentangle pleasure from the desire to keep doing the pleasurable thing. Yes, we could upgrade ourselves to a posthuman state where we don’t get tired after eating or sex, and want to keep doing it all the time. But it wouldn’t be impossible to upgrade us to a state where pleasure and wanting to do something didn’t correlate, either.
  
  I believe the commonly used definition for ‘wireheading’ mainly centers around pleasure, but your question is also important.
- RobinZ 26 Jan 2010 17:33 UTC
  2 points
  0
  Parent
  
  Your examples of getting tired after sex or satisfied after eating are based on current human physiology and neurochemistry, which I think most people here are assuming will no longer confine our drives after AI/uploading. How can you be sure what you would do if you didn’t get tired?
  
  I got bored with playing Gran Turismo all the time in less than a week—the timescale might change, but eventually blessed boredom would rescue me from such a loop.
  
  Edit: From most known loops of this type—I agree with your concern about loops in general.
thomblake 26 Jan 2010 14:25 UTC
6 points
0

More generally, I don’t think any argument that says one is wrong about what they want holds up.

Just to be clear, you don’t think one can be mistaken about what one wants? Does this only work in the present tense? If not, the statement “I thought I wanted that, but now I know that I didn’t” generates a contradiction—the speaker must be actually lying.
- Kaj_Sotala 26 Jan 2010 15:27 UTC
  1 point
  0
  Parent
  Well, in everyday usage people use the expression the way MrHen put it. If you want to define it like that, then yes, you can be mistaken about what you want.
MrHen 26 Jan 2010 14:41 UTC
5 points
0

In fact, “I thought I wouldn’t want to do/experience X, but upon trying it out I realized I was wrong” doesn’t make sense.

I interpret the confusing language to mean, “I did not predict I would want to do X after doing X or learning more about X.” It doesn’t explicitly say that, but when I hear people say similar things it is usually some forecast about their future self, not their current self.
zero_call 27 Jan 2010 3:38 UTC
4 points
0
I really like the core ideas of this post but some of the particulars are bothersome to me. For example, it confuses things IMO to talk about wireheading as though it can be modified to be whatever we want—wireheading is wireheading, and it has a rather clear, explicit meaning. (Although the degree of its strength would need to be qualified.)

Anyways, how do you really know what you want? That’s the really key question, which I don’t think you’ve really answered. It’s not just about redefining terms, IMO. There’s real substance to the idea that we have some innate, true sense of desires, yet whose identities elude us. To take the sushi example, the person who tries sushi and loves it had an innate desire, or interest, all along. It might not have been a “want”, but the fact that their preferences changed expresses something true about them. It wasn’t just a matter of definitions and perspectives and so on.

Maybe what you’re saying is that desires are somewhat irrelevant; they can be redefined, reupdated, or completely neglected, and they have little overall worth. So maybe the more interesting question is more straightforward: knowing we would be completely happy and fulfilled in a life of wireheading, should we do it?
- Kaj_Sotala 27 Jan 2010 7:43 UTC
  0 points
  0
  Parent
  
  wireheading is wireheading, and it has a rather clear, explicit meaning
  
  We’ve assumed that it has a clear, explicit meaning, but I don’t think that’s so.
  
  There’s real substance to the idea that we have some innate, true sense of desires, yet whose identities elude us.
  
  In baseline humans and with current technology, yes, it does make sense to use the expression “true desire”. Not that particular desires would be any more “true” than others, but there may be some unrealized desires which, if fulfilled, would lead to the person becoming happier than if those desires weren’t fulfilled. As technology increases, that distinction becomes less meaningful, as we become capable of rebuilding our minds and transforming any desire to such a “true desire”.
  
  If you wanted to keep the distinction even with improving technology, you’d define some class of alterations which are “acceptable” and some which aren’t. “True desires” would then be any wants that could be promoted to such a status using “acceptable” means. Wei Dai started compiling one possible list of such acceptable alterations.
Jack 26 Jan 2010 19:06 UTC
3 points
0
You’re right that where D is desire and t is time, Dx at t1 is not falsified by D(-x) at t2. Nor is it falsified by D(-x at t1) at t2. But you haven’t come close to showing where B is belief, BDx is necessarily true, or as a special case BDwh is necessarily true (wh is wireheading). Since the latter, not the former, is the titular claim of the post, you have some work left.
- Kaj_Sotala 27 Jan 2010 7:35 UTC
  1 point
  0
  Parent
  I’m afraid you’re a bit too concise for me to follow. Could you elaborate?
  - Jack 27 Jan 2010 23:35 UTC
    2 points
    0
    Parent
    Yeah, sorry. I made the comment right after I got back from my modal logic class, so I was thinking in sentence letters and logical connectives.
    
    For me this is the key passage in your post:
    
    In fact, “I thought I wouldn’t want to do/experience X, but upon trying it out I realized I was wrong” doesn’t make sense. Previously the person didn’t want X, but after trying it out they did want X. X has caused a change in their preferences by altering their brain. This doesn’t mean that the pre-X person was wrong, it just means the post-X person has been changed. With the correct technology, anyone can be changed to prefer anything.
    
    This effectively shows that the claim “I desire X”, when made right now, can’t be falsified by any desires I might have at different times. I actually don’t think this is a point about technology, but a point about desires. Two desires held at different times are allowed to be contradictory, and we don’t even need to bring up wireheading or fancy technology. This phenomenon occurs all the time. We call it regret or changing our mind.
    
    So you have rebutted a common objection to the claim that someone does not want to wirehead. But it doesn’t follow from that that your beliefs about your desires in general, or desires to wirehead in particular, are infallible. Given certain conceptions of what desire/preference means and certain assumptions about the transparency of mental content it might follow that you can’t be wrong about desires (to wirehead and otherwise). But that hasn’t been shown in the OP even though that seems to be the claim the title is making.
    - Kaj_Sotala 30 Jan 2010 16:58 UTC
      2 points
      0
      Parent
      
      Given certain conceptions of what desire/preference means and certain assumptions about the transparency of mental content it might follow that you can’t be wrong about desires (to wirehead and otherwise). But that hasn’t been shown in the OP even though that seems to be the claim the title is making.
      
      Yes (as I’ve stated in the other comments here), if you use a broader definition of “mistaken about a want”, then we can easily conclude that one can be mistaken about their wants. I thought the narrowness of the definition of ‘want’ I was using would have been clear from the context, but I apparently succumbed to the illusion of transparency.
timtyler 11 Jun 2011 20:01 UTC
1 point
0
Others have said this already—but your own motives are one of the things that you can be wrong about.
RobertWiblin 29 Jan 2010 16:58 UTC
1 point
0
Silly to worry only about the preferences of your present self—you should also act to change your preferences to make them easier to satisfy. Your potential future self matters as much as your present self does.
- Vladimir_Nesov 29 Jan 2010 23:42 UTC
  6 points
  0
  Parent
  
  Silly to worry only about the preferences of your present self—you should also act to change your preferences to make them easier to satisfy. Your potential future self matters as much as your present self does.
  
  Irony? I gather if the “future self” is a rock, which is a state of existence that is easier to satisfy, this rock doesn’t matter as much as your present self.
dclayh 2 Feb 2010 3:45 UTC
0 points
0

Furthermore, even if we define wireheading so that you’d prefer it afterwards, that says nothing about the moral worth of wireheading somebody.

Agreed.