The degradation effect I described seems fairly common. Lots of experiments in happiness studies show that set levels adjust ruthlessly.
New fun gets old. We want variety over time as well as space. Doesn’t affect complexity or external referents.
Yes, and Clippy would be right.
I meant that fun-maximizers don’t have more power to move me than paperclip-maximizers.
I do know why I value things outside my own experience.
Why? Honest question, I just get “404” if I ask myself that.
Just that there’s no particular reason for my values to be only about my experience, like there’s no reason for them to be only about parts of the universe that are green.
My perspective is that there are rewards and actions to get these rewards. I think that the rewards are the important thing. They are not instruments to get to my “real” values, they are my values.
Yeah, I understand that. I don’t know why I disagree. I value the reward, but I also value many things about the action.
As an example: in Judaism, you want to set obstacles for yourself and to overcome them (there are even rules to prevent this from going overboard). There are vegan Orthodox Jews who complain that keeping kosher is too easy for them and who decide that some vegetables count as milk and others as meat, and then don’t mix them. Chosen fetters like that appeal to me like mad. It’s a kind of fun, but, I expect, one completely alien to you.
I was set up to favor specific pathways to these rewards for reasons that are not my own (but those of my genes, memes or other influences).
I have to ask where your own reasons come from, causally speaking.
Now you should go for the game-breaker.
Agree we should break the rules to get what we want, disagree about what we want.
But I consider this boredom to be a bad feature!
So you want classical orgasmium, not complex fun with no referent (like falling in love with a chatbot). Glad that’s clear.
Simplicity and elegance are good things.
Yes. But not so good that I’d renounce humor and challenge and pride and aesthetics and freedom and truth and more. Same reason that I can’t decide to burn down an orphanage—I care about making a point about choosing my own values and the relativity of morality, but I care about human life more.
I notice that this discussion makes me feel resigned and sad, so I will adjust down my confidence that this is right.
I’ve been adjusting in the opposite direction as well. Truly we are an Aumann cage match made in heaven.
Can you try to elaborate on why you value external things?
As I said earlier, why wouldn’t I? I value non-green things.
My brain sucks. It can’t represent the joys and great deeds and wisdom in a single human life (except the one it’s being). Unless other people are clever chatbots, there are more great things in a day of their lives than in my brain in its highest bliss. It just sounds odd that this should be worthless. (Also phrased as: Every morning, I weigh myself. If it’s less than Earth, I save the world.)
Also, not sure what happens to the value of suicide if you value only your subjective experience. Isn’t it undefined?
Not directly related, but values are over 4D, not 3D. (Which is why it’s not completely stupid to care about paths, not just endpoints.)
Why do you think a holodeck is bad, apart from the emotions that come up?
Same reason I think that if I could never have anything more pleasant than ice cream (but I’d live forever and get as much complex and varied fun as I want and nothing horribly bad would be happening), it’d be bad. It’s missing stuff.
(Besides, just pay me more than the 10 bucks Dave offered and I’m not pressing anything. I’m very pain-averse, so no need to go through such extremes. ;))
Shooting you would be a last resort. I like you humans, I don’t wanna kill you.
Been thinking more and noticed that I’m confused about how “terminal values” actually work.
It seems like my underlying model of preferences is eliminativist. (Relevant caricature.) Because the decision making process uses (projected and real) rewards to decide between actions, it is only these rewards that actually matter, not the patterns that triggered them. As such, there aren’t complex values and wireheading is a fairly obvious optimization.
To take the position of a self-modifying AI, I might look at my source code and find the final decision making function that takes a list of possible actions and their expected utility. It then returns the action with the maximum utility. It is obvious to me that this function does not “care” about the actions, but only about the utility. I might then be tempted to modify it such that, for example, the list always contains a maximum utility dummy action (aka I wirehead myself). This is clearly what this function “wants”.
But that’s not what “I” want. At the least, I should include the function that rates the actions, too. Now I might modify it so that it simply rates every action as optimal, but that’s taking the perspective of the function that picks the action, not the one that rates it! The rating function actually cares about internal criteria (its terminal values) and circumventing this would be wrong.
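To make the chooser/rater distinction concrete, here is a minimal Python sketch of the toy model above. All function names, criteria, and numbers are my own illustrative assumptions, not any real agent’s architecture; the point is only that both modifications satisfy the chooser, while only the rater’s criteria ever constrain the outcome.

```python
# A minimal toy model of the two functions discussed above. All names and
# numbers are illustrative assumptions, not anyone's actual architecture.

def rate(action, terminal_values):
    """Rating function: scores an action against internal criteria
    (standing in for "terminal values")."""
    return sum(weight * action.get(feature, 0.0)
               for feature, weight in terminal_values.items())

def choose(actions, rater):
    """Choosing function: returns whichever action the rater scores highest.
    It only ever looks at the scores, never at the actions themselves."""
    return max(actions, key=rater)

terminal_values = {"pleasure": 1.0, "novelty": 0.5, "truth": 0.8}  # made-up weights
actions = [
    {"pleasure": 0.8, "novelty": 0.1},                # say, eating ice cream
    {"pleasure": 0.3, "novelty": 0.9, "truth": 0.6},  # say, learning something new
]
honest_rater = lambda a: rate(a, terminal_values)

# Honest decision: the rater's criteria determine the winner.
print(choose(actions, honest_rater))

# Wireheading from the chooser's perspective: inject a dummy action that just
# pays the reward criterion directly. The chooser is perfectly satisfied.
dummy = {"pleasure": 10**9}
print(choose(actions + [dummy], honest_rater))

# Wireheading from the rater's perspective: swap in a rater that scores every
# action as optimal. The chooser still runs, but the internal criteria the
# original rater cared about no longer constrain the outcome at all.
broken_rater = lambda a: float("inf")
print(choose(actions, broken_rater))
```

Identifying with choose makes the dummy action look like an obvious optimization; identifying with rate (and whatever criteria it encodes) makes both modifications look like circumvention rather than satisfaction.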
The problem then becomes how to find out what those terminal values are and which of those to optimize for. (As humans are hypocritical and revealed preferences often match neither professed nor introspected preferences.) Picking the choosing function as an optimization target is much easier and always consistent.
I’m not confident that this view is right, but I can’t quite reduce preferences in any other consistent way. I checked the Neuroscience of Desire again, but I don’t see how you can extract caring about referents from that. In other words, it’s all just neurons firing. What these neurons optimize is being triggered, not some external state of the world. (Wireheading solution: let’s just trigger them directly.)
For now, I’m retracting my endorsement of wireheading until I have a better understanding of the issue. (I will also try to not blow up any world as I might still need it.)
I was set up to favor specific pathways to these rewards for reasons that are not my own (but those of my genes, memes or other influences).
I have to ask where your own reasons come from, causally speaking.
Good point. I can’t just disown all reasons or “I” become a rock, which doesn’t appeal to me, identity-wise. I like minimalist identities the most, so I retain pleasure = good, but not reproductive success, for example. In other words, I keep the basic mechanism that evolution gave me to achieve goals, but I ignore the meta-goal of reproductive success it had.
I’m not happy with this argument, but I find extended versions that care about externals just as implausible. The choice between both seems arbitrary, so I go with the simpler one for now.
Also, not sure what happens to the value of suicide if you value only your subjective experience. Isn’t it undefined?
Yes. Death itself has a utility fairly close to 0 for me, but I don’t like dying (mostly because of the pain and shame it causes me), so I’m normally against suicide.
Can you try to elaborate on why you value external things?
As I said earlier, why wouldn’t I? I value non-green things.
Ok, fair. I can’t provide a better case even for why “pleasure” is good but “pain” ain’t. It just feels that way to me; that’s just how the algorithm works. I’m just surprised that this difference in perceived values exists. If I further add MrMind’s stated values, then either terminal value acquisition in humans is fairly shaky and random, or easy to manipulate, or very hard to introspect on, despite the appearance to the contrary.
A thought experiment. Imagine “reality” disappears suddenly and you wake up in Omega’s Simulation Chamber. Omega explains that all your life has been a simulation of the wallpaper kind. There weren’t any other minds, only ELIZA-style chatbots (but more sophisticated). Would this make you sad?
I don’t get a particularly bad response from that, maybe only slight disappointment because I was mistaken about the state of the world. I take that as weak evidence that I don’t care much about referents. But maybe I just have shitty relationships with people and nothing much to lose, so I’ll try improving in that regard first, to make that intuition more reliable. (That’s gotta take me some time.)
ETA:
The degradation effect I described seems fairly common. Lots of experiments in happiness studies show that set levels adjust ruthlessly.
New fun gets old. We want variety over time as well as space. Doesn’t affect complexity or external referents.
What about sustainability? What if we run out of interesting complexity?