What does this robot “actually want”, given that the world is not really a 2D grid of cells that have intrinsic color?
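For concreteness, here is a minimal sketch of the kind of robot under discussion. This is my own illustration rather than code from the original post, and the names (`utility`, `act`, the color strings) are invented for the example; the point is just that the robot's goal is defined directly over a 2D grid of intrinsically colored cells, so asking what it "wants" outside that ontology has no built-in answer.

```python
from typing import List

Grid = List[List[str]]  # each cell holds a color name, e.g. "blue" or "red"

def utility(grid: Grid) -> int:
    """What the robot maximizes: the number of cells that are not blue."""
    return sum(cell != "blue" for row in grid for cell in row)

def act(grid: Grid) -> Grid:
    """One step of behavior: zap (recolor) the first blue cell it finds."""
    new_grid = [row[:] for row in grid]
    for i, row in enumerate(new_grid):
        for j, cell in enumerate(row):
            if cell == "blue":
                new_grid[i][j] = "gray"
                return new_grid
    return new_grid

world = [["red", "blue"], ["blue", "green"]]
print(utility(world))       # 2
print(utility(act(world)))  # 3 -- zapping a blue cell raises this utility
# But `utility` is only defined for Grid values; if the real world is not a
# 2D grid of intrinsically colored cells, nothing in the robot says what it
# "actually wants" there.
```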
Who cares about the question of what the robot “actually wants”? Certainly not the robot. Humans care about the question of what they “actually want”, but that’s because they have additional structure that this robot lacks. And with humans, you’re not limited to just looking at what they do on auto-pilot; you can ask that additional structure directly when you run into problems like this. For example, if you asked me what I really wanted under some weird ontology change, I could say, “I have some guesses, but I don’t really know; I would like to defer to a smarter version of me”. That’s how I understand preference extrapolation: not as something that looks at what your behavior suggests that you’re trying to do and then does it better, but as something that poses the question of what you want to some system you’d like to answer the question for you.
It looks to me like there’s a mistaken tendency among many people here, including some very smart people, to say that I’d be irrational to let my stated preferences deviate from my revealed preferences; that just because I seem to be trying to do something (in some sense like: when my behavior isn’t being controlled much by the output of moral philosophy, I can be modeled as a relatively good fit to a robot with some particular utility function), that’s a reason for me to do it even if I decide that I don’t want to. But rational utility maximizers get to be indifferent to whatever the heck they want, including their own preferences, so it’s hard for me to see why the underdeterminedness of the true preferences of robots like this should bother me at all.
Insert standard low confidence about me posting claims on complicated topics that others seem to disagree with.
In other words, our “actual values” come from our being philosophers, not our being consequentialists.
It seems plausible to me, and I’m not sure that “many” others do disagree with you.
That would imply a great diversity of value systems, because philosophical intuitions differ much more from person to person than primitive desires. Some of these value systems (maybe including yours) would be simple, some wouldn’t. For example, my “philosophical” values seem to give large weight to my “primitive” values.
preference extrapolation: not as something that looks at what your behavior suggests that you’re trying to do and then does it better, but as something that poses the question of what you want to some system you’d like to answer the question for you

That might be a procedure that generates human preference, but it is not a general preference extrapolation procedure. E.g., suppose we replace Wei Dai’s simple consequentialist robot with a robot that has similar behavior, but that also responds to the question, “What system do you want to answer the question of what you want for you?” with the answer, “A version of myself better able to answer that question. Maybe it should be smarter and know more things and be nicer to strangers and not have scope insensitivity and be less prone to skipping over invisible moral frameworks and have concepts that are better defined over attribute space and be automatically strategic and super committed and stuff like that? But since I’m not that smart and I pass over moral frameworks and stuff, everything I just said is probably insufficient to specify the right thing. Maybe you can look at my source code and figure out what I mean by right and then do the thing that a person who better understood that would do?” And then goes right back to zapping blue.
Suppose we replace Wei Dai’s simple consequentialist robot with a robot that has similar behavior, but that also responds to the question, “What system do you want to answer the question of what you want for you?” with the answer, “I want to decide for myself” and responds to the question, “What do you want to do?” with the answer, “I want to make babies happy. Oh, and help grandmother out of the burning building. Oh, and without killing her. Oh, and to preserve complex novelty. Oh, and boredom. Oh, and there should still be people in the world who are trying to improve it. Oh, and... dammit, this is complicated. Okay, never mind, I want you to ask the version of myself who I presently think is smart enough to answer this question and who knows what the right thing to do is even better than me.”
It can answer those two questions, but if you ask it to clarify the last response, it just blows up.