The Human’s Hidden Utility Function (Maybe)

lukeprog 23 Jan 2012 19:39 UTC

68 points

Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don’t act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and all three systems contribute to choice. And suppose that upon reflection we would clearly reject the outputs of two of these systems, whereas the third system looks something more like a utility function we might be able to use in CEV.

What I just described is part of the leading theory of choice in the human brain.
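There is a simple way to see how three systems that are each internally consistent can still jointly violate the VNM axioms: a Condorcet-style cycle. The sketch below assumes, purely for illustration, that each system contributes a strict ranking and that the choice between any two options goes to the majority. Neither the rankings nor the majority rule come from the post or from Dayan (2011), which leave the combination mechanism open. Each ranking is transitive on its own, yet the aggregate preference cycles, so no utility function can represent it.

```python
from itertools import combinations

# Hypothetical rankings (best first). Each one is transitive on its own.
RANKINGS = {
    "model_based": ["A", "B", "C"],
    "model_free": ["B", "C", "A"],
    "pavlovian": ["C", "A", "B"],
}


def majority_prefers(x, y):
    """True if most systems rank x above y (an assumed aggregation rule)."""
    votes = sum(r.index(x) < r.index(y) for r in RANKINGS.values())
    return votes > len(RANKINGS) / 2


def aggregate_preferences(options):
    """Return the set of (winner, loser) pairs produced by pairwise majority."""
    prefs = set()
    for x, y in combinations(options, 2):
        prefs.add((x, y) if majority_prefers(x, y) else (y, x))
    return prefs


def is_transitive(prefs):
    """Check that x > y and y > z always imply x > z."""
    return all(
        (x, z) in prefs
        for (x, y) in prefs
        for (y2, z) in prefs
        if y == y2 and x != z
    )


prefs = aggregate_preferences(["A", "B", "C"])
# prefs contains A > B, B > C and C > A: a cycle.
print(sorted(prefs), is_transitive(prefs))
```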

Recall that human choices are made when certain populations of neurons encode expected subjective value (in their firing rates) for each option in the choice set, with the final choice being made by an argmax or reservation price mechanism.
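Here is a sketch of the two final-choice mechanisms named above. It is a simplified illustration, not a model from the cited work. It assumes each option's expected subjective value is already available as a number. It treats "argmax" as picking the highest-valued option in a simultaneous choice set. It treats a "reservation price" rule, as in economic search models, as accepting the first sequentially presented option whose value meets a threshold. The option names, values, and threshold are made up.

```python
def argmax_choice(values):
    """Pick the option whose encoded subjective value is highest."""
    return max(values, key=values.get)


def reservation_price_choice(offers, reservation_value):
    """Accept the first sequential offer worth at least the reservation value.

    Returns None if every offer falls below the threshold.
    """
    for option, value in offers:
        if value >= reservation_value:
            return option
    return None


# Hypothetical values, for illustration only.
values = {"apple": 0.4, "cake": 0.9, "salad": 0.6}
print(argmax_choice(values))  # cake
print(reservation_price_choice([("apple", 0.4), ("salad", 0.6), ("cake", 0.9)], 0.5))  # salad
```

The two rules can disagree on the same values. The threshold rule settles for salad without ever weighing it against cake.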

Today’s news is that our best current theory of human choices says that at least three different systems compute “values” that are then fed into the final choice circuit:

The model-based system “uses experience in the environment to learn a model of the transition distribution, outcomes and motivationally-sensitive utilities.” (See Sutton & Barto 1998 for the meanings of these terms in reinforcement learning theory.) The model-based system also “infers choices by… building and evaluating the search decision tree to work out the optimal course of action.” In short, the model-based system is responsible for goal-directed behavior. However, making all choices with a goal-directed system using something like a utility function would be computationally prohibitive (Daw et al. 2005), so many animals (including humans) first evolved much simpler methods for calculating the subjective values of options (see below).
The model-free system also learns a model of the transition distribution and outcomes from experience, but “it does so by caching and then recalling the results of experience rather than building and searching the tree of possibilities. Thus, the model-free controller does not even represent the outcomes… that underlie the utilities, and is therefore not in any position to change the estimate of its values if the motivational state changes. Consider, for instance, the case that after a subject has been taught to press a lever to get some cheese, the cheese is poisoned, so it is no longer worth eating. The model-free system would learn the utility of pressing the lever, but would not have the informational wherewithal to realize that this utility had changed when the cheese had been poisoned. Thus it would continue to insist upon pressing the lever. This is an example of motivational insensitivity.”
The Pavlovian system, in contrast, calculates values based on a set of hard-wired preparatory and consummatory “preferences.” Rather than calculate value based on what is likely to lead to rewarding and punishing outcomes, the Pavlovian system calculates values consistent with automatic approach toward appetitive stimuli, and automatic withdrawal from aversive stimuli. Thus, “animals cannot help but approach (rather than run away from) a source of food, even if the experimenter has cruelly arranged things in a looking-glass world so that the approach appears to make the food recede, whereas retreating would make the food more accessible (Hershberger 1986).”
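Below is a toy version of the model-based system's "build and evaluate the search decision tree" step. It is an assumed, simplified sketch rather than Dayan's formulation. The learned transition model is deterministic and hand-written, outcome utilities are passed in at decision time, and the planner searches every path to a fixed depth. Utilities are looked up during search rather than cached, so devaluing an outcome changes the plan immediately. That is the motivational sensitivity the model-free system lacks. All state and action names are hypothetical.

```python
# Hypothetical learned model of the world: (state, action) -> next state.
TRANSITIONS = {
    ("start", "go_to_lever"): "lever_room",
    ("lever_room", "press_lever"): "cheese",
    ("lever_room", "wait"): "nothing",
    ("start", "stay"): "nap",
}


def plan(state, utilities, depth=3):
    """Return (best_value, best_action) by exhaustively searching the model.

    Leaf states are scored with the utilities supplied at decision time,
    so a change in motivational state is reflected immediately.
    """
    actions = [a for (s, a) in TRANSITIONS if s == state]
    if depth == 0 or not actions:
        return utilities.get(state, 0.0), None
    best_value, best_action = float("-inf"), None
    for action in actions:
        value, _ = plan(TRANSITIONS[(state, action)], utilities, depth - 1)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


hungry = {"cheese": 1.0, "nothing": 0.0, "nap": 0.2}
poisoned = {"cheese": -5.0, "nothing": 0.0, "nap": 0.2}
print(plan("start", hungry))    # (1.0, 'go_to_lever')
print(plan("start", poisoned))  # (0.2, 'stay')
```

Exhaustive search like this grows exponentially with depth. That is the kind of cost the post cites (Daw et al. 2005) when it calls making every choice this way computationally prohibitive.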

Or, as Jandila put it:

Model-based system: Figure out what’s going on, and what actions maximize returns, and do them.
Model-free system: Do the thingy that worked before again!
Pavlovian system: Avoid the unpleasant thing and go to the pleasant thing. Repeat as necessary.
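A one-dimensional toy world can illustrate the Pavlovian system's rigidity in Hershberger's looking-glass setup, quoted above. The dynamics, distances, and step counts are assumptions made for this sketch. In the normal world, approaching reduces the distance to the food by one. In the looking-glass world, the sign is flipped. A hard-wired "approach" policy reaches the food in the first world and only drives it away in the second, while a policy that retreats would succeed there.

```python
def step(distance, action, looking_glass):
    """Hypothetical dynamics: return the new distance to the food."""
    move = -1 if action == "approach" else 1
    if looking_glass:
        move = -move  # approaching now pushes the food away
    return max(distance + move, 0)


def pavlovian_policy(distance):
    """Hard-wired: always approach an appetitive stimulus."""
    return "approach"


def retreat_policy(distance):
    """The instrumentally correct policy in the looking-glass world."""
    return "retreat"


def run(policy, looking_glass, start=5, max_steps=10):
    """Return the final distance to the food (0 means it was reached)."""
    distance = start
    for _ in range(max_steps):
        if distance == 0:
            break
        distance = step(distance, policy(distance), looking_glass)
    return distance


print(run(pavlovian_policy, looking_glass=False))  # 0
print(run(pavlovian_policy, looking_glass=True))   # 15
print(run(retreat_policy, looking_glass=True))     # 0
```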

In short:

We have described three systems that are involved in making choices. Even in the case that they share a single, Platonic, utility function for outcomes, the choices they express can be quite different. The model-based controller comes closest to being Platonically appropriate… The choices of the model-free controller can depart from current utilities because it has learned or cached a set of values that may no longer be correct. Pavlovian choices, though determined over the course of evolution to be appropriate, can turn out to be instrumentally catastrophic in any given experimental domain...

[Having multiple systems that calculate value] is [one way] of addressing the complexities mentioned, but can lead to clashes between Platonic utility and choice. Further, model-free and Pavlovian choices can themselves be inconsistent with their own utilities.
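The lever-and-cheese example of motivational insensitivity can be sketched with a cached action value updated by a simple delta rule. The delta rule is a standard model-free stand-in. The learning rate, rewards, and episode counts are assumptions, not values from Dayan (2011). The cache stores a single number per action and nothing about the cheese. Poisoning the cheese therefore leaves the cached value untouched, and the system keeps choosing the lever until it actually experiences the devalued outcome.

```python
def update_cached_value(q, action, reward, alpha=0.5, episodes=1):
    """Delta-rule update of a cached action value: Q <- Q + alpha * (r - Q)."""
    for _ in range(episodes):
        q[action] += alpha * (reward - q[action])
    return q


def model_free_choice(q):
    """Choose the action with the highest cached value."""
    return max(q, key=q.get)


q = {"press_lever": 0.0, "wait": 0.0}

# Training: pressing the lever yields cheese, worth +1 to a hungry animal.
update_cached_value(q, "press_lever", reward=1.0, episodes=10)

# The cheese is now poisoned, but no new experience has touched the cache,
# so the model-free choice is unchanged.
choice_after_poisoning = model_free_choice(q)
print(choice_after_poisoning)  # press_lever

# Only after pressing the lever and eating the poisoned cheese (reward -5)
# does the cached value fall below that of waiting.
update_cached_value(q, "press_lever", reward=-5.0, episodes=1)
print(model_free_choice(q))  # wait
```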

We don’t yet know how choice results from the inputs of these three systems, nor how the systems might interact before they deliver their value calculations to the final choice circuit, nor whether the model-based system really uses anything like a coherent utility function. But it looks like the human might have a “hidden” utility function that would reveal itself if it weren’t also using the computationally cheaper model-free and Pavlovian systems to help determine choice.
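The combination rule is unknown, so the sketch below simply assumes a weighted sum of the three systems' values followed by argmax. The point is to show how the expressed choice can clash with the model-based valuation. The weights, the values, and the Pavlovian "pull toward the food cue" are all invented for illustration. This is a picture of the general problem, not a model of how the brain actually arbitrates.

```python
def combined_choice(system_values, weights):
    """Argmax over a weighted sum of each system's values (an assumed rule)."""
    options = next(iter(system_values.values()))

    def total(option):
        return sum(weights[name] * values[option]
                   for name, values in system_values.items())

    return max(options, key=total)


# Hypothetical valuations after the cheese has been poisoned.
system_values = {
    "model_based": {"press_lever": -1.0, "wait": 0.0},  # knows the cheese is bad
    "model_free": {"press_lever": 1.0, "wait": 0.0},    # stale cached value
    "pavlovian": {"press_lever": 0.5, "wait": 0.0},     # pull toward the food cue
}

goal_directed = {"model_based": 1.0, "model_free": 0.0, "pavlovian": 0.0}
habit_heavy = {"model_based": 0.2, "model_free": 0.6, "pavlovian": 0.2}

print(combined_choice(system_values, goal_directed))  # wait
print(combined_choice(system_values, habit_heavy))    # press_lever
```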

At a glance, it seems that upon reflection I might embrace an extrapolation of the model-based system’s preferences as representing “my values,” and I would reject the outputs of the model-free and Pavlovian systems as the outputs of dumb systems that evolved for their computational simplicity, and can be seen as ways of trying to approximate the full power of a model-based system responsible for goal-directed behavior.

On the other hand, as Eliezer points out, perhaps we ought to be suspicious of this, because “it sounds like the correct answer ought to be to just keep the part with the coherent utility function in CEV which would make it way easier, but then someone’s going to jump up and say: ‘Ha ha! Love and friendship were actually in the other two!’”

Unfortunately, it’s too early to tell whether these results will be useful for CEV. But it’s a little promising. This is the kind of thing that sometimes happens when you hack away at the edges of hard problems. This is also a repeat of the lesson that “you can often out-pace most philosophers simply by reading what today’s leading scientists have to say about a given topic instead of reading what philosophers say about it.”

(For pointers to the relevant experimental data, and for an explanation of the mathematical role of each valuation system in the brain’s reinforcement learning system, see Dayan (2011). All quotes in this post are from that chapter, except for the last one.)

91 comments, 3 min read

Utility Functions Neuroscience AI

Scott Alexander 24 Jan 2012 18:03 UTC
36 points

This is also a repeat of the lesson that “you can often out-pace most philosophers simply by reading what today’s leading scientists have to say about a given topic instead of reading what philosophers say about it.”

On the other hand, rationality can be faster than science. And I’m feeling pretty good about positing three different forms of motivation, divided between model-free tendencies based on conditioning, and model-based goals, then saying we could use transhumanism to focus on the higher-level rational ones, without having read the particular neuroscience you’re citing...

...actually, wait. I read as much of the linked paper as I could (Google Books hides quite a few pages) and I didn’t really see any strong neuroscientific evidence. It looked like they were inferring the existence of the three systems from psychology and human behavior, and then throwing in a bit of neuroscience by mentioning some standard results like the cells that represent error in reinforcement learning. What I didn’t see was a description of how three separate systems naturally fall out of brain studies. But I missed a lot of the paper—is there anything like that in there?
- lukeprog 25 Jan 2012 21:50 UTC
  10 points
  
  What I didn’t see was a description of how three separate systems naturally fall out of brain studies. But I missed a lot of the paper—is there anything like that in there?
  
  Some, yes. I’ve now updated the link in the OP so it points to a PDF of the full chapter.
Eliezer Yudkowsky 23 Jan 2012 22:32 UTC
19 points
Um, objection, I didn’t actually say that and I would count the difference as pretty significant here. I said, “I would be suspicious of that for the inverse reason my brain wants to say ‘but there has to be a different way to stop the train’ in the trolley problem—it sounds like the correct answer ought to be to just keep the part with the coherent utility function in CEV which would make it way easier, but then someone’s going to jump up and say: ‘Ha ha! Love and friendship were actually in the other two!’”
- lukeprog 23 Jan 2012 22:42 UTC
  10 points
  What? You said that? Sorry, I didn’t mean to misquote you so badly. I’ll blame party distractions or something. Do you remember the line about a gift basket and it possibly making CEV easier?
  
  Anyway, I’ll edit the OP immediately to remove the misquote.
  
  For reference, the original opening to this post was:
  
  Me: “Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don’t act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and all three systems contribute to choice. And suppose that upon reflection we would clearly reject the outputs of two of these systems, whereas the third system looks something more like a utility function. How would you feel?”
  
  Eliezer: “I would feel like someone had left an enormous gift basket at my front door. That could make CEV easier.”
  
  Me: “Okay, well, what I just described is part of the leading theory of choice in the human brain.”
Nick_Beckstead 7 Feb 2012 17:35 UTC
17 points
What’s the evidence that this is the “leading theory of choice in the human brain”? (I am not saying I have evidence that it isn’t, but it’s important for this post that some large relevant section of the scientific community thinks this theory is awesome.)
cousin_it 23 Jan 2012 21:44 UTC
14 points
Congratulations on continuing this line of inquiry!

One thing that worries me is that it seems to focus on the “wanting” part to the exclusion of the “liking” part, so we may end up in a world we desire today but won’t enjoy tomorrow. In particular, I suspect that a world built according to our publicly stated preferences (which is what many people seem to think when they hear “reflective equilibrium”) won’t be very fun to live in. That might happen if we get much of our fun from instinctive and Pavlovian actions rather than planned actions, which seems likely to be true for at least some people. What do you think about that?
- lukeprog 23 Jan 2012 22:02 UTC
  11 points
  I think that upon reflection, we would desire that our minds be designed in such a way that we get pleasure from getting the things we want, or pleasure whenever we want, or something — instead of how the system is currently set up, where we can’t always choose when we feel good and we only sometimes feel good as a result of getting what we want.
  - Multiheaded 24 Jan 2012 17:34 UTC
    0 points
    Yeah, I agree. I said that we should, in principle, rewire ourselves for this very reason in Bakkot’s (in)famous introduction thread, but Konkvistador replied he’s got reasons to be suspicious and fearful about such an undertaking.
  - [deleted] 24 Jan 2012 1:24 UTC
    0 points
    It would be nice if liking and wanting coincided, but why does “make pleasurable that which we desire” sound better to you than “make desirable that which we find pleasurable”?
    
    Suppose Kelly can’t stop thinking about pickle milkshakes. “Oh dang,” thinks she, “I could go for a pickle milkshake”. But in fact, she’d find a pickle milkshake quite gross. What would Kelly-mature want for Kelly-now? Have someone tell her that pickle milkshakes are gross? Modify her tongue to enjoy pickle milkshakes? Directly make her stop wanting pickle milkshakes? Search flavour space for a beverage superficially similar to pickle milkshakes that does not upset her stomach? Take second order utilities into account and let her drink the milkshake, provided it’s not damaging to her health in the long term, so that she’s in control of and can learn from her pickle milkshake experiences?
    
    The things you listed sound like modifying Kelly’s tongue. Is that a fair characterization?
Vladimir_Nesov 23 Jan 2012 20:58 UTC
14 points

Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don’t act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations

Humans violate any given set of axioms simply because they are not formally flawless, so such explanations only start being relevant when discussing an idealization, in this case a descriptive one. But properties of descriptive idealizations don’t easily translate into properties of normative idealizations.
Alicorn 23 Jan 2012 19:59 UTC
10 points
The quoted summaries of each of the three systems are confusing and I don’t feel like I have an understanding of them, except insofar as the word “Pavlovian” gives a hint. Can you translate more clearly, please?
- [deleted] 23 Jan 2012 20:11 UTC
  40 points
  Or, to put it more simply:
  1. Figure out what’s going on, and what actions maximize returns, and do them.
  2. Do the thingy that worked before again!
  3. Avoid the unpleasant thing and go to the pleasant thing. Repeat as necessary.
  - lukeprog 23 Jan 2012 20:25 UTC
    8 points
    Added to the original post, credit given.
    - JoachimSchipper 24 Jan 2012 12:11 UTC
      7 points
      Could you put it before the hard-to-parse explanations? It was nice to confirm my understanding, but it would have saved me a minute or two of effort if you’d put those first.
  - Shmi 23 Jan 2012 21:22 UTC
    6 points
    Maybe give Luke a lesson or two on C^3 (clear, concise and catchy) summaries.
    - lukeprog 23 Jan 2012 22:05 UTC
      1 point
      Note that I wrote this post in two hours flat and made little attempt to optimize presentation in this case.
      - Shmi 23 Jan 2012 22:58 UTC
        12 points
        Sorry, I did not intend my comment to rub you the wrong way (or any of my previous comments that might have). FWIW, I think that you are doing a lot of good stuff for the SIAI, probably most of it invisible to an ordinary forum regular. I realize that you cannot afford to spend an extra two hours per post on polishing the message. Hopefully one of the many skills of your soon-to-be-hired executive assistant will be that of “optimizing presentation”.
        lukeprog 23 Jan 2012 22:59 UTC
        7 points
        No worries!
        MACHISMO 26 Jan 2012 21:49 UTC
        4 points
        Indeed. Much invisible work is required before optimization can occur. Invisible forging of skills precedes their demonstration.
      - lukeprog 26 Jan 2012 17:56 UTC
        6 points
        For my own reference, here are the posts I tried to write well:
        
        Secure Your Beliefs
        Optimal Philanthropy for Human Beings
        A Rationalist’s Tale
        Existential Risk
        Rationality Lessons Learned from Irrational Adventures in Romance
        What Curiosity Looks Like
        Can the Chain Still Hold You?
        TheOtherDave 26 Jan 2012 19:01 UTC
        2 points
        It might be an interesting exercise to record predictions in a hidden-but-reliable form about karma of posts six months out, by way of calibrating one’s sense of how well-received those posts will be to their target community.
      - Swimmer963 (Miranda Dixon-Luinenburg) 23 Jan 2012 23:29 UTC
        1 point
        It’s still better than the posts I write in 2 hours! Did that 2 hours include the time spent researching, or were you just citing sources you’d already read for other reasons? In either case...not bad.
  - Scott Alexander 24 Jan 2012 4:02 UTC
    3 points
    Is 2 operant/Skinnerian conditioning, and 3 classical/Pavlovian conditioning?
    - [deleted] 24 Jan 2012 7:44 UTC
      5 points
      If by “is” you mean “Do these correspond to the underlying cognitive antecedents used in...”, then my answer is “it would seem so.”
- [deleted] 23 Jan 2012 20:08 UTC
  8 points
  The first one incorporates information about past experiences into simplified models of the world, and then uses the models to steer decisions through search-space based upon a sort of back-of-the-envelope, hazy calculation of expected value. It’s a utility function, basically, as implemented by brain.
  
  The second one also incorporates information about past experiences, but rather than building that data into a model and performing searches over it, it derives expectations directly from what’s remembered, and is insensitive to things like probability or shifting subjective values.
  
  The third one is sort of like the first in its basic operations (incorporate information, analyze it, make models) -- but instead of calculating expected values, it aims to satisfy various inbuilt “drives”, and sorts paths through search space based upon approach/avoid criteria linked to those drives.
- lukeprog 23 Jan 2012 20:19 UTC
  0 points
  I like Jandila’s explanations.
FiftyTwo 23 Jan 2012 21:13 UTC
8 points
I’m not sure I understand the difference between 2 and 3. The term Pavlovian is being applied to the third system, but 2 sounds more like the archetypal Pavlovian learned response (dog learns that bell results in food). Does 3 refer exclusively to pre-encoded pleasant/unpleasant responses rather than learned ones? Or is there maybe a distinction between a value and an action response that I’m missing?
- Swimmer963 (Miranda Dixon-Luinenburg) 23 Jan 2012 23:24 UTC
  2 points
  It appears to me like 3 is only pre-encoded preferences, whereas 2 refers to preferences that are learned in an automatic, “reflex-like” way...which, yeah, sounds a lot like the Pavlovian learned response.
BrianNachbar 27 Jan 2012 15:04 UTC
7 points
Where do the model-based system’s terminal goals come from?
Linda Linsefors 25 Aug 2022 11:11 UTC
6 points
If anyone reads this comment…
Do you know if these claims have held up? Does this post still agree with current neuroscience, or have there been some major updates?
- Gunnar_Zarncke 26 Aug 2022 22:04 UTC
  3 points
  I think the three sub-systems can be loosely mapped to the structure discussed in the [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering as follows:
  - the model-based system is the Learning System, except that the Learning System doesn’t calculate value but only learns to model better via reward prediction error.
  - the Pavlovian system is the Steering System and is the only system that provides ground truth “value” (this value is low-level reward; abstract concepts of value are formed by the learning system around this ground truth, but these exist only in so far as they are useful to predict the ground truth).
  - the model-free system doesn’t exist as a separate system but is in the shallower parts of the Learning System. I don’t think it maps to the Thought Assessor, but I may be wrong.
  In this framework, one could say, as Eliezer suspected, that the value originated outside the model-based system.
jimmy 24 Jan 2012 0:35 UTC
6 points
I’m skeptical of any clear divide between the systems. Of course, there are more abstract and more primitive information paths, but they talk to each other, and I don’t buy that they can be cleanly separated.

Plans can be more or less complicated, and can involve “I don’t know how this part works, but it worked last time, so let’s do this” and what worked last time can be very pleasurable and rewarding—so it doesn’t seem to break down cleanly into any one category.

I’d also argue that, to the extent that abstract planning is successful, it is because it propagates top down and affects the lower Pavlovian systems. If your thoughts about your project aren’t associated with motivation and wanting to actually do something, then your abstract plans aren’t of much use. It just isn’t salient that this is happening unless the process is disrupted and you find yourself not doing what you “want” to do.

Another point that is worth stating explicitly is that algorithms for maximizing utility are not utility functions. In theory, you could have 3 different optimizers that all maximize the same utility function, or 3 different utility functions that all use the same optimizer—or any other combination.
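jimmy's distinction between a utility function and the algorithm that maximizes it can be made concrete. Below, two optimizers share one utility function over two-step plans but return different plans. The reward structure (work only pays off when repeated) and both optimizers are hypothetical, chosen only to make the divergence visible.

```python
from itertools import product

ACTIONS = ("snack", "work")


def reward(prev, action):
    """Hypothetical per-step reward: work only pays off if the previous step was work."""
    if action == "snack":
        return 1
    return 4 if prev == "work" else 0


def utility(plan):
    """A single utility function over whole plans, shared by both optimizers."""
    total, prev = 0, None
    for action in plan:
        total += reward(prev, action)
        prev = action
    return total


def greedy_optimizer(horizon=2):
    """Pick whichever action has the best immediate reward at each step."""
    plan, prev = [], None
    for _ in range(horizon):
        action = max(ACTIONS, key=lambda a: reward(prev, a))
        plan.append(action)
        prev = action
    return tuple(plan)


def exhaustive_optimizer(horizon=2):
    """Score every possible plan with the utility function and keep the best."""
    return max(product(ACTIONS, repeat=horizon), key=utility)


for optimizer in (greedy_optimizer, exhaustive_optimizer):
    best = optimizer()
    print(optimizer.__name__, best, utility(best))
# greedy_optimizer ('snack', 'snack') 2
# exhaustive_optimizer ('work', 'work') 4
```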

I don’t think this is a purely academic distinction either—I think that we have conflicts at the same level all the time (multiple personality disorder being an extreme case). Conflicts between systems with no talk between levels look like someone saying they want something, and then doing something else without looking bothered at all. When someone is obviously pained by the conflict, then they are clearly both operating on an emotional level, even if the signal originated at different places. Or I could create a Pavlovian conflict in my dog by throwing a steak on the wet tile, and watching as his conditioned fear of wet tile fights the conditioned desire for steak.
Multiheaded 24 Jan 2012 17:35 UTC
4 points

‘Ha ha! Love and friendship were actually in the other two!’”

This concern is not-abstract and very personal for me. As I’ve said around here before, I often find myself exhibiting borderline-sociopathic thinking in many situations, but the arrangement of empathy and ethical inhibitions in my brain, though off-kilter in many ways*, drives me to take even abstract ethical problems (LW examples: Three Worlds Collide, dust specks, infanticide, recently Moldbug’s proposal of abolishing civil rights for the greater good) very personally, generates all kinds of strong emotions about them—yet it has kept me from doing anything ugly so far.

(The most illegal thing I’ve done in my life during the moments when I ‘let myself go’ was some petty and outwardly irrational shoplifting in my teenage years; reflecting back upon that, I did it not solely to get an adrenaline rush but also to push my internal equilibrium into a place where this “superego” thing would receive an alarm and come back online.)

What if this internal safety net of mine is founded solely upon #2 and #3?

(As I’ve mentioned in some personal anecdotes (and hell, I don’t wish to drone on and on about this, just feeling it’s relevant), this part of me had been either very weak or dormant until I watched Evangelion when I was 18. The weird, lingering cathartic sensation and the feeling of psychological change, which felt a little like growing up several years in a week, was the most interesting direct experience in my life so far. However, I’ve mostly been flinching from consciously* trying to push myself towards the admirable ethics of interpersonal relations that I view as the director’s key teaching. It’s painful enough when it’s happening without conscious effort on your part!)
- TheOtherDave 24 Jan 2012 18:05 UTC
  2 points
  Do you have any particular reason for expecting it to be?
  
  Or is this a more general “what if”? For example, if you contemplate moving to a foreign country, do you ask yourself what if your internal safety net is founded solely on living in the country you live in now?
  - JoachimSchipper 25 Jan 2012 12:38 UTC
    2 points
    I’m not Multiheaded, but it feels-as-if the part of brain that does math has no problem at all personally slaughtering a million people if it saves one million and ten (1); the ethical injunction against that, which is useful, feels-as-if it comes from “avoid the unpleasant (c.q. evil) thing”. (Weak evidence based on introspection, obviously.)
    
    (1) Killing a million people is really unpleasant, but saving ten people should easily overcome that even if I care more about myself than about others.
    - Multiheaded 26 Jan 2012 22:57 UTC
      0 points
      Roughly that; I’ve thought about it in plenty more detail, but everything beyond this summary feels vague and I’m too lazy currently to make it coherent enough to post.
  - Multiheaded 24 Jan 2012 18:07 UTC
    0 points
    
    Do you have any particular reason for expecting it to be?
    
    It feels like I do, but it’ll take a bit of very thoughtful writing to explicate why. So maybe I’ll explain it here later.
- [deleted] 24 Jan 2012 18:28 UTC
  1 point
  .
Bugmaster 25 Jan 2012 4:28 UTC
3 points
This might be a silly question, but still:

Are the three models actually running on three different sets of wetware within the brain, or are they merely a convenient abstraction of human behavior?
- BrianNachbar 27 Jan 2012 19:39 UTC
  0 points
  I think what matters is whether they’re concurrent—which it sounds like they are. Basically, whether they’re more or less simultaneous and independent. If you were emulating a brain on a computer, they could all be on one CPU, or on different ones, and I don’t think anyone would suggest that the em on the single CPU should get a different CEV than an identical one on multiple CPUs.
  - Bugmaster 27 Jan 2012 19:56 UTC
    4 points
    I was really more interested in whether or not we can observe these models running independently in real, currently living humans (or chimps or rats, really). This way, we could gather some concrete evidence in favor of this three-model approach; and we could also directly measure how strongly the three models are weighted relative to each other.
    - MaoShan 16 Feb 2012 3:24 UTC
      −2 points
      If you could reduce the cognitive cost of the model-based system by designing a “decision-making app”, you could directly test if it was beneficial and actually (subjectively or otherwise) improved the subjects’ lives. If it was successful, you’d have a good chance of beta-testing a real CEV.
Vladimir_Nesov 23 Jan 2012 20:42 UTC
3 points
It seems to me that the actual situation is that upon reflection we would clearly reject (most of) the outputs of all three systems. What human brain actually computes, in any of its modules or in all of them together, is not easily converted into considerations about how the decisions should be made.

In other words, the valuations made by human valuation systems are irrelevant, even though the only plausible solution involves valuations based on human valuation systems. And converting brains into definitions of value will likely break any other abstractions about the brains that theorize them as consisting of various modules with various purposes.
- lukeprog 23 Jan 2012 21:02 UTC
  1 point
  I said that “it seems that upon reflection I would embrace an extrapolation of the model-based system’s preferences as representing ‘my values’.”
  
  Which does, in fact, mean that I would reject “most of the outputs of all three systems.”
  
  Note: I’ve since changed “would” to “might” in that sentence.
  - Vladimir_Nesov 23 Jan 2012 21:14 UTC
    1 point
    
    I said that “it seems that upon reflection I would embrace an extrapolation of the model-based system’s preferences as representing ‘my values’.”
    
    OK, didn’t notice that; I was referring more to the opening dialog. Though “extrapolation” still doesn’t seem to fit, because brain “modules” are not the same kind of thing as goals. Two-step process where first you extract “current preferences” and then “extrapolate” them is likely not how this works, so positing that you get the final preferences somehow starting from the brains is weaker (and correspondingly better, in the absence of knowledge of how this is done).
    - lukeprog 23 Jan 2012 22:03 UTC
      0 points
      I agree that the two-step process may very well not work. This is an extremely weak and preliminary result. There’s a lot more hacking at the edges to be done.
      - Vladimir_Nesov 23 Jan 2012 22:43 UTC
        2 points
        
        I agree that the two-step process may very well not work. This is an extremely weak and preliminary result.
        
        What are you referring to by “this” in the second sentence? I don’t think there is a good reason to posit the two-step process, so if this is what you refer to, what’s the underlying result, however weak and preliminary?
        lukeprog 23 Jan 2012 22:49 UTC
        0 points
        By “this” I meant the content of the OP about the three systems that contribute to choice.
        Vladimir_Nesov 23 Jan 2012 22:55 UTC
        0 points
        OK, in that case I’m confused, since I don’t see any connection between the first and the second sentences...
        lukeprog 23 Jan 2012 22:59 UTC
        3 points
        Let me try again:
        
        Two-step process = (1) Extract preferences, (2) Extrapolate preferences. This may not work. This is one reason that this discovery about three valuation systems in the brain is so weak and preliminary for the purposes of CEV. I’m not sure it will turn out to be relevant to CEV at all.
        Vladimir_Nesov 23 Jan 2012 23:31 UTC
        6 points
        I see, so the two-step thing acts as a precondition. Is it right that you are thinking of descriptive idealization/analysis of human brain as a path that might lead to definition of “current” (extracted) preferences, which is then to be corrected by “extrapolation”? If so, that would clarify for me your motivation for hoping to get anything FAI-relevant out of neuroscience: extrapolation step would correct the fatal flaws of the extraction step.
        
        (I think extrapolation step (in this context) is magic that can’t work, and instead analysis of human brain must extract/define the right decision problem “directly”, that is formally/automatically, without losing information during descriptive idealization performed by humans, which any object-level study of neuroscience requires.)
        lukeprog 24 Jan 2012 0:37 UTC
        5 points
        Extraction + extrapolation is one possibility, though at this stage in the game it still looks incoherent to me. But sometimes things look incoherent before somebody smart comes along and makes them coherent and tractable.
        
        Another possibility is that an FAI uploads some subset of humans and has them reason through their own preferences for a million subjective years and does something with their resulting judgments and preferences. This might also be basically incoherent.
        
        Another possibility is that a single correct response to preferences falls out of game theory and decision theory, as Drescher attempts in Good and Real. This might also be incoherent.
        Vladimir_Nesov 24 Jan 2012 0:58 UTC
        3 points
        In these terms, the plan I see as the most promising is that the correct way of extracting preferences from humans that doesn’t require further “extrapolation” falls out of decision theory.
        
        (Not sure what you meant by Drescher’s option (what’s “response to preferences”?): does the book suggest that it’s unnecessary to use humans as utility definition material? In any case, this doesn’t sound like something he would currently believe.)
        lukeprog 24 Jan 2012 1:03 UTC
        0 points
        0
        Parent
        As I recall, Drescher still used humans as utility definition material but thought that there might be a single correct response to these utilities — one which falls out of decision theory and game theory.
        Vladimir_Nesov 24 Jan 2012 1:19 UTC
        1 point
        0
        Parent
        What’s “response to utilities” (in grandparent you used “response to preferences” which I also didn’t understand)? Response of what for what purpose? (Perhaps, the right question is about what you mean by “utilities” here, as in extracted/descriptive or extrapolated/normative.)
        lukeprog 24 Jan 2012 7:28 UTC
        1 point
        0
        Parent
        
        Response of what for what purpose?
        
        Yeah, I don’t know. It’s kind of like asking what “should” or “ought” means. I don’t know.
        Vladimir_Nesov 24 Jan 2012 13:40 UTC
        5 points
        0
        Parent
        No, it’s not a clarifying question about subtleties of that construction; I have no inkling of what you mean (seriously, no irony), and hence I fail to parse what you wrote (related to “response to utilities” and “response to preferences”) at the most basic level. This is what I see in the grandparent:
        
        Drescher still used humans as utility definition material but thought that there might be a single correct borogove — one which falls out of decision theory and game theory.
        
        lukeprog 25 Jan 2012 1:51 UTC
        0 points
        0
        Parent
        For our purposes, how about...
        
        Drescher still used humans as utility definition material but thought that there might be a single, morally correct way to derive normative requirements from values — one which falls out of decision theory and game theory.
        
        Vladimir_Nesov 25 Jan 2012 2:16 UTC
        2 points
        0
        Parent
        Still no luck. What’s the distinction between “normative requirements” and “values”, in what way are these two ideas (as intended) not the same?
        lukeprog 25 Jan 2012 6:09 UTC
        0 points
        0
        Parent
        Suppose that by “values” in that sentence I meant something similar to the firing rates of certain populations of neurons, and by “normative requirements” I meant what I’d mean if I had solved metaethics.
        Vladimir_Nesov 25 Jan 2012 10:05 UTC
        1 point
        0
        Parent
        Then that would refer to the “extrapolation” step (falling out of decision theory, as opposed to something CEV-esque), and assume that the results of an “extraction” step are already available, right? Does (did) Drescher hold this view?
        lukeprog 25 Jan 2012 14:03 UTC
        0 points
        0
        Parent
        From what I meant, it needn’t assume that the results of an extraction step are already available, and I don’t recall Drescher talking in so much detail about it. He just treats humans as utility material, however that might work.
        Vladimir_Nesov 25 Jan 2012 17:37 UTC
        0 points
        0
        Parent
        OK, thanks! That would agree with my plan then.
        
        (In general, it’s not clear in what ways descriptive “utility” can be more useful than original humans, or what it means as “utility”, unless it’s already normative preference, in which case it can’t be “extrapolated” any further. “Extrapolation” makes more sense as a way of constructing normative preference from something more like an algorithm that specifies behavior, which seems to be CEV’s purpose, and could then be seen as a particular method of extraction-without-need-for-extrapolation.)
        pjeby 23 Jan 2012 23:07 UTC
        2 points
        0
        Parent
        I think you’ve also missed the possibility that all three “systems” might just be the observably inconsistent behavior of one system in different edge cases, or at least that the systems are far more entangled and far less independent than they seem.
        
        (I think you may have also ignored the part where, to the extent that the model-based system has values, they are often more satisficing than maximizing.)
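To make the satisficing-versus-maximizing distinction concrete, here is a minimal Python sketch. The option values, the `aspiration` threshold, and the function names are hypothetical illustrations, not anything taken from the neuroscience discussed in the post: a maximizer examines every option and takes the argmax, while a satisficer takes the first option whose value clears a threshold.

```python
from typing import Sequence


def maximize(values: Sequence[float]) -> int:
    """Return the index of the highest-valued option; every option is examined."""
    return max(range(len(values)), key=lambda i: values[i])


def satisfice(values: Sequence[float], aspiration: float) -> int:
    """Return the index of the first option whose value meets the aspiration level.

    If no option is good enough, fall back to the best one (a hypothetical
    choice for this sketch; satisficing models differ on this point).
    """
    for i, value in enumerate(values):
        if value >= aspiration:
            return i
    return maximize(values)


# Hypothetical subjective values of options, in the order they are encountered.
options = [3.0, 7.0, 9.0, 4.0]
print(maximize(options))         # 2: the best option overall
print(satisfice(options, 6.0))   # 1: the first option that is "good enough"
```

The two rules agree whenever the first acceptable option also happens to be the best one, which is one reason observed choices alone can make a satisficer look like a maximizer.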
AspiringKnitter 24 Jan 2012 20:33 UTC
2 points
0
If I understand this correctly, then the model-based system and the model-free system sound like inside and outside views.
- Manfred 25 Jan 2012 15:40 UTC
  2 points
  0
  Parent
  Although in this case the “outside view” can’t learn from anybody else’s mistakes, it always has to make them itself.
- lessdazed 25 Jan 2012 4:03 UTC
  1 point
  0
  Parent
  I agree.
  
  Whoever downvoted this should have said why they disagreed if they did.
- JoachimSchipper 25 Jan 2012 12:40 UTC
  0 points
  0
  Parent
  My inside view already feels pretty probabilistic, actually. (I suspect LW-reading mathematicians are not a very good model of the average human, though.)
Htarlov 9 Feb 2025 0:14 UTC
1 point
0
Part of animal nature, including human nature, is to crave novelty and surprise and to avoid boredom. This is crucial to the learning process in a changing and complex environment. Humans have multi-level drives, and not all of them are well targeted at specific goals or needs.
It is very visible in small children. Some people with ADHD, like me, have a harder time regulating themselves, and it is especially visible in us, even as adults. I know exactly what I should be doing. That is one thing. I may also feel hungry. That is another thing. But I may still indulge in a third thing instead: something that satiates my need for stimulation and novelty (for me this most often means gaining some knowledge or understanding; I often fall down rabbit holes, reading and thinking about topics that have hardly any real-life use and that I can hardly do anything about). It is something not immediately useful for goal seeking, but it generates interesting possibilities in the long term. In other words, exploration without a targeted purpose.

The craving for novelty and surprise, and the avoidance of boredom, form another element that in my opinion should be included.
[deleted] 5 Aug 2012 6:21 UTC
1 point
0

Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don’t act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and

A question, probably silly: Suppose you calculate what a person would do given every possible configuration of sensory inputs, and then construct a utility function that returns one if that thing is done and zero otherwise. Can’t we then say that any deterministic action-taking thing acts according to some utility function?

Or, even more trivially, just let the utility be constant. Then any action maximizes utility.

Edit: If you’re using utility functions to predict actions, then the constant utility function is like a maximum entropy prior, and the “every possible configuration” thing is like a hypothesis that simply lists all observations without positing some underlying pattern, so it would eventually get killed off by being more complicated than hypotheses that actually “compress” the evidence.
- Richard_Kennaway 5 Aug 2012 11:39 UTC
  3 points
  0
  Parent
  
  A question, probably silly: Suppose you calculate what a person would do given every possible configuration of sensory inputs, and then construct a utility function that returns one if that thing is done and zero otherwise. Can’t we then say that any deterministic action-taking thing acts according to some utility function?
  
  No, although this idea pops up often enough that I have given it a name: the Texas Sharpshooter Utility Function.
  
  There are two things glaringly wrong with it. Firstly, it is not a utility function in the sense of VNM (proof left as an exercise). Secondly, it does not describe how anything works—it is purely post hoc (hence the name).
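The construction in the question, and why Richard_Kennaway calls it post hoc, can be made concrete in a short Python sketch. The weather table, the action names, and the helper functions are hypothetical illustrations. Note also that the “utility” here is defined over (observation, action) pairs, not over lotteries on outcomes as in the VNM setting.

```python
from typing import Callable, Dict, List, Sequence

Utility = Callable[[str, str], float]  # (observation, action) -> utility


def sharpshooter_utility(policy: Callable[[str], str]) -> Utility:
    """Post hoc 'utility': 1 for whatever the policy actually does, 0 for anything else."""
    def utility(obs: str, act: str) -> float:
        return 1.0 if act == policy(obs) else 0.0
    return utility


def maximizers(utility: Utility, obs: str, actions: Sequence[str]) -> List[str]:
    """All actions that achieve the highest utility for this observation."""
    best = max(utility(obs, a) for a in actions)
    return [a for a in actions if utility(obs, a) == best]


# A hypothetical deterministic agent, given as a bare lookup table.
table: Dict[str, str] = {"rain": "stay in", "sun": "go out", "fog": "stay in"}
actions = ["stay in", "go out"]

u = sharpshooter_utility(table.__getitem__)
# Maximizing the constructed utility reproduces the agent on every observation...
print(all(maximizers(u, obs, actions) == [table[obs]] for obs in table))  # True


# ...while a constant utility is maximized by every action, so it predicts nothing.
def constant(obs: str, act: str) -> float:
    return 0.0


print(maximizers(constant, "rain", actions))  # ['stay in', 'go out']
```

The constructed function needs one entry per observation it has to cover, so it is exactly as long as the behaviour it “explains”. That is the compression point made in the edit to the question, and it is why such a function says nothing about observations outside the table.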
gaffa 23 Jan 2012 22:41 UTC
1 point
0
As a first reaction (and without having read up on the details), I’m very skeptical. Assuming these three systems are actually in place, I don’t see any convincing reason why any one of them should be trusted in isolation. Natural selection has only ever been able to work on their compound output, oblivious to the role played by each one individually and how they interact.

Maybe the “smart” system has been trained to assign some particular outcome a value of 5 utilons, whereas we would all agree that it’s surely, under all circumstances, worth more than 20, because, as it happens, throughout evolution one of the other “dumb” systems has always kicked in and provided the equivalent of at least 15 utilons. If you then extract the first system bare and naked, it might deliver some awful outputs.
- mfb 5 Feb 2012 18:14 UTC
  1 point
  0
  Parent
  As I understand it, the first system should be able to predict the result of the other two—if the brain knows a bit about how brains work.
  
  While I don’t know if the brain really has three different systems, I think that the basic idea is true: The brain has the option to rely on instincts, on “it worked before”, or on “let’s make a pro/contra list”—this includes any combination of the concepts.
  
  The “lower” systems evolved before the “higher” ones, therefore I would expect that they can work as a stand-alone system as well (and they do in some animals).
- endoself 24 Jan 2012 4:09 UTC
  0 points
  0
  Parent
  I’m not familiar with the theory beyond what Luke has posted, but I think only one system is active at a time, so there is no summation occurring. However, we don’t yet know what determines which system makes a particular decision or how these systems are implemented, so there definitely could be problems isolating them.
Deanushka 6 Feb 2012 22:31 UTC
0 points
0
Just some initial thoughts:

I understand that these statements are broad generalisations of what actually occurs, though the premise is that a successful choice would be made by weighting the options provided by these scenarios.

As with genetics and other systems, the beneficial-error scenario, such as a miskeyed note on a keyboard leading to a favourable variation of the sequence, seems to be excluded from these scenarios.

Improvisation based on self-introduced errors may also be central to these utilities being able to evolve reason.

Model-based system: Figure out what’s going on, and what actions maximize returns, and do them.

Model-free system: Do the thingy that worked before again!

Pavlovian system: Avoid the unpleasant thing and go to the pleasant thing. Repeat as necessary.
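The one-line summaries above can be made concrete with a toy version of the lever-and-cheese example from the post. This is a minimal Python sketch under loudly stated assumptions: the two-action world, the outcome values, the learning rate, and the class names are all hypothetical, and the models in the reinforcement-learning literature the post cites are far richer. It shows only the qualitative point: after the cheese is poisoned, the model-based controller changes its choice immediately, while the model-free cache keeps recommending the lever until it experiences the new outcome.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence

# Hypothetical toy world: pressing the lever produces cheese, waiting produces nothing.
TRANSITIONS: Dict[str, str] = {"press": "cheese", "wait": "nothing"}


@dataclass
class ModelBased:
    """Values an action by predicting its outcome and asking what that outcome is worth now."""
    transitions: Dict[str, str]
    outcome_value: Dict[str, float]  # shared reference: sees changes in motivational state

    def value(self, action: str) -> float:
        return self.outcome_value[self.transitions[action]]


@dataclass
class ModelFree:
    """Caches one value per action from past rewards; never represents the outcome itself."""
    q: Dict[str, float] = field(default_factory=lambda: {"press": 0.0, "wait": 0.0})
    alpha: float = 0.5  # hypothetical learning rate

    def learn(self, action: str, reward: float) -> None:
        # Move the cached value part of the way toward the reward just received.
        self.q[action] += self.alpha * (reward - self.q[action])

    def value(self, action: str) -> float:
        return self.q[action]


def pavlovian_response(stimulus_value: float) -> str:
    """Hard-wired reflex: approach pleasant stimuli, withdraw otherwise."""
    return "approach" if stimulus_value > 0 else "withdraw"


def choose(controller, actions: Sequence[str] = ("press", "wait")) -> str:
    """Final choice: argmax over the values the controller assigns."""
    return max(actions, key=controller.value)


outcome_value = {"cheese": 1.0, "nothing": 0.0}
mb = ModelBased(TRANSITIONS, outcome_value)
mf = ModelFree()
for _ in range(10):                      # training: pressing the lever yields good cheese
    mf.learn("press", outcome_value["cheese"])

outcome_value["cheese"] = -1.0           # devaluation: the cheese is poisoned
print(choose(mb))                        # wait: the model sees the outcome is now bad
print(choose(mf))                        # press: the cached value has not changed
print(pavlovian_response(outcome_value["cheese"]))  # withdraw
```

The model-based controller pays for its flexibility with a lookup through the model on every evaluation; the model-free controller is a single table read, which is the computational trade-off the post attributes to Daw et al.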
mfb 29 Jan 2012 17:37 UTC
0 points
0
I think that you can hold on to the utility function a bit longer if you add the costs of thinking to it: the required time and energy, and maybe an aversion to thinking about it. “I could compare these two items in the supermarket for 30 minutes and finally find out which product is better, or I could just ignore the new option and take the same thing as last time.” It can be perfectly rational to just stick with something that worked before.

It is also rational to decide how much time to invest in a decision (and when a lot of money is involved, this is usually done). If there is not enough time to build and use a model, you fall back on more “primitive” methods. In fact, most everyday decisions have to be made like that: each second, you have several options available and no way to rethink all of them every time.

We need all three systems in our lives. The interesting question is just which system is useful for which decision and how much time it should get. Look at it from a higher perspective, and you can get a well-defined utility function for a brain that has access to these systems to evaluate things.
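The point that deliberation has its own cost can be written as a simple meta-decision rule. This is a hypothetical Python sketch, not a model from the post: the probability that the new product is better, the size of the improvement, and the cost per minute of thinking are invented numbers (all in the same arbitrary value units) chosen to mirror the supermarket example. The rule deliberates only when the expected gain from comparing exceeds the cost of the time spent.

```python
def expected_gain(p_new_better: float, improvement_if_better: float) -> float:
    """Expected benefit of comparing: you only switch if the new option turns out better."""
    return p_new_better * improvement_if_better


def choose_mode(p_new_better: float, improvement_if_better: float,
                minutes: float, cost_per_minute: float) -> str:
    """Deliberate only when the expected gain exceeds the cost of the time spent thinking."""
    gain = expected_gain(p_new_better, improvement_if_better)
    cost = minutes * cost_per_minute
    return "deliberate" if gain > cost else "habit"


# Hypothetical numbers: same odds that the new option is better, very different stakes.
print(choose_mode(0.3, 2.0, minutes=30, cost_per_minute=0.2))    # habit (gain 0.6, cost 6)
print(choose_mode(0.3, 500.0, minutes=30, cost_per_minute=0.2))  # deliberate (gain 150, cost 6)
```

Under this rule, sticking with the habitual choice is the value-maximizing option for the cheap item but not for the expensive one, which is the sense in which falling back on a “primitive” method can still be rational.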
Dmytry 26 Jan 2012 22:51 UTC
0 points
0
Okay, which system decides which way the rat should turn when the rat is navigating a maze? A cat doing actual path-finding on a complex landscape? (This is surprisingly hard to do if you are coding a cat AI. Path-finding, well, is rather ‘rational’ in the sense that animals don’t walk into walls and the like.) A human navigating a maze with a map to get food? A cat doing path-finding while avoiding a place where it had a negative experience (“conditioning”)?

It seems to me that those 3 ‘systems’, if there really are three such systems, aren’t interacting in the way the article describes.
TheOtherDave 23 Jan 2012 23:40 UTC
0 points
0

At a glance, it seems that upon reflection I might embrace an extrapolation of the model-based system’s preferences as representing “my values,” and I would reject the outputs of the model-free and Pavlovian systems as the outputs of dumb systems that evolved for their computational simplicity, and can be seen as ways of trying to approximate the full power of a model-based system responsible for goal-directed behavior.

At a glance, I might be more comfortable embracing an extrapolation of the combination of the model-based system’s preferences and the Pavlovian system’s preferences.

Admittedly, a first step in extrapolating the Pavlovian system’s preferences might be to represent its various targets as goals in a model, thereby leaving the extrapolator with a single system to extrapolate, but given that 99% of the work takes place after this point I’m not sure how much I care. Much more important is to not lose track of that stuff accidentally.
timtyler 25 Jan 2012 1:48 UTC
−3 points
0

Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don’t act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and all three systems contribute to choice.

Er, I don’t think so. To quote from here:

Utility maximisation is a general framework which is powerful enough to model the actions of any computable agent. The actions of any computable agent—including humans—can be expressed using a utility function. This was spelled out by Dewey in a 2011 paper titled: “Learning What to Value”—in his section about “O-Maximisers”.

Some argue that humans have no utility function. However, this makes little sense: all computable agents have utility functions. The human utility function may not be easy to write down—but that doesn’t mean that it doesn’t exist.
- JoachimSchipper 25 Jan 2012 12:45 UTC
  2 points
  0
  Parent
  Why would this necessarily be true? Somewhere in mind-design-space is a mind (or AI/algorithm) that confidently asserts A > B, B > C and C > A. (I’m not sufficiently versed in the jargon to know whether this mind would be an “agent”, though—most minds are not goal-seeking in any real sense of the word.)
  - timtyler 25 Jan 2012 12:49 UTC
    0 points
    0
    Parent
    That mind would have some associated behaviour and that behaviour could be expressed by a utility function (assuming computability—which follows from the Church–Turing–Deutsch principle).
    
    Navel gazing, rushing around in circles, burning money, whatever—all have corresponding utility functions.
    
    Dewey explains why in more detail—if you are prepared to follow the previously-provided link from here.
    - JoachimSchipper 25 Jan 2012 13:53 UTC
      4 points
      0
      Parent
      I’ve taken a look at the paper. If “outcomes” are things like “chose A”, “chose B” or “chose C”, the above mind is simply not an O-maximizer: consider a world with observations “I can choose between A and B/B and C/C and A” (equally likely, independent of any past actions or observations) and actions “take the first offered option” or “take the second offered option” (played for one round, for simplicity, but the argument works fine with multiple rounds); there is no definition of U that yields the described behaviour. (I’m aware that the paper asserts that “any agents [sic] can be written in O-maximizer form”, but note that the paper may simply be wrong. It’s clearly an unfinished draft, and no argument or proof is given.)
      
      If outcomes are things like “chose A given a choice between A and B”, which is not clear to me from the paper, then my mind is indeed an O-maximizer (that is, there is a definition of U such that an O-maximizer produces the same outputs as my mind). However, as I understand it, you have also encoded any cognitive errors in the utility function: if a mind can be Dutch-booked into an undesirable state, the associated O-maximizer will have to act on a U function that values this undesirable state highly if it comes about as a result of being Dutch-booked. (Remember, the O-maximizer maximizes U and behaves like the original mind.) As an additional consideration, most decision/choice theory seems to assume a ranking of outcomes, not (path, outcome) pairs.
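The Dutch-book worry can be illustrated with the classic money pump. In this hypothetical Python sketch (the items, the fee, and the sequence of offers are invented for illustration), an agent with the cyclic preferences A > B, B > C, C > A accepts any trade up to an item it prefers and pays a small fee each time. After one lap around the cycle it holds exactly what it started with and is strictly poorer.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical cyclic preferences: (x, y) in the set means x is strictly preferred to y.
PREFERS: FrozenSet[Tuple[str, str]] = frozenset({("A", "B"), ("B", "C"), ("C", "A")})


def accepts_trade(held: str, offered: str) -> bool:
    """The agent trades whenever it strictly prefers the offered item to the one it holds."""
    return (offered, held) in PREFERS


def money_pump(start: str, offers: Iterable[str], fee: float) -> Tuple[str, float]:
    """Offer trades in sequence, charging a fee for each accepted one."""
    held, paid = start, 0.0
    for offered in offers:
        if accepts_trade(held, offered):
            held, paid = offered, paid + fee
    return held, paid


# One lap around the cycle: A -> C -> B -> A.
print(money_pump("A", ["C", "B", "A"], fee=1.0))      # ('A', 3.0)
# Three laps: same item at the end, three times the loss.
print(money_pump("A", ["C", "B", "A"] * 3, fee=1.0))  # ('A', 9.0)
```

On the comment's argument, a behaviour-matching utility function for this agent would have to rate the end state of this sequence (same item, less money) highly whenever it is reached by these trades.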
      - timtyler 25 Jan 2012 15:30 UTC
        1 point
        0
        Parent
        
        I’ve taken a look at the paper. If “outcomes” are things like “chose A”, “chose B” or “chose C”, the above mind is simply not an O-maximizer: consider a world with observations “I can choose between A and B/B and C/C and A” (equally likely, independent of any past actions or observations) and actions “take the first offered option” or “take the second offered option” (played for one round, for simplicity, but the argument works fine with multiple rounds); there is no definition of U that yields the described behaviour.
        
        What?!? You haven’t clearly specified the behaviour of the machine. If you are invoking an uncomputable random number generator to produce an “equally likely” result then you have an uncomputable agent. However, there’s no such thing as an uncomputable random number generator in the real world. So: how is this decision actually being made?
        
        I’m aware that the paper asserts that “any agents [sic] can be written in O-maximizer form”, but note that the paper may simply be wrong. It’s clearly an unfinished draft, and no argument or proof is given.
        
        It applies to any computable agent. That is, any agent, assuming that the Church–Turing–Deutsch principle is true.
        
        The argument given is pretty trivial. If you doubt the result, check it—and you should be able to see if it is correct or not fairly easily.
        JoachimSchipper 25 Jan 2012 16:57 UTC
        0 points
        0
        Parent
        The world is as follows: each observation x_i is one of “the mind can choose between A and B”, “the mind can choose between B and C” or “the mind can choose between C and A” (conveniently encoded as 1, 2 and 3). Independently of any past observations (x_1 and the like) and actions (y_1 and the like), each of these three options is equally likely. This fully specifies a possible world, no?
        
        The mind, then, is as follows: if the last observation is 1 (“A and B”), output “A”; if the last observation is 2 (“B and C”), output “B”; if the last observation is 3 (“C and A”), output “C”. This fully specifies a possible (deterministic, computable) decision procedure, no? (1)
        
        I argue that there is no assignment to U(“A”), U(“B”) and U(“C”) that causes an O-maximizer to produce the same output as the algorithm above. Conversely, there are assignments to U(“1A”), U(“1B”), …, U(“3C”) that cause the O-maximizer to output the same decisions as the above algorithm, but then we have encoded our decision algorithm into the U function used by the O-maximizer (which has its own issues, see my previous post.)
        
        (1) Actually, the definition requires the mind to output something before receiving input. That is a technical detail that can be safely ignored; alternatively, just always output “A” before receiving input.
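The impossibility claim can be checked by brute force. This Python sketch uses the comment's own encoding of observations (1, 2, 3) and its decision procedure; the function names and the search over integer utilities are hypothetical choices for illustration. Because only the ordering of U(“A”), U(“B”), U(“C”) matters for pairwise choices, trying every assignment from {0, 1, 2} covers every possible ordering, ties included.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Observations as encoded in the comment: 1 = {A, B}, 2 = {B, C}, 3 = {C, A}.
MENU: Dict[int, Tuple[str, str]] = {1: ("A", "B"), 2: ("B", "C"), 3: ("C", "A")}
# The described mind: always output the first offered option.
MIND: Dict[int, str] = {1: "A", 2: "B", 3: "C"}


def outcome_maximizer(obs: int, u: Dict[str, float]) -> Optional[str]:
    """Pick the offered option with the higher utility; None if the two are tied."""
    a, b = MENU[obs]
    if u[a] == u[b]:
        return None
    return a if u[a] > u[b] else b


def matches_mind(u: Dict[str, float]) -> bool:
    return all(outcome_maximizer(obs, u) == MIND[obs] for obs in MENU)


# Only the ordering matters for pairwise choices, and values in {0, 1, 2}
# realize every ordering (ties included) of three items.
candidates = [dict(zip("ABC", vals)) for vals in product(range(3), repeat=3)]
print(any(matches_mind(u) for u in candidates))  # False: would need U(A) > U(B) > U(C) > U(A)

# Utilities over (observation, outcome) pairs, like U("1A"), ..., U("3C"), do work,
# but only by writing the decision rule itself into U.
pair_u = {(obs, opt): 1.0 if opt == MIND[obs] else 0.0 for obs in MENU for opt in MENU[obs]}
print(all(max(MENU[obs], key=lambda opt: pair_u[(obs, opt)]) == MIND[obs] for obs in MENU))  # True
```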
        timtyler 25 Jan 2012 18:13 UTC
        3 points
        0
        Parent
        
        I argue that there is no assignment to U(“A”), U(“B”) and U(“C”) that causes an O-maximizer to produce the same output as the algorithm above.
        
        ...but the domain of a utility function surely includes sensory inputs and remembered past experiences (the state of the agent). You are trying to assign utilities to outputs.
        
        If you try and do that you can’t even encode absolutely elementary preferences with a utility function—such as: I’ve just eaten a peanut butter sandwich, so I would prefer a jam one next.
        
        If that is the only type of utility function you are considering, it is no surprise that you can’t get the theory to work.
- Manfred 25 Jan 2012 15:39 UTC
  0 points
  0
  Parent
  The point is about how humans make decisions, not about what decisions humans make.
  - timtyler 25 Jan 2012 18:30 UTC
    0 points
    0
    Parent
    
    The point is about how humans make decisions, not about what decisions humans make.
    
    Er, what are you talking about? Did you not understand what was wrong with Luke’s sentence? Or what are you trying to say?
    - Manfred 25 Jan 2012 19:39 UTC
      7 points
      0
      Parent
      The way I know to assign a utility function to an arbitrary agent is to say “I assign what the agent does utility 1, and everything else utility less than one.” Although this “just so” utility function is valid, it doesn’t peek inside the skull—it’s not useful as a model of humans.
      
      What I meant by “how humans make decisions” is a causal model of human decision-making. The reason I wouldn’t call all agents “utility maximizers” is because I want utility maximizers to have a certain causal structure—if you change the probability balance of two options and leave everything else equal, you want it to respond accordingly. As gwern recently reminded me by linking to that article on Causality, this sort of structure can be tested in experiments.
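The proposed test (perturb the probabilities while holding everything else fixed, and see whether the choice moves the way a utility maximizer’s should) can be sketched in Python. Everything here is hypothetical: the two lotteries, the linear utility u(x) = x, and the “lookup” agent, which stands in for a utility fitted after the fact to one observed choice.

```python
from typing import Callable, Dict, Tuple

Lottery = Tuple[Tuple[float, float], ...]  # ((probability, payoff), ...)
Agent = Callable[[Dict[str, Lottery]], str]


def expected_utility(lottery: Lottery) -> float:
    """Hypothetical linear utility u(x) = x, so expected utility is expected payoff."""
    return sum(p * x for p, x in lottery)


def eu_agent(options: Dict[str, Lottery]) -> str:
    """Choose the option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name]))


def lookup_agent(observed_choice: str) -> Agent:
    """A 'just so' fit: replays the observed choice and ignores the probabilities."""
    return lambda options: observed_choice


before = {"safe": ((1.0, 10.0),), "risky": ((0.4, 20.0), (0.6, 0.0))}
after = {"safe": ((1.0, 10.0),), "risky": ((0.6, 20.0), (0.4, 0.0))}  # only p changes

fitted = lookup_agent(eu_agent(before))  # fit to the one observed choice
print(eu_agent(before), fitted(before))  # safe safe: both fit the data
print(eu_agent(after), fitted(after))    # risky safe: only one responds to the shift
```

Both agents are consistent with the first observation, so only the intervention distinguishes them; that is the sense in which this causal structure is experimentally testable.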
      - timtyler 25 Jan 2012 20:40 UTC
        2 points
        0
        Parent
        
        Although this “just so” utility function is valid, it doesn’t peek inside the skull—it’s not useful as a model of humans.
        
        It’s a model of any computable agent. The point of a utility-based framework capable of modelling any agent is that it allows comparisons between agents of any type. Generality is sometimes a virtue. You can’t easily compare the values of different creatures if you can’t even model those values in the same framework.
        
        The reason I wouldn’t call all agents “utility maximizers” is because I want utility maximizers to have a certain causal structure—if you change the probability balance of two options and leave everything else equal, you want it to respond thus.
        
        Well, you can define your terms however you like—if you explain what you are doing. “Utility” and “maximizer” are ordinary English words, though.
        
        It seems to be impossible, though, to act as though you don’t have a utility function (as was originally claimed). “Utility function” is a perfectly general concept which can be used to model any agent. There may be slightly more concise methods of modelling some agents; that seems to be roughly the concept that you are looking for.
        
        So: it would be possible to say that an agent acts in a manner such that utility maximisation is not the most parsimonious explanation of its behaviour.
        Manfred 26 Jan 2012 1:23 UTC
        3 points
        0
        Parent
        
        Although this “just so” utility function is valid, it doesn’t peek inside the skull—it’s not useful as a model of humans.
        
        It’s a model of any computable agent.
        
        Sorry, replace “model” with “emulation you can use to predict the emulated thing.”
        
        There may be slightly more concise methods of modelling some agents—that seems to be roughly the concept that you are looking for.
        
        I’m talking about looking inside someone’s head and finding the right algorithms running. Rather than “what utility function fits their actions,” I think the point here is “what’s in their skull?”
        timtyler 5 Aug 2012 12:30 UTC
        0 points
        0
        Parent
        
        I’m talking about looking inside someone’s head and finding the right algorithms running. Rather than “what utility function fits their actions,” I think the point here is “what’s in their skull?”
        
        The point made by the O.P. was:
        
        Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don’t act like they have utility functions)
        
        It discussed actions—not brain states. My comments were made in that context.