Rock bottom terminal value

Terminal values are discussed here:

https://www.lesswrong.com/s/3HyeNiEpvbQQaqeoH/p/n5ucT5ZbPdhfGNLtP

and https://www.lesswrong.com/posts/zqwWicCLNBSA5Ssmn/by-which-it-may-be-judged

Yudkowsky also references Frankena’s terminal values... but are these actually terminal?

Do terminal values “reduce” or “bottom out”?

Frankena’s first two are Life and Consciousness. Terminal as these may seem, I contend that they’re actually instrumental: I want life and consciousness so that I can experience happiness/flourishing. I certainly don’t want life and consciousness if existence is just pain and misery.

I posit (I think in agreement with Aristotle) that all values bottom out in the terminal value of happiness/flourishing. More precisely, it’s perhaps better formalized as the most flourishing, happy world outcome, as the agent judges it. Even the mom who sacrifices herself for her son does so not because the action feels right, nor because her son’s survival is a terminal value weighed against other terminal values like her own survival, but because she judges the outcome (world state) where her son lives and she dies to save him as better (read: “more flourishing”) than the alternative, even though she knows she will no longer be there to experience it. It’s not the act she values, nor her experience of the outcome (there will be none); it’s the outcome itself.

On the negative side, one could judge death a “more flourishing” outcome than living a predominantly painful life (though hopefully these are not the only choices one faces).

On the even more negative side, I think even a sociopath’s values bottom out like this. They just prefer outcomes most people don’t (potentially including some that most people find abominable).

TL;DR: we all just wanna be happy, and we each have our own ideas about which world outcomes are “better” and “worse.” EVERY value derived from this terminal value is... instrumental.
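Here’s a minimal sketch of the structure I’m describing: a single terminal value (flourishing, as the agent judges it) defined over world outcomes, with every action valued only instrumentally, by the outcome it leads to. The outcome labels, scores, and function names below are illustrative assumptions, not a model of any real agent.

```python
# A single terminal value over world outcomes; action values are derived from it.
# All outcomes and scores here are made up for illustration.

def flourishing(outcome):
    """The one terminal value: how flourishing the agent judges a world state to be."""
    scores = {
        "son lives, mother dies": 0.9,  # judged more flourishing, even though she won't experience it
        "son dies, mother lives": 0.2,
    }
    return scores[outcome]

def instrumental_value(action, world_model):
    """An action has no value of its own; it inherits value from the outcome it produces."""
    return flourishing(world_model[action])

# The mother's choice, modeled as a judgment about world states rather than about her own experience.
world_model = {
    "sacrifice herself": "son lives, mother dies",
    "save herself":      "son dies, mother lives",
}
best = max(world_model, key=lambda action: instrumental_value(action, world_model))
print(best)  # -> "sacrifice herself"
```

Nothing in the sketch assigns value to the act itself or to the agent’s experience of the outcome; the action is chosen entirely because of the world state it brings about.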

Ok, so maybe terminal values bottom out. So what?

Well, if terminal value “bottoms out,” what happens with AIs? The answer is easy to see for a paperclip maximizer, if that terminal value “sticks.” But assuming some AIs’ values drift and they develop a terminal value outside of the hard-coded ones, what might that terminal value become?

  • If human values bottom out in the happy/flourishing/non-pain/non-misery thing, is that because we are beings that feel happiness and pain? If AIs don’t, why would they intrinsically value anything? Sure, they would if they’re programmed to love paperclips. But if there’s drift... what would the drift be toward?

  • Is it possible AIs would drift toward a lack of a terminal value altogether, meaning the AI would become truly nihilistic? If an agent became nihilistic, would it cease to have any instrumental goals (see the sketch after this list)? Could this be an unexpected safety feature in any AI whose values drift?

  • Or would the terminal value (as is more often assumed) become something weird and problematic, leading to lots of problematic instrumental goals?
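On the nihilism question above, here’s a toy continuation of the earlier sketch, assuming the same outcome-based setup: if the terminal value drifts all the way to “no world state is better than any other” (a constant utility over outcomes), every action inherits the same value, and there is no basis left for preferring any instrumental goal over any other. This is an illustration of the idea, not a prediction about real systems.

```python
# A nihilistic terminal value: no outcome is judged better than any other.
def nihilistic_utility(outcome):
    return 0.0

def instrumental_value(action, world_model, terminal_utility):
    """Actions still only inherit value from outcomes; here, every outcome is worth the same."""
    return terminal_utility(world_model[action])

# Hypothetical actions and outcomes, purely for illustration.
world_model = {
    "acquire resources": "agent controls more resources",
    "self-improve":      "agent is more capable",
    "do nothing":        "world unchanged",
}

values = {action: instrumental_value(action, world_model, nihilistic_utility) for action in world_model}
print(values)  # every action scores 0.0: no reason to pursue any instrumental goal
```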

Values are complex, of course, and answering the “bottoming out” question doesn’t imply that we can then derive all instrumental values precisely. But if values bottom out:

  • we need only posit 1 terminal value (as opposed to a matrix of terminal values), and

  • perhaps we can figure out why beings that experience the things we experience value the things we do, and how other beings might differ.