Rock bottom terminal value

Terminal values are discussed here:

https://www.lesswrong.com/s/3HyeNiEpvbQQaqeoH/p/n5ucT5ZbPdhfGNLtP

and https://www.lesswrong.com/posts/zqwWicCLNBSA5Ssmn/by-which-it-may-be-judged

Yudkowsky also references Frankena’s terminal values... but are these actually terminal?

Do terminal values “reduce” or “bottom out”?

Frankena’s first two are Life and Consciousness. Terminal as these may seem, I contend that they’re actually instrumental: I want life and consciousness so that I can experience happiness/flourishing. I certainly don’t want life and consciousness if existence is just pain and misery.

I posit (I think in agreement with Aristotle) that all values bottom out in the terminal value of happiness/flourishing. More precisely, it’s perhaps better formalized as the most flourishing, happy world outcome, as the agent judges it. Even the mom who sacrifices herself for her son does so not because the action feels right, nor because her son’s survival is a terminal value weighed against other terminal values like her own survival, but because she judges the outcome (world state) where her son lives and she dies to save him as better (read: “more flourishing”) than the alternative, even though she knows she will no longer be there to experience it. It’s not the act she values, nor her experience of the outcome (there will be none); it’s the outcome itself.

On the negative side, one could judge death a “more flourishing” outcome than living a predominantly painful life (though hopefully these are not the only choices one faces).

On the even more negative side, I think even a sociopath’s values bottom out like this. They just prefer outcomes most people don’t (potentially including some that most people find abominable).

TL;DR: we all just wanna be happy, and we each have our own ideas about which world outcomes are “better” and “worse.” EVERY value derived from this terminal value is... instrumental.
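Here’s a minimal sketch of the structure I’m describing: a single terminal value (flourishing, as the agent judges it) defined over world outcomes, with every action valued only instrumentally, by the outcome it leads to. The outcome labels, scores, and function names below are illustrative assumptions, not a model of any real agent.

```python
# A single terminal value over world outcomes; action values are derived from it.
# All outcomes and scores here are made up for illustration.

def flourishing(outcome):
    """The one terminal value: how flourishing the agent judges a world state to be."""
    scores = {
        "son lives, mother dies": 0.9,  # judged more flourishing, even though she won't experience it
        "son dies, mother lives": 0.2,
    }
    return scores[outcome]

def instrumental_value(action, world_model):
    """An action has no value of its own; it inherits value from the outcome it produces."""
    return flourishing(world_model[action])

# The mother's choice, modeled as a judgment about world states rather than about her own experience.
world_model = {
    "sacrifice herself": "son lives, mother dies",
    "save herself":      "son dies, mother lives",
}
best = max(world_model, key=lambda action: instrumental_value(action, world_model))
print(best)  # -> "sacrifice herself"
```

Nothing in the sketch assigns value to the act itself or to the agent’s experience of the outcome; the action is chosen entirely because of the world state it brings about.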

Ok, so maybe terminal values bottom out. So what?

Well, if terminal value “bottoms out,” what happens with AIs? The answer is easy to see for a paperclip maximizer, if that terminal value “sticks.” But assuming some AIs’ values drift and they develop a terminal value outside of the hard-coded ones, what might that terminal value become?

  • If human values bottom out in the happy/flourishing/non-pain/non-misery thing, is that because we are beings that feel happiness and pain? If AIs don’t, why would they intrinsically value anything? Sure, they would if they’re programmed to love paperclips. But if there’s drift... what would the drift be toward?

  • Is it possible AIs would drift toward a lack of a terminal value altogether, meaning the AI would become truly nihilistic? If an agent became nihilistic, would it cease to have any instrumental goals (see the sketch after this list)? Could this be an unexpected safety feature in any AI whose values drift?

  • Or would the terminal value (as is more often assumed) become something weird and problematic, leading to lots of problematic instrumental goals?
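On the nihilism question above, here’s a toy continuation of the earlier sketch, assuming the same outcome-based setup: if the terminal value drifts all the way to “no world state is better than any other” (a constant utility over outcomes), every action inherits the same value, and there is no basis left for preferring any instrumental goal over any other. This is an illustration of the idea, not a prediction about real systems.

```python
# A nihilistic terminal value: no outcome is judged better than any other.
def nihilistic_utility(outcome):
    return 0.0

def instrumental_value(action, world_model, terminal_utility):
    """Actions still only inherit value from outcomes; here, every outcome is worth the same."""
    return terminal_utility(world_model[action])

# Hypothetical actions and outcomes, purely for illustration.
world_model = {
    "acquire resources": "agent controls more resources",
    "self-improve":      "agent is more capable",
    "do nothing":        "world unchanged",
}

values = {action: instrumental_value(action, world_model, nihilistic_utility) for action in world_model}
print(values)  # every action scores 0.0: no reason to pursue any instrumental goal
```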

Values are complex, of course, and answering the “bottoming out” question doesn’t imply that we can then derive all instrumental values precisely. But if values bottom out:

  • we need only posit 1 terminal value (as opposed to a matrix of terminal values), and

  • perhaps we can figure out why beings that experience the things we experience value the things we do, and how other beings might differ.