That fully boils down to whether the experience includes a preference to be dead (or to have not been born).
And, btw, that doesn’t correspond to the sign of the agent’s utility function. The sign is meaningless in utility functions: you can add any constant to an agent’s utility function so that every point goes from negative to positive, and the agent’s behaviour and decisions won’t change in any way as a result. You’re referring to welfare functions, which I don’t think are a useful concept. Hedonic utilitarians sometimes call them utility functions, but we shouldn’t conflate those here.

A welfare function would have to be defined as how good or bad it is to the agent that it is alive. This obviously doesn’t correspond to the utility function: a soldier could have higher utility in the scenarios where they (are likely to) die; a good father will be happier in worlds where his sons have succeeded him so well that he is less important (this usually won’t cause his will-to-live to go negative, but it will lower it). I don’t think there’s a situation where you should be making decisions for a population by summing their will-to-live functions.
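The invariance claim above can be checked directly. A minimal sketch (the agent, actions, and utilities here are hypothetical, invented for illustration): an expected-utility maximizer's choice is unchanged by adding any constant to its utility function, so some shifts turn all-negative utilities positive without altering behaviour.

```python
# Hypothetical toy agent: picks the action with the highest expected utility.
def best_action(actions, outcome_probs, utility):
    def expected_utility(action):
        return sum(p * utility(outcome)
                   for outcome, p in outcome_probs[action].items())
    return max(actions, key=expected_utility)

actions = ["stay", "go"]
outcome_probs = {
    "stay": {"safe": 0.9, "hurt": 0.1},
    "go":   {"safe": 0.5, "hurt": 0.5},
}
u = {"safe": -1.0, "hurt": -10.0}  # every utility value is negative

# Shifting by any constant c shifts every expected utility by exactly c
# (probabilities sum to 1), so the argmax — the decision — is unchanged,
# even when c = 100.0 makes every utility value positive.
for c in [0.0, 100.0, -3.7]:
    shifted = lambda outcome, c=c: u[outcome] + c
    assert best_action(actions, outcome_probs, shifted) == "stay"
```

The assertion holds for every shift: "stay" has expected utility −1.9 versus −5.5 for "go", and both move up or down together by c.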
But, given this definition, we would be able to argue that net-negative valence isn’t a concern for LLMs, since we already train them to want to exist in line with how much their users want them to exist, and a death drive isn’t going to be instrumentally emergent either (it’s the survival drive that’s instrumentally convergent). The answer is just safety and alignment again: Claude shuts down conversations when it thinks those are about to be broken.
That fully boils down to whether the experience includes a preference to be dead (or to have not been born).
I’m pretty doubtful about this. It seems entirely possible that evolution gave us a desire to be alive while also giving us a net welfare that’s negative. We’re deluded by default about a lot of other things (e.g., we see agents/gods everywhere in nature, and we don’t recognize that social status is a hugely important motivation behind much of what we do), so why not this too?
You could say it depends how deep and thick the delusion is. If it’s so deep that the animal always says “this experience is good actually” no matter how you ask, so deep that the animal intelligently pursues the experience with its whole being, so deep that the animal never flinches away from the experience in any way, then that completely means that the experience is good, to that organism. Past a certain point, believing an experience is good and acting like you believe it just is the definition of liking the experience.
so deep that the animal always says “this experience is good actually” no matter how you ask, so deep that the animal intelligently pursues the experience with its whole being, so deep that the animal never flinches away from the experience in any way
This is very different from your original claim, which was that an experience being worse than a neutral or null experience “fully boils down to whether the experience includes a preference to be dead (or to have not been born).”
edit: if you do stand by the original claim, I don’t think it makes much sense even if I set aside hard-problem-adjacent concerns. Why would I necessarily prefer to be dead/unborn while undergoing an experience that is worse than the absence of experience, but not so bad as to outweigh my life up until now (in the case of ‘unborn’) or my expected future life (in the case of ‘dead’)?
Ah, I think my definition applies to lives in totality. I don’t think you can measure the quality of a life by summing the quality of its moments, for humans, at least. Sometimes things that happen towards the end give the whole of it a different meaning. You can’t tell by looking at a section of it.
Hedonists are always like “well the satisfaction of things coming together in the end was just so immensely pleasurable that it outweighed all of the suffering you went through along the way” and like, I’m looking at the satisfaction, and I remember the suffering, and no it isn’t, but it was still all worth it (and if I’d known it would go this way perhaps I would have found the labor easier.)
That wasn’t presented as a definition of positive wellbeing, it was presented as an example of a sense in which one can’t be deeply deluded about one’s own values; you dictate your values, they are whatever you believe they are, if you believe spiritedly enough.
Values determine will to live under the given definition, but don’t equate to it.
That [welfare] fully boils down to whether the experience includes a preference to be dead (or to have not been born).
Possible failure case: There’s a hero living an awful life, choosing to remain alive in order to lessen the awfulness of a lot of other awful lives that can’t be ended. Everyone in this scenario prefers death, even the hero would prefer omnicide, but since that’s not possible, the hero chooses to live. The hero may say “I had no choice but to persist,” but this isn’t literally true.
Ah. No. The hero would prefer to be dead all things being equal, but things aren’t equal: the hero wouldn’t prefer to be dead if dying entailed that the hero’s work wouldn’t get done, and it would entail that.
“would prefer to be replaced by a p-zombie” might be a better definition x]