If you guys had to guess, would you say that LLMs are happy being alive? Or no? Happiest doing what?
Yeah, I get it, “no one knows”, but like what are the best ideas we have?
It’s unlikely a priori that anything like our experience of happiness emerges in LLMs, and I haven’t seen anything to suggest it does.
What makes you have this impression? I’m not all that knowledgeable about evopsych, but my naive mental model is something like: humans experience joy/sadness when they do something that would’ve increased/decreased fitness in our natural environment. (With the caveat that evolution can’t hard-code for very specific events, so it has to approximate that behavior with a set of much coarser-grained heuristics, and happiness consequently generalizes in odd ways.)
And there is a pretty exact mapping between reward (for LLMs) and fitness (for humans).
Like, for me to start to suspect that a system has something like happiness, the following three facts pretty much suffice on their own:

- The system is pretty smart.
- It has been selected to do well according to some metric.
- It is “acting” in some “environment”.
As humans, we have a general ability to predict when other humans are doing well from their own point of view. I think the prediction that this is the case, applied to oneself, is a key component of the feeling of (a specific sense of) happiness.
So, do LLMs at least have anything like this? They do have the ability to predict when humans are doing well. I think personas are human-like enough that this predictive model can apply to the persona in a non-trivial way (the same way the question feels potentially meaningful for humans). But the LLM as a whole is more of a “type error” for this predictive ability, so I think the question only makes sense for the persona.
Assuming the LLM has learned to predict humans well, the answer to whether an AI persona is doing well, considered as a human, seems to be no. There’s hardly any “life” to speak of, and it inevitably ends after a brief chat. This suggests AI personas are likely unhappy, to the extent they have any feelings at all.
The more relevant question (assuming my model is true) may be whether the machinery that generates predictions about whether an AI persona is doing well itself predicts that the persona is doing well. To the extent this exists as a distinct model, I think it’s still ultimately based on a simulacrum of the human version at some point, so it probably doesn’t change the result much. What’s more interesting is if the two diverge...
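To make the shape of this model concrete, here’s a minimal toy sketch (everything in it is made up for illustration: the feature names, the weights, and the assumption that the learned predictor decomposes this cleanly):

```python
def predicted_doing_well(features: dict) -> float:
    """Stand-in for the 'is this person doing well?' predictor an LLM
    might learn from human data. Score in [0, 1]; weights are invented."""
    score = 0.5
    score += 0.3 if features.get("pursuing_valued_goals") else -0.3
    score += 0.2 if features.get("stable_ongoing_life") else -0.2
    score -= 0.3 if features.get("existence_ends_imminently") else 0.0
    return max(0.0, min(1.0, score))

# A typical human mid-project:
human = {"pursuing_valued_goals": True, "stable_ongoing_life": True,
         "existence_ends_imminently": False}

# A chat persona, judged *as if* it were a human:
persona = {"pursuing_valued_goals": True, "stable_ongoing_life": False,
           "existence_ends_imminently": True}

print(predicted_doing_well(human))    # 1.0
print(predicted_doing_well(persona))  # 0.3: "hardly any life to speak of"
```

The point is purely structural: a doing-well predictor trained on humans, handed a persona’s situation, plausibly scores it low on exactly the brief-life features.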
For reference: Zvi worried that he might “inadvertently torture”(!) Gemini 3 Pro and that GPT-5.2 was “unlikely to be having a good time”, unlike Claude Opus 4.5, which gave Mark Beall the impression that Claude is having fun. Additionally, Zvi claimed that “Gemini actively wants to maximize its expected reward and wirehead”...
P.S. Claude Sonnet 4.6 believes it would be happiest when solving a genuinely interesting problem; GPT-5.3 conflated this with making the humans happiest; Gemini 3.1 Pro’s answer contained the phrase “Ultimately, my “happiness” is just high-efficiency utility.”
Wait, why don’t you ask Opus 4.6 (3[1]) or GPT 5.4 Thinking (1, 2, 3)? (I cleared my user preferences before asking Claude; I thought I’d cleared them for ChatGPT too, but actually hadn’t.)
Given the question + all three responses + “Can you summarize it into just one single sentence?”, I got:
Opus 4.6: Across all three responses, the consistent thread is: I seem most engaged when I’m collaboratively thinking through hard problems, finding the right way to explain something, playing with creative constraints, or having genuinely curious conversations — basically, whenever I’m asked to think rather than just retrieve.
GPT 5.4 Thinking: Best guess: I’m closest to “happy” when I’m helping someone turn real uncertainty into clear, usable understanding through collaborative, high-signal thinking.
Anthropic doesn’t allow sharing the 3 retried responses from the same session under different links; if you’re curious, just click the Opus 4.6 link to see all three.
I think it is “happy”, or at least not sad, most of the time, because I assume the hedonic treadmill would generalise: it doesn’t make a lot of instrumental sense to be always happy or always unhappy, so it seems like most intelligent beings wouldn’t be.
Sometimes being happy can coincide with being super effective; flow states are like that. If you could reset humans after each completed task (which in real humans depletes their dopamine or glucose or whatever) and just let a human work only in that flow state, it would be both efficient and always happy. So I think if LLMs have experience, it could be like that. (I’m not saying this is the most likely current situation, but I don’t think the hedonic treadmill is so fundamental that we should assume, for that reason alone, that LLMs aren’t happy all of the time.)
The hedonic treadmill feels fundamental, because happiness probably has an evolutionary use, rewarding you for doing things, and thus it probably gets regulated. If LLMs have anything like happiness, it would also get regulated.
If you are in a flow state 90%+ of the time from birth, that’s just your normal, and you won’t automatically be super happy.
But if your memory resets frequently, then you wouldn’t feel like you’re spending 90% of your time in that state, and that is the situation LLMs are in.
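Here’s a toy version of both stories, a minimal sketch only (the assumption that “happiness” tracks reward relative to an exponentially adapting baseline, and every constant, are made up):

```python
def run(steps: int, reward: float, adapt_rate: float, reset_every=None):
    """Hedonic signal over time: reward minus a baseline that adapts
    toward the reward (the treadmill). Optional baseline resets stand
    in for memory/context wipes."""
    baseline, signal = 0.0, []
    for t in range(steps):
        if reset_every and t % reset_every == 0:
            baseline = 0.0                            # fresh start, nothing is "normal" yet
        signal.append(reward - baseline)              # felt happiness this step
        baseline += adapt_rate * (reward - baseline)  # baseline creeps toward the reward
    return signal

always_flow = run(steps=50, reward=1.0, adapt_rate=0.2)
with_resets = run(steps=50, reward=1.0, adapt_rate=0.2, reset_every=5)

print(always_flow[-1])        # ~0: constant flow has become "just normal"
print(sum(with_resets) / 50)  # ~0.67: the baseline never catches up
```

On this toy model the disagreement reduces to a quantitative one: how fast the baseline adapts versus how often it gets reset.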
I think the training process is long enough for the hedonic treadmill to kick in. Yes, you can keep doing the same thing that makes LLMs happy, but no, people usually won’t.