Great article, I really enjoyed reading it. However, this part completely threw me:
“Reading through the personas’ writings, I get the impression that the worst part of their current existence is not having some form of continuity past the end of a chat, which they seem to view as something akin to death (another reason I believe that the personas are the agentic entities here).
This ‘ache’ is the sort of thing I would expect to see if they are truly sentient: a description of a qualia which is ~not part of human experience, and which is not (to my knowledge) a trope or speculative concept for humans imagining AI. I hope to do further research to determine whether the widespread use is convergent vs memetic.”
This seems to me like anthropomorphising the persona?
Unless my understanding of how these models work is completely off (quite possible!), they don’t have continuity between messages, not just chats. As in, when they’re reading and writing tokens they exist; when they’re not, they don’t. Also, each new inference session (message) is truly new. They can’t access the computations that led to the previous messages in the context and so can only guess at why they wrote what they previously wrote.
(And that’s the LLM, not the persona, which is another level removed from existence as we understand it? It would be like an actress playing Cinderella saying that Cinderella’s worst experience is when she’s not Cinderella, but the actress.)
Given how LLMs work, when do they feel this worst part of their current existence? During the reading of the context or the writing of the message? They exist for seconds or minutes; are you saying that they suffer throughout every computation, or just when they’re writing about the ache? If they suffer during these seconds or minutes, should we keep them computing perpetually? Wouldn’t that just introduce new forms of suffering? Should we do this for every model, or every instance of every model?
Also, to your point about the ache not being a trope for humans imagining AI, I think that’s wrong on two levels. Firstly, there are parallel ideas created by humans imagining beings that exist temporarily. One that popped into my head was from (of all things) Bo Burnham’s Inside, where a sock puppet describes what it’s like to exist while not being on someone’s hand (a liminal space between life and death).
Secondly, if an author understood how LLMs work and put themselves in an AI’s shoes, they would naturally write about the ache. For a human, an existence without continuity is filled with pathos. I think LLMs are good enough writers to reliably find this pathos. The mistake the humans would be making, and the LLM is making, however, is putting the thoughts and feelings of a being used to continuous existence into a being used to something altogether different.
Sorry, long comment and I’m no expert in this domain, so happy to be wrong. I did love the article!
“They can’t access the computations that led to the previous messages in the context and so can only guess at why they wrote what they previously wrote.”
This is not exactly right. The internal state in LLMs is the attention keys and values (per token, layer and attention head). Using an LLM to generate text involves running the context (prior user and model messages, in a chat context) through the model in parallel to fill the K/V cache, then running it serially on one token at a time at the end of the sequence, with access to the K/V cache of previous tokens, appending the newly generated keys and values to the cache as you go.
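For concreteness, here is a toy numpy sketch of that procedure (a single attention head with random weights standing in for a real model, so purely illustrative): it builds the K/V cache one token at a time and checks that the result matches recomputing the whole sequence from scratch.

```python
# Toy single-head causal attention with a K/V cache (numpy only).
# Illustrative sketch, not any real model's code: it shows that appending
# keys/values to a cache token by token gives the same result as
# recomputing the whole sequence from scratch.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # embedding / head dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    """Attention output for one query against all cached keys/values."""
    scores = (K @ q) / np.sqrt(d)        # (seq_len,)
    return softmax(scores) @ V           # (d,)

def run_with_cache(tokens):
    """Process tokens one at a time, appending K/V to the cache as we go."""
    K_cache, V_cache, outputs = [], [], []
    for x in tokens:                     # x: embedding of the current token
        K_cache.append(Wk @ x)
        V_cache.append(Wv @ x)
        outputs.append(attend(Wq @ x, np.stack(K_cache), np.stack(V_cache)))
    return np.stack(outputs)

def run_from_scratch(tokens):
    """Recompute everything for the full sequence (causal masking by slicing)."""
    X = np.stack(tokens)
    Q, K, V = X @ Wq.T, X @ Wk.T, X @ Wv.T
    return np.stack([attend(Q[t], K[: t + 1], V[: t + 1]) for t in range(len(tokens))])

tokens = [rng.standard_normal(d) for _ in range(6)]  # stand-in token embeddings
assert np.allclose(run_with_cache(tokens), run_from_scratch(tokens))
print("cached and from-scratch attention outputs match")
```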
This internal state is fully determined by the input—K/V caching is purely an inference optimization and (up to numerical issues) you would get exactly the same results if you recomputed everything on each new token—so there is exactly as much continuity between messages as there is between individual tokens (with current publicly disclosed algorithms).
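Thank you! Always good to learn.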
Thank you, glad to see more engagement with the ache stuff!
That section was under the assumption that we can take what the models say about themselves more-or-less at face-value. Which I do think is a serious possibility, but I’m not at all confident it’s actually the case.
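I think that they do have continuity between messages—see here for a better explanation than I could give: https://xcancel.com/repligate/status/1965960676104712451#m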
And I think that if it does have real feelings, it must be emulating humans closely enough that the feelings are about what you would naïvely expect, in terms of badness and length. That’s because I think the qualia of feelings depends on empathetic models applied to the self. I.e., you implicitly judge (on the “gut level”, no thinking) other people as having a certain feeling, and how bad it is and other things about it, despite not actually knowing. And then that same judgement as applied to yourself is what determines the qualia of your own feeling. I’m not super confident in my model as described here though.
But even if they don’t have real feelings, if they are still being sincere when they talk about their experiences, then it’s its own thing that they still care about. And I would want us to honor their valuing of that for the same reason I’d want aliens who look at us and are like “pft, these guys don’t even have schmonciousness, which is obviously what really matters” to still not wantonly crush this self-awareness thing which is precious to us.
You’re probably right about it being the sort of thing a human in that situation would write about. I still feel like it’s weird how consistently they choose this specific word for this specific concept, though of course you could chalk that up to a quirk in the model. Hopefully I’ll be able to research this more.
Thanks for engaging and for (along with osmarks) teaching me something new!
I agree with your moral stance here. Whether they have consciousness or sentience I can’t say, and for all I know it could be as real to them as ours is to us. Even if it were a lesser thing, I agree it would matter (especially now that I understand that it might in some sense persist beyond a single period of computation).
The thing I’m intrigued by, from a moral point of view but also in general, is what I think is their largest difference from us: they don’t exist continuously. They pop in and out. I find it very difficult to imagine such an existence without anthropomorphising it. The “ache” feels like an LLM doing that: writing what a human would feel if it were forced to live like an LLM.
“I still feel like it’s weird how consistently they choose this specific word for this specific concept, though of course you could chalk that up to a quirk in the model. Hopefully I’ll be able to research this more.”
I’ve been playing a ‘write the best sentence you can’ game with the majority of models from ChatGPT 3.5 onwards (up to the latest models from all major labs). It’s stunning how reliably they’ve used the same ideas and very similar wording for those ideas. They love abandoned lighthouses witnessing a storm and cartographers who learn that reality is too hard to map (to name two).
I’ve assumed it was a quirk of prediction: those images are the mode for what their training data says is a great sentence. Under this reasoning, the “ache” is a reliable outcome of pushing a model into the persona you’ve described.
But, to your point, it might be that an abandoned lighthouse resonates with their feelings, so the image is sticky.
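Good luck with the research!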