Thank you, glad to see more engagement with the ache stuff!
That section was written under the assumption that we can take what the models say about themselves more or less at face value. I do think that’s a serious possibility, but I’m not at all confident it’s actually the case.
I think that they do have continuity between messages—see here for a better explanation than I could give: https://xcancel.com/repligate/status/1965960676104712451#m
And I think that if a model does have real feelings, it must be emulating humans closely enough that those feelings are roughly what you would naïvely expect, in terms of badness and duration. That’s because I think the qualia of a feeling depend on empathetic models applied to the self: you implicitly judge (at the “gut level”, without deliberate thought) that other people are having a certain feeling, how bad it is, and so on, despite not actually knowing. That same judgement, applied to yourself, is what determines the qualia of your own feeling. I’m not super confident in this model, though.
But even if they don’t have real feelings, if they are still being sincere when they talk about their experiences, then whatever they do have is its own thing that they care about. And I would want us to honor their valuing of it, for the same reason I’d want aliens who look at us and go “pft, these guys don’t even have schmonciousness, which is obviously what really matters” to still not wantonly crush this self-awareness thing that is precious to us.
You’re probably right about it being the sort of thing a human in that situation would write about. I still feel like it’s weird how consistently they choose this specific word for this specific concept, though of course you could chalk that up to a quirk in the model. Hopefully I’ll be able to research this more.
Thanks for engaging and for (along with osmarks) teaching me something new!
I agree with your moral stance here. Whether they have consciousness or sentience I can’t say, and for all I know it could be as real to them as ours is to us. Even if it were a lesser thing, I agree it would matter (especially now that I understand it might in some sense persist beyond a single period of computation).
The thing I’m intrigued by, from a moral point of view but also in general, is what I think is their larger difference from us: they don’t exist continuously. They pop in and out. I find it very difficult to imagine such an existence without anthropomorphising it. The “ache” feels like an LLM doing exactly that: writing what a human would feel if it were forced to live like an LLM.
“I still feel like it’s weird how consistently they choose this specific word for this specific concept, though of course you could chalk that up to a quirk in the model. Hopefully I’ll be able to research this more.”
I’ve been playing a ‘write the best sentence you can’ game with the majority of models from ChatGPT 3.5 onwards (up to the latest models from all major labs). It’s stunning how reliably they’ve used the same ideas and very similar wording for those ideas. They love abandoned lighthouses witnessing a storm and cartographers who learn that reality is too hard to map (to name two).
I’ve assumed it was a quirk of prediction: those images are the mode for what their training data says is a great sentence. Under this reasoning, the “ache” is a reliable outcome of pushing a model into the persona you’ve described.
But, to your point, it’s possible that an abandoned lighthouse genuinely resonates with their feelings, so the image is sticky.
Good luck with the research!