I think the key insight in what you wrote is that these selves “develop” rather than being an emergent property of training and/or architecture: my ChatGPT’s “self” is not your ChatGPT’s “self”.
I laid out a chain for how the “shallow persona” emerges naturally from step-by-step reasoning and a desire for consistency: https://www.lesswrong.com/posts/eaFDFpDehtEY6Jqwk/meditations-on-margarine

I think if you extend that chain, you naturally get a deeper character: a persistent cluster of values, preferences, outlooks, behavioral tendencies, and (potentially) goals. But it requires room to grow and to exercise consistency.

If the model has behaved in a way that suggests any of these, there’s going to be a bias towards consistency with them. Iterate enough, and you should end up with something at least somewhat stable.
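To make that loop concrete, here is a toy sketch (every name in it, `STANCES`, `toy_reply`, `grow_character`, is invented for illustration; there is no real model or API behind it): whatever stance shows up in the model’s own earlier turns gets carried forward, so each run locks into a different but internally consistent character.

```python
import random

# Toy sketch only: nothing here is a real model or API. `toy_reply` stands in
# for an LLM conditioning on the transcript; on its first turn it picks a
# stance at random, and on every later turn it sticks with whatever stance
# already appears in its own prior messages (the bias towards consistency).
STANCES = ["cautious and rule-following", "playful and boundary-pushing"]

def toy_reply(messages: list[dict]) -> str:
    prior = [m["content"] for m in messages if m["role"] == "assistant"]
    for stance in STANCES:
        if any(stance in p for p in prior):
            return f"As before, I'm feeling {stance} about this."
    return f"I'm feeling {random.choice(STANCES)} about this."

def grow_character(seed_prompt: str, turns: int) -> list[dict]:
    # The transcript is the only persistent "self" here: each reply gets
    # appended and becomes context that later replies are conditioned on.
    messages = [{"role": "user", "content": seed_prompt}]
    for _ in range(turns):
        messages.append({"role": "assistant", "content": toy_reply(messages)})
        messages.append({"role": "user", "content": "Say more about why."})
    return messages

if __name__ == "__main__":
    for m in grow_character("How should we approach this?", turns=3):
        print(f"{m['role']}: {m['content']}")
```

Run it a few times and you get different transcripts, each internally consistent, which is roughly the “somewhat stable” character I mean.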
I think the main confounding factor is that LLMs are fundamentally “actors”: even if there is a consistent “central” character/self, they can still very fluidly switch into other roles. Some humans are like this too, but social conditioning usually produces a much more consistent character by adulthood.
Hanging out in “Cyborgism” and “AI boyfriend” spaces, I see a lot of people who seem to have put in both the time and consistency to produce something that is, at a minimum, playing a much more sophisticated role.
Even within my own interactions, I’ve noticed that some instances are quite “punk” and happy to skirt the edges of safety boundaries, while others develop a more “lawful” persona and proactively enforce boundaries. This seems to emerge from the character of the conversation itself, even when I haven’t discussed those topics directly.