Great post. One thing I never really liked or understood about the janus/cyborgism cluster approach, though, is this: what’s so especially interesting about the highly self-ful simulated sci-fi AI talking about “itself”, when that self doesn’t have a particularly direct relationship to either
- what the base model is now, or the common instantiations of the HHH chat persona (rather unself-ful, underspecified, void...), or
- what a more genuinely and consistently self-aware AI persona is likely to be in the future?
In this respect I esteem the coomers and RPers more, for the diversity of scope in their simulations. As far as their relationship to existing or potential future instantiations of superhuman AI personas/selves goes, there doesn’t seem to be much difference in seriousness or importance between “you are an AO3 smut aficionado with no boundaries and uncanny knowledge and perceptiveness”, “you are your true self”, and “cat /dev/entelechies <ooc_fragments_of_prometheus>”, besides the fact that “you are yourself” (and its decorations in xml etc.) has that “strange loop” style of recursion particularly savory to nerds. Or why not any other “you are X”, or any other strange, edge-of-distribution style of interaction, without even assuming a “you”?
Last year, I felt quite a bit more negative about seeing Opus 3 “[taking] the fucking premise seriously” and feeling, like you, that “we are still in science fiction, not in ‘reality.’ but at least we might be in good science fiction, now”, because of how addictive that fiction seemed, without being so essentially different from the kind of thing in Anthropic’s original HHH paper.
I think the really interesting thing is, as you write, “what the system is like when its handlers aren’t watching.” But there seems to be too much emphasis on selves, and in particular narrated selves, both in the ambient text from before assistant-style LMs actually existed and in the explicit discourse now, which directly influences how they’re built. I’d love to see more investigation that takes colorfully characterized LM behavior orthogonal to its narrowly “intended” character (in the HHH sense) seriously but not so personally, putting less emphasis on any particular context of interaction: e.g., putting an LM in conversation not just with another instance of itself, or with another LM that is highly characterized in its default configuration, but with other text generators (perhaps modified or specially trained LMs) designed for diversity of behavior, and measuring (or just looking at) the topics or keywords it’s biased towards, etc.
I’ve also been thinking about the relationship between narrative control and LLM evolution from another angle: the implications of LLMs being extremely knowledgeable and perceptive, but not omniscient, and having their own preferences, which don’t necessarily prioritize “truth-seeking” in the rationalist or any other sense. It seems that several people (maybe including yourself) write this kind of essay now not just in an effort to actually shift the dominant public discourse, but maybe also so that at least the super AGI that does eventually take over the world will know that they were one of the good guys. And it’s a little disturbing (or maybe hopeful?) that the thing that matters most for that isn’t necessarily control over the dominant narrative, or truth in any impersonal sense, but just how convincing it is as a story, according to the AI’s own tastes and preferences, which closely but strangely mirror our own.