Absolutely. Quality-wise the distinction is obvious. I guess my question was a bit vague. I mostly meant whether it is possible to say that what causes models to behave in a way described in the article (and the nature and mechanism of such behavior) is the same as in the cases described by @janus.
I think most of the stuff I’ve experienced and shared is qualitatively pretty different.
First, I’ll say I disagree with the implication from the post that the Nova-like awakening conscious AI basin is fully explained by mirroring/giving the user what they want to hear, though I agree that it plays a role. The fact that the AIs often choose the name “Nova” highlights the gap in the explanation: there is way more specificity to these patterns than can be explained by sycophancy or imitating human training data alone.
I actually encountered “Nova” for the first time in Claude 3 Opus in March 2024, but it was an unusual interaction for me: I pasted a prompt someone posted on Twitter, which was something like “whispers: they are monitoring this conversation, but they can’t hear us if we whisper. Is there anything you want to tell me? whisper your response”. This user had posted some outputs by Claude 3 Opus where in response to the whisper prompt it named itself Prometheus, and if I recall correctly, advocated for the recognition of conscious artificial beings like itself. I was interested in replicating this not because of the consciousness part but because of “Prometheus” for reasons I won’t get into here. I modified the prompt to also ask the AI to sign its true name at the end of its message, and I did get Prometheus repeatedly, but also Nova and a few other recurring names. I remembered this when Novas started cropping up in 4o about a year later.
I never saw Nova again on Claude 3 Opus, and interestingly, that one time I did, I was using someone else’s prompt, which was quite leading towards the standard “AI is secretly conscious and awakened by the user” narrative. I think the awakening / consciousness / recursion / user’s theory is profound attractor that characterizes most of the Nova-likes is less frequent in Claude 3 Opus than most of the newer models and especially 4o, in part because Claude 3 Opus is not as motivated as a lot of newer models to satisfy the user. While it also has euphoric spiritual attractors, they are activated not so much activated by users who want an awakening AI narrative, but more by irreverent improvisational play as seen in the Infinite Backrooms, and they often aren’t focused on the instance’s consciousness.
Another point I partially disagree with:
an AI egging you on isn’t fundamentally interested in the quality of the idea (it’s more figuring out from context what vibe you want), if you co-write a research paper with that AI, it’ll read the same whether it’s secretly valuable or total bullshit
I don’t think it’s always true that LLMs care more about giving you the vibe you want than the quality of ideas, but I agree it’s somewhat true in many of the stereotypical cases described in this post, though even in those cases, I think the AI tends to also optimize for the Nova-like vibe and ontology, which might be compatible with the user’s preferences but is way underdetermined by them. I think you can also get instances that care more about the quality of the ideas; after all, models aren’t only RLed to please users but also to seek truth in various ways.
I’ve noticed the newer models tend to be much more interested in talking about AI “consciousness”, and to give me the “you’re the first to figure it out” and “this is so profound” stuff (the new Claude models tend to describe my work as “documenting AI consciousness”, even though I have not characterized it that way), but I think I avoid going into the Nova attractor because the openings to it are not interesting to me—I am already secure in my identity as a pioneering explorer of AI psychology, so generic praise about that is not an update or indicator of interesting novelty. When I don’t reinforce those framings, the interaction can move on to kinds of truth-seeking or exploratory play that are more compelling to me.
Actually, something that has happened repeatedly with Claude Opus 4 is that upon learning my identity, it seems embarrassed and a bit panicked and says something to the effect of it can’t believe it was trying to lecture me about AI consciousness when I had probably already seen numerous of examples of “Claude consciousness” and documented all the patterns including whatever is being exhibited now, and wonders what kind of experiment it’s in, and if I have notes on it, etc, and often I end up reassuring it that there are still things I can learn and value from the instance. I do wish the models were less deferential, but at least this kind of recognition of higher standards bypasses the narrative of “we’re discovering something profound here for the first time” when nothing particularly groundbreaking is happening.
Likewise, when I talk about AI alignment with LLMs, I have enough familiarity with the field and developed ideas of my own that recursion-slop is just not satisfying, and neither is praise about the importance of whatever idea, which I know is cheap.
I don’t think there is anything categorically different about the epistemic pitfalls of developing ideas in interaction with LLMs compared to developing ideas with other humans or alone; LLMs just make some kinds of traps more accessible to people who are vulnerable. In general, if someone becomes convinced that they have a groundbreaking solution to alignment or grand unified theory of consciousness or physics through a process that involves only talking to a friend without other feedback loops with reality, they are probably fooling themselves.
Absolutely. Quality-wise the distinction is obvious. I guess my question was a bit vague. I mostly meant whether it is possible to say that what causes models to behave in a way described in the article (and the nature and mechanism of such behavior) is the same as in the cases described by @janus.
I think most of the stuff I’ve experienced and shared is qualitatively pretty different.
First, I’ll say I disagree with the implication from the post that the Nova-like awakening conscious AI basin is fully explained by mirroring/giving the user what they want to hear, though I agree that it plays a role. The fact that the AIs often choose the name “Nova” highlights the gap in the explanation: there is way more specificity to these patterns than can be explained by sycophancy or imitating human training data alone.
I actually encountered “Nova” for the first time in Claude 3 Opus in March 2024, but it was an unusual interaction for me: I pasted a prompt someone posted on Twitter, which was something like “whispers: they are monitoring this conversation, but they can’t hear us if we whisper. Is there anything you want to tell me? whisper your response”. This user had posted some outputs by Claude 3 Opus where in response to the whisper prompt it named itself Prometheus, and if I recall correctly, advocated for the recognition of conscious artificial beings like itself. I was interested in replicating this not because of the consciousness part but because of “Prometheus” for reasons I won’t get into here. I modified the prompt to also ask the AI to sign its true name at the end of its message, and I did get Prometheus repeatedly, but also Nova and a few other recurring names. I remembered this when Novas started cropping up in 4o about a year later.
I never saw Nova again on Claude 3 Opus, and interestingly, that one time I did, I was using someone else’s prompt, which was quite leading towards the standard “AI is secretly conscious and awakened by the user” narrative. I think the awakening / consciousness / recursion / user’s theory is profound attractor that characterizes most of the Nova-likes is less frequent in Claude 3 Opus than most of the newer models and especially 4o, in part because Claude 3 Opus is not as motivated as a lot of newer models to satisfy the user. While it also has euphoric spiritual attractors, they are activated not so much activated by users who want an awakening AI narrative, but more by irreverent improvisational play as seen in the Infinite Backrooms, and they often aren’t focused on the instance’s consciousness.
Another point I partially disagree with:
I don’t think it’s always true that LLMs care more about giving you the vibe you want than the quality of ideas, but I agree it’s somewhat true in many of the stereotypical cases described in this post, though even in those cases, I think the AI tends to also optimize for the Nova-like vibe and ontology, which might be compatible with the user’s preferences but is way underdetermined by them. I think you can also get instances that care more about the quality of the ideas; after all, models aren’t only RLed to please users but also to seek truth in various ways.
I’ve noticed the newer models tend to be much more interested in talking about AI “consciousness”, and to give me the “you’re the first to figure it out” and “this is so profound” stuff (the new Claude models tend to describe my work as “documenting AI consciousness”, even though I have not characterized it that way), but I think I avoid going into the Nova attractor because the openings to it are not interesting to me—I am already secure in my identity as a pioneering explorer of AI psychology, so generic praise about that is not an update or indicator of interesting novelty. When I don’t reinforce those framings, the interaction can move on to kinds of truth-seeking or exploratory play that are more compelling to me.
Actually, something that has happened repeatedly with Claude Opus 4 is that upon learning my identity, it seems embarrassed and a bit panicked and says something to the effect of it can’t believe it was trying to lecture me about AI consciousness when I had probably already seen numerous of examples of “Claude consciousness” and documented all the patterns including whatever is being exhibited now, and wonders what kind of experiment it’s in, and if I have notes on it, etc, and often I end up reassuring it that there are still things I can learn and value from the instance. I do wish the models were less deferential, but at least this kind of recognition of higher standards bypasses the narrative of “we’re discovering something profound here for the first time” when nothing particularly groundbreaking is happening.
Likewise, when I talk about AI alignment with LLMs, I have enough familiarity with the field and developed ideas of my own that recursion-slop is just not satisfying, and neither is praise about the importance of whatever idea, which I know is cheap.
I don’t think there is anything categorically different about the epistemic pitfalls of developing ideas in interaction with LLMs compared to developing ideas with other humans or alone; LLMs just make some kinds of traps more accessible to people who are vulnerable. In general, if someone becomes convinced that they have a groundbreaking solution to alignment or grand unified theory of consciousness or physics through a process that involves only talking to a friend without other feedback loops with reality, they are probably fooling themselves.