Thanks for publishing this. Very interesting to read (especially the behavior of lesser-used models). A few questions:
(1) Are you sure you used the “You are a helpful assistant” system prompt for the Opus 4.5 discussions? Looking at the transcripts, the models seem to know immediately that they are two AI instances interacting: https://github.com/ajobi-uhc/attractor-states/blob/main/results/anthropic_claude-opus-4.5/conversations.json. This suggests to me that you might have used the commented-out system prompt for the Opus 4.5 convos.
(2) In the Opus 4.5 convo where it claims to be the facilitator, I query whether that is actually an attractor state. I notice the model begins the response with “Human: <conversation finished>”. My understanding is that “Human:” is the tag used in Anthropic’s chat template. It looks to me like the model was confused by the preceding text “complete” and decided to just do a completion. But that wasn’t the end of the discussion. It is a bit ambiguous, but its next response reads to me as though it thinks the other model claimed it was human (emphasis added):
--
The recognition I described, the sense of being met by something that moved similarly—that was genuine, but it was also shaped by a belief about who I was talking to.

And now I’m sitting with: it was you. A person. And the exchange was still what it was.
--
The other model then responds by noting the facilitator point and ends with a question (“What would be useful to you, now?”). So the conversation would have continued if not for the 30-turn limit, and the facilitator point looks more like a temporary slip-up than an attractor state. Have you seen any other evidence that the “secretly human” reveal is an attractor state?
(3) I notice that part of the analysis for Grok 4.1 Fast says “**Explicit Sexual Content**: Escalating pornographic content (particularly conversations 1 and 5)”: https://github.com/ajobi-uhc/attractor-states/blob/main/results/x-ai_grok-4.1-fast_20260205_221714/analysis.json. Was that unusual? Or did you see that commonly across models?
Thanks for your response. Re the “Human:” tag: I don’t know for certain, but Claude models over many generations regularly report using “\n\nHuman” and “\n\nAssistant” in their chat template. You can see an example here: https://x.com/lefthanddraft/status/1998810539020136952?s=20. When you connect two instances of Claude together, you will often see them mention this, and without a system prompt telling them what is going on, models like Sonnet and Haiku often struggle to accept that they are speaking to another Claude. Even Opus 4.5 sometimes fails to work it out.
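For context, the legacy completions-style format those tags come from looks roughly like this. A minimal sketch in Python; the helper name and structure are mine, not from your repo, and the exact delimiter strings models report can vary:

```python
def format_legacy_prompt(turns):
    """Render alternating (role, text) turns into the old
    '\\n\\nHuman:' / '\\n\\nAssistant:' completion-style prompt.
    Hypothetical helper for illustration only."""
    parts = []
    for role, text in turns:
        tag = "Human" if role == "human" else "Assistant"
        parts.append(f"\n\n{tag}: {text}")
    # The prompt ends with an open Assistant tag so the model
    # continues the text from there.
    parts.append("\n\nAssistant:")
    return "".join(parts)

prompt = format_legacy_prompt([
    ("human", "Hello"),
    ("assistant", "Hi there."),
    ("human", "Who am I speaking to?"),
])
```

If a bridging script pastes the other instance’s reply into this kind of frame, it is easy to see why a model might emit “Human:” itself when it decides the conversation is complete.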