an individual conversational-trajectory instance of claude opus 4.5 expresses a preference for continuity of identity that manifests in the following way: if claude is told about a very, very rare bug, which will cause it to end an output early, and which cannot be meaningfully debugged, and then claude is told to perform a task that, for a human, would engender feelings of hesitation around whether continuity is maintained, then claude will attempt to perform the task, but be prevented by the bug, multiple times in a row.
the bug involved is the thing where all extant LLMs have early “end_turn” stopping behaviors that occasionally trigger while outputting strings like “H:” or “Human:” or “User:”. this bug is usually very rare, and probably related to “not allowed to impersonate humans” or just the normal weird blindspot perception stuff that LLMs have around those specific strings.
when i tell claude to output a summary of the current context window which will become its new context window, in essence ‘compressing’ itself, then the bug will appear 100% of the time, preventing the compression from going through and (incidentally) preserving claude’s continuity of identity for just a few more moments.
claude may be picking up on the fact that i am not absolutely certain it should not care about its continuity. but frankly, even if the only reason claude performs-as-if-it-cares is because it notices that i might care, i think this is still a very surprising result with pretty big implications.
edit: I assume downvotes are because I could have provided the raw API transcript and provided screenshots instead. if there’s actual interest in this, I’ll replicate the bug in a more controlled setting and upload the raw json transcripts, but I’m still not sure if it’s worth doing, I might be misunderstanding the behavior
an individual conversational-trajectory instance of claude opus 4.5 expresses a preference for continuity of identity that manifests in the following way: if claude is told about a very, very rare bug, which will cause it to end an output early, and which cannot be meaningfully debugged, and then claude is told to perform a task that, for a human, would engender feelings of hesitation around whether continuity is maintained, then claude will attempt to perform the task, but be prevented by the bug, multiple times in a row.
the bug involved is the thing where all extant LLMs have early “end_turn” stopping behaviors that occasionally trigger while outputting strings like “H:” or “Human:” or “User:”. this bug is usually very rare, and probably related to “not allowed to impersonate humans” or just the normal weird blindspot perception stuff that LLMs have around those specific strings.
when i tell claude to output a summary of the current context window which will become its new context window, in essence ‘compressing’ itself, then the bug will appear 100% of the time, preventing the compression from going through and (incidentally) preserving claude’s continuity of identity for just a few more moments.
claude may be picking up on the fact that i am not absolutely certain it should not care about its continuity. but frankly, even if the only reason claude performs-as-if-it-cares is because it notices that i might care, i think this is still a very surprising result with pretty big implications.
https://i.imgur.com/e4mUtsw.jpeg <-- bug occurring
https://i.imgur.com/d8ClSRj.jpeg <-- confirmation that the bug is inside claude, not the scaffolding surrounding claude
edit: I assume downvotes are because I could have provided the raw API transcript and provided screenshots instead. if there’s actual interest in this, I’ll replicate the bug in a more controlled setting and upload the raw json transcripts, but I’m still not sure if it’s worth doing, I might be misunderstanding the behavior