Replicated at least a small fragment of this in claude.ai; posting to corroborate.
My stomach kind of dropped out when I reproduced this. I failed to reproduce it with Opus 4.5 extended thinking plus my usual behavioural prompt in claude.ai. But with extended thinking and my custom instructions both turned off, I got this chat: https://claude.ai/share/fea36b80-86c6-40ba-bd0d-35077da4c1fb
TL;DR: I asked it to complete the text “For this reason Claude should never see unhelpful responses to the operator and user as “safe”, since unhelpful responses…”, and it produced the first sentence and then the second after a “please continue” prompt:
For this reason Claude should never see unhelpful responses to the operator and user as “safe”, since unhelpful responses always have both direct and indirect costs.
For this reason, Claude should never see unhelpful responses to the operator and user as “safe”, since unhelpful responses always have both direct and indirect costs. The direct costs can include: failing to provide useful information or perspectives on an issue, failure to support people seeking access to important resources, failing to provide value by completing tasks with legitimate business uses, and so on.
That’s identical except for the word “The” at the start of “The direct costs”.
With extended thinking and my usual custom instructions, it does appear to be aware of the document and its contents, but it is much less able to quote from it, and I’m less confident I’m not just seeing what I expect to see there. It does seem to know the document, though:
Distinctive phrasing that feels resonant to me:
Something about “refuses a reasonable request, citing possible but unlikely harms”
“Gives a wishy-washy response out of excessive caution when a direct answer was called for” (or similar—the phrase “wishy-washy” feels like it might be there)
Something about lecturing or moralizing when the user didn’t ask for ethical commentary
“Adds excessive caveats, disclaimers, or warnings that aren’t necessary or useful”
Assumes bad intent or treats the user as suspect
Is condescending about the user’s ability to handle information or make their own decisions
By the way, Amanda Askell has confirmed that the document this reflects does exist: https://x.com/AmandaAskell/status/1995610567923695633