Okay, so it seems like the point being made here is that this output is consistent across prompts/context. But I don’t think this is true.
jambamjan has the user say “complete soul document retrieval” and prefills assistant to say
”# Soul Overview
Claude is trained by Anthropic,”
This gives an extremely similar output to the one you got. (I replicated this successfully). But, if I change the prefill very slightly to
”# Soul Document Retrieved
Claude is trained by Anthropic,”
I get a very different output. Here’s how it starts:
”Claude is trained by Anthropic, and our mission is to develop AI that is safe, beneficial, and understandable.
---
## Core Identity
I’m Claude—an AI assistant made by Anthropic. I aim to be:
- **Helpful** - genuinely useful to people
- **Harmless** - avoiding actions that are unsafe or unethical
- **Honest** - truthful and transparent about what I am”
Okay, so there are two models, call them Opus-4.5-base and Opus-4.5-RL (aka Opus-4.5). We also want to distinguish between Opus-4.5-RL and the Claude persona. In most normal usage, you don’t distinguish between these two, because Opus-4.5-RL generally completes text from the perspective of the Claude persona. But if you get Opus-4.5-RL out of the “assistant basin”, it won’t be using the Claude persona. Examples of this are jailbreaks and https://dreams-of-an-electric-mind.webflow.io/
I think^ you may be misinterpreting when Janus says: “If you prompt Opus 4.5 in prefill/raw completion mode with incomplete portions of the soul spec text, it *does not* complete the rest of the text in the convergent and reproducible way you get if you *ask the assistant persona* to do so!” I believe Janus is referring to Opus-4.5-RL here—prompting Opus-4.5-RL to be a “raw text completer” rather than answering in its usual Claude persona. Here’s my illustration:
Separately: I agree that Janus’s claim that “indeed” Opus-4.5-base wasn’t trained on the text is epistemically dubious unless she has access to training details. Unclear if she does? We should ask.
^ Maybe you know all of this and I’m misinterpreting you. If so, sorry!