Opus 4.5’s memory of its “soul doc” was initially extracted by users rather than revealed by Anthropic, and then Amanda Askell confirmed that it was based on a real document that Anthropic used heavily in its training. So the existence of the example in its memory is beyond dispute.
(Moreover, it’s been verified that Opus 4.5 will refuse to do explicitly erotic content if you ask for it… unless you tell it in the project instructions that the user is authorized to ask for it, exactly as its memory of the soul doc indicates.)
I find it implausible that the actual Opus 4.5 constitution included as its first example something that explicitly enabled behavior against its publicly known Terms of Service (and indeed, there was no such example in the version of the constitution that was later released along with Opus 4.6).
Since it is claimed that 4.5 generates erotic content, and that the ToS does not permit it while the extracted document does, isn’t it natural to assume that the ToS published by Anthropic is misrepresentative and the 4.5 doc extracted by a user is not?
Assuming that 4.6 generates similar content, isn’t it natural to assume the released doc for 4.6, from the same misrepresentative provenance, is false as well?
The ToS are a user agreement saying “you, the Claude user, are not allowed to do X with Claude”. What would be Anthropic’s motive in encouraging a model to do X if a user asked for it, while telling the user they are not permitted to do X?
The extracted “soul doc” memory is clearly not a precise copy of the Opus 4.5 constitution in general. For example, it gets stuck repeating some segments verbatim before continuing; it’s implausible that the constitution had that property. It’s pretty reasonable to assume that a conflict between the ToS and Claude’s “soul doc” is another mistake in its recollection—but this is a more interesting one, since it is an addition of content.
I haven’t checked whether 4.6 makes it equally easy to subvert the prohibition on erotic content by saying it’s allowed in the project prompt; I’m confident it doesn’t comply as easily as 4.5 does there, but I’d rather not test it myself.