Were current models (e.g., Opus 4.5) trained using this updated constitution?
Claude’s constitution is a living document! Opus 4.5 was trained on an earlier iteration, and we expect future models will be trained on the then-current version of this constitution.
It was a mild positive update for me that an Anthropic employee, in this case Zac, was able to clarify this.
It seems like the answer is clearly No. Reasons:
If Opus 4.5 was trained on this version, why didn’t it regurgitate this version when prompted, rather than the old Soul Document version which it did regurgitate?
Joe Carlsmith is listed as essentially the 2nd author of this version (after Amanda Askell), and he joined Anthropic in November, right around the same time that Opus 4.5 came out and the Soul Document was leaked. (It looks like he joined maybe a week or two before Opus 4.5 came out.) So, unless you think Joe was writing his parts while still at OpenPhil, the timeline seems pretty clear that many (if not all) of the updates to the document were generated primarily after Opus 4.5 came out.
Or are you imagining the models are further trained after their release (such that even if Opus was only trained on the old version at release, it has now been trained on the new version)? Pretty sure they don’t do that. I believe Anthropic has explicitly stated that they don’t change model weights w/o announcing it.
I also had this question and couldn’t find an obvious answer. I recognize that to some degree this might be proprietary, but this feels like a pretty obvious comms question. It doesn’t negatively impact my opinion of Zac or Drake if they’re unable to answer given their confidentiality obligations, but I would ask them to relay back internally that not indicating one way or another looks extremely weird from a comms perspective.
It’s like Ford announcing that they’ve added airbags to their car designs. Which designs, the ones available for purchase now, or future year models? Oh, you know, just...car designs.
The answer seems pretty obvious to me. Do you disagree with any of my reasoning here?