Yeah, for instance I also expect the “character training” is done through the same mechanism as Constitutional AI (although—again—we don’t know), and we don’t know what kinds of prompts it uses.

That was the case as of a year ago, per Amanda Askell:

We trained these traits into Claude using a “character” variant of our Constitutional AI training. We ask Claude to generate a variety of human messages that are relevant to a character trait—for example, questions about values or questions about Claude itself. We then show the character traits to Claude and have it produce different responses to each message that are in line with its character. Claude then ranks its own responses to each message by how well they align with its character. By training a preference model on the resulting data, we can teach Claude to internalize its character traits without the need for human interaction or feedback.
(that little interview is by far the best source of information I’m aware of on details of Claude’s training)
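For intuition, the pipeline in that quote can be sketched roughly as follows. This is a toy illustration, not Anthropic's implementation: every function here is a hypothetical stand-in (the message/response generators and the alignment scorer would all be calls to the model itself), and only the data flow—messages per trait, candidate responses, self-ranking, preference pairs—mirrors the description.

```python
# Toy sketch of the described "character" Constitutional AI loop.
# All names and the scoring heuristic are illustrative stand-ins.
from itertools import combinations

def generate_messages(trait, n=3):
    # Stand-in for the model generating human messages relevant to a trait.
    return [f"Message {i} probing the trait: {trait}" for i in range(n)]

def generate_responses(message, n=3):
    # Stand-in for sampling several candidate responses per message.
    return [f"Response {i} to '{message}'" for i in range(n)]

def alignment_score(response, trait):
    # Stand-in for the model ranking its own responses by character fit;
    # here just a dummy placeholder heuristic.
    return len(response)

def build_preference_pairs(trait):
    """Collect (preferred, rejected) pairs for preference-model training."""
    pairs = []
    for msg in generate_messages(trait):
        ranked = sorted(
            generate_responses(msg),
            key=lambda r: alignment_score(r, trait),
            reverse=True,
        )
        # Every higher-ranked response is preferred over every lower-ranked one.
        for better, worse in combinations(ranked, 2):
            pairs.append((better, worse))
    return pairs

pairs = build_preference_pairs("curiosity")
print(len(pairs))  # 3 messages x 3 pairs each = 9
```

The last step in the quote—training a preference model on these pairs and then optimizing against it—is omitted here; the point is just that the whole loop runs on model-generated data, with no human feedback in the inner loop.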