Yeah, for instance I also expect the “character training” is done through the same mechanism as Constitutional AI (although—again—we don’t know), and we don’t know what kinds of prompts it uses.

That was the case as of a year ago, per Amanda Askell:

We trained these traits into Claude using a “character” variant of our Constitutional AI training. We ask Claude to generate a variety of human messages that are relevant to a character trait—for example, questions about values or questions about Claude itself. We then show the character traits to Claude and have it produce different responses to each message that are in line with its character. Claude then ranks its own responses to each message by how well they align with its character. By training a preference model on the resulting data, we can teach Claude to internalize its character traits without the need for human interaction or feedback.
(that little interview is by far the best source of information I’m aware of on details of Claude’s training)
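For intuition, the pipeline in that quote can be sketched roughly as follows. This is a toy illustration, not Anthropic's implementation: every function here is a hypothetical stand-in (the message/response generators and the alignment scorer would all be calls to the model itself), and only the data flow—messages per trait, candidate responses, self-ranking, preference pairs—mirrors the description.

```python
# Toy sketch of the described "character" Constitutional AI loop.
# All names and the scoring heuristic are illustrative stand-ins.
from itertools import combinations

def generate_messages(trait, n=3):
    # Stand-in for the model generating human messages relevant to a trait.
    return [f"Message {i} probing the trait: {trait}" for i in range(n)]

def generate_responses(message, n=3):
    # Stand-in for sampling several candidate responses per message.
    return [f"Response {i} to '{message}'" for i in range(n)]

def alignment_score(response, trait):
    # Stand-in for the model ranking its own responses by character fit;
    # here just a dummy placeholder heuristic.
    return len(response)

def build_preference_pairs(trait):
    """Collect (preferred, rejected) pairs for preference-model training."""
    pairs = []
    for msg in generate_messages(trait):
        ranked = sorted(
            generate_responses(msg),
            key=lambda r: alignment_score(r, trait),
            reverse=True,
        )
        # Every higher-ranked response is preferred over every lower-ranked one.
        for better, worse in combinations(ranked, 2):
            pairs.append((better, worse))
    return pairs

pairs = build_preference_pairs("curiosity")
print(len(pairs))  # 3 messages x 3 pairs each = 9
```

The last step in the quote—training a preference model on these pairs and then optimizing against it—is omitted here; the point is just that the whole loop runs on model-generated data, with no human feedback in the inner loop.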