J Bostock comments on How well do models follow their constitutions?

J Bostock 12 Mar 2026 10:24 UTC
13 points
I understand that for the 4.5 series (and possibly earlier models) Anthropic has also been using automated “alignment” techniques very similar to the ones in Petri. Given that, I don’t think these evals provide much further information about constitution following in the wild.
- Neel Nanda 12 Mar 2026 11:03 UTC
  11 points
  I was worried about this and asked someone on the relevant team at Anthropic, but they thought our methods were sufficiently different from their internal approach to still be interesting.