I understand that for the 4.5 series (and possibly earlier) Anthropic has been using automated “alignment” techniques very similar to the ones in Petri. Given that, I don’t think these evals provide much additional information about constitution-following in the wild.
I was worried about this too, and asked someone on the relevant team at Anthropic, but they thought our methods were sufficiently different from their internal approach to still be interesting.