I appreciate that Anthropic’s model cards are now much more detailed than they used to be. They’ve especially improved in the level of detail about CBRN evals (mostly biological risk).[1] They are substantially more detailed and informative than the model cards of other AI companies.
This isn’t to say that their model card contains a sufficient level of information or justification (at least on CBRN). We can’t seriously check their claims about the level of safety.
The Claude 3 model card gives a roughly half-page description of their bio evaluations, though it was likely easy to rule out risk at the time. The Claude 3.5 Sonnet and 3.6 Sonnet model cards say essentially nothing beyond that Anthropic ran CBRN evaluations and claims to be in compliance. The Claude 3.7 Sonnet model card contains much more detail on bio evals, and the Claude 4 model card contains still more detail on the bio evaluations while also including a larger number of evaluations that are plausibly relevant to catastrophic misalignment risks. To be clear, some of the increase in detail might just be driven by increased capability, but the increase in detail is still worth encouraging.
My weakly-held cached take is: I agree on CBRN/bio (and of course alignment) and I think Anthropic is pretty similar to OpenAI/DeepMind on cyber and AI R&D (and scheming capabilities), at least if you consider stuff outside the model card (evals papers + open-sourcing the evals).
Following your Quick Takes has made me a far better-informed “Responsible AI” lawyer than following most “AI Governance newsletters” 🙏🏼.