Seth Herd comments on AI companies should be safety-testing the most capable versions of their models

Seth Herd 27 Mar 2025 15:21 UTC
2 points
0
The most capable version of each model has not yet been created when the model is released. As well as fine-tuning for specific tasks, scaffolding matters. The agentic scaffolds people create play an increasingly important role in the model’s ultimate capability.
- sjadler 27 Mar 2025 19:44 UTC
  1 point
  0
  Scaffolding for sure matters, yup!
  I think you’re generally correct that the most-capable version hasn’t been created, though there are times when AI companies do have specialized versions for a domain internally and still don’t seem to be testing them. It’s reasonable IMO to think that these might outperform the unspecialized versions.