The most capable version of each model has not yet been created when the model is released. As well as fine-tuning for specific tasks, scaffolding matters. The agentic scaffolds people create have an increasingly important role in the model’s ultimate capability.
I think you’re generally correct that the most-capable version hasn’t been created, though there are times where AI companies do have specialized versions for a domain internally, and don’t seem to be testing these anyway. It’s reasonable IMO to think that these might outperform the unspecialized versions.
The most capable version of each model has not yet been created when the model is released. As well as fine-tuning for specific tasks, scaffolding matters. The agentic scaffolds people create have an increasingly important role in the model’s ultimate capability.
Scaffolding for sure matters, yup!
I think you’re generally correct that the most-capable version hasn’t been created, though there are times where AI companies do have specialized versions for a domain internally, and don’t seem to be testing these anyway. It’s reasonable IMO to think that these might outperform the unspecialized versions.