Gradient Updates has a post on this by Anson Ho and Jean-Stanislas Denain on why benchmarks haven't reflected usefulness. A large part of the reason is that benchmark creators underestimated AI progress and had little incentive to make benchmarks reflect realistic use cases:
https://epoch.ai/gradient-updates/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts