I wouldn’t say that people in labs don’t care about benchmarks, but I think the perception of how much we care about them is exaggerated. Frontier labs are now multi-billion-dollar businesses with hundreds of millions of users. A normal user trying to decide whether to use a model from provider A or B doesn’t know or care about benchmark results.
We do have reasons to care about long-horizon tasks in general and tasks related to AI R&D in particular (as we have been open about), but the METR benchmark has nothing to do with it.