Some arguments for why that might be the case:
-- the more useful it is, the more people use it, the more telemetry data the model has access to
-- while scaling laws do not exhibit diminishing returns from scaling, most of the development time would be on things like infrastructure, data collection and training, rather than aiming for additional performance
-- the higher the performance, the more people get interested in the field and the more research there is publicly accessible to improve performance by just implementing what is in the litterature (Note: this argument does not apply for reasons why one company could just make a lot of progress without ever sharing any of their progress.)
Could you please explain why your arguments don’t apply to compilers?
the two first are about data, and as far as I know compilers do not use machine learning on data.
third one could technically apply to compilers, though I think in ML there is a feedback loop “impressive performance → investments in scaling → more research”, but you cannot just throw more compute to increase compiler performance (and results are less in the mainstream, less of a public PR thing)