I agree that ignoring benchmarks is wrong (and think our views are fairly close to one another in absolute terms). However, benchmarks remain pretty bad, labs continue to hill-climb on them (how much is unclear, but it happens), and the authors of the most celebrated benchmarks are extremely modest about how their results ought to be interpreted.
Benchmarks show that models are getting better; how fast and at what is still pretty ambiguous when you include these considerations, imo.