Gurkenglas comments on "AI benchmarking has a Y-axis problem"