Håvard Tveit Ihle comments on AI benchmarking has a Y-axis problem