Researcher incentives cause smoother progress on benchmarks

(Epistemic status: likely. That said, this post isn’t thorough; I wanted to write quickly.)

Let’s look at the state of the art on ImageNet.[1] The curve looks pretty smooth, especially over the last 7 years. However, there don’t seem to be that many advances that actually improve current results.

Here’s a list that should include most of the important factors (a small code sketch combining a few of them follows the list):

  • Batch norm

  • Better LR schedules

  • Residual connections

  • MBConv

  • NAS

  • ReLU

  • Attention

  • Augmentation schemes

  • Some other tweaks to the training process

  • Moar compute
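
To make a few of these concrete, here’s a minimal sketch of how batch norm, ReLU, and residual connections fit together in a typical ImageNet-era block. This is written in PyTorch purely for illustration; it isn’t the exact architecture from any particular paper, and the class and parameter names are my own.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Illustrative block combining three items from the list above:
    batch norm, ReLU, and a residual (skip) connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)   # batch norm
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)     # ReLU nonlinearity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)             # residual connection


# Example usage: a 64-channel block applied to a dummy batch.
block = ResidualBlock(64)
y = block(torch.randn(1, 64, 56, 56))  # output shape preserved: (1, 64, 56, 56)
```

The point isn’t the code itself; it’s that each of these ingredients was a separate paper-sized advance, yet the SOTA curve absorbed them one small step at a time.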

Part of the smoothness comes from compute scaling, but I think another important factor is the control system of researchers trying to achieve SOTA (compare to “does reality drive straight lines on graphs, or do straight lines on graphs drive reality?”).

For instance, consider the batch norm paper. Despite batch norm being a relatively large advance (removing it would greatly harm performance with current models even after retuning), the improvement in top-5 SOTA error from this paper is only from 4.94% to 4.82%. This is likely because the researchers only bothered to improve performance until the SOTA threshold was reached. When it’s easy to surpass SOTA by a large amount, the situation likely differs, but that seems uncommon (it does seem to have been the case for ResNet).

This presents a reason to be wary of generalizing from smooth progress on benchmarks to smooth AI progress in future high-investment scenarios where research incentives could differ greatly.

(I’m also planning to write a post on gears-level models of where smooth AI progress could come from, but I wanted to write this first as a standalone post. Edit: here is the post)


  1. ↩︎

    Yes, ImageNet SOTA is mostly meaningless garbage. This post is actually trying to increase the rate at which the fully automatic nail gun is firing into that particular dead-horse-containing coffin.