There are also some suspicious aspects to this study: for example, it uses scaling experiments between 10^13 FLOP and 10^18 FLOP (or under 10^17 FLOP for LSTMs) to infer efficiency improvements as far as 10^23 FLOP — so it involves heroically extrapolating out five orders of magnitude. This is a problem because the result about scale-dependence might itself depend on which compute scale we’re looking at, which is very meta.
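To get an intuition for why five orders of magnitude is a lot, here's a minimal sketch (with made-up numbers, not the study's actual data) of fitting a power law on compute in the 10^13–10^18 FLOP range and extrapolating to 10^23 FLOP. Even a small error in the fitted log-log slope gets multiplied by the five extra orders of magnitude:

```python
import numpy as np

# Hypothetical illustration: a true power law with a bit of measurement
# noise, observed only at 10^13..10^18 FLOP (all values are made up).
rng = np.random.default_rng(0)
log_c = np.linspace(13, 18, 6)                 # log10 of compute (FLOP)
true_slope, true_intercept = -0.05, 2.0
log_loss = true_slope * log_c + true_intercept + rng.normal(0, 0.01, log_c.size)

# Fit a line in log-log space (i.e., a power law).
slope, intercept = np.polyfit(log_c, log_loss, 1)

# The slope error is tiny in-sample, but at 10^23 FLOP it has been
# amplified by the distance from the data (up to 10 log-units away).
pred_at_23 = slope * 23 + intercept
true_at_23 = true_slope * 23 + true_intercept
print(f"fitted slope error:  {slope - true_slope:+.4f}")
print(f"log10 error at 10^23: {pred_at_23 - true_at_23:+.4f}")
```

Note that an error in log10 units is an error in a *multiplicative factor*, so even a modest miss at 10^23 FLOP can translate into a large over- or under-estimate of the inferred efficiency gain.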
I think minor inaccuracies in the Gundlach 2025b accounting are quite likely and indeed mentioned that at one point in my OP.
Anson Ho’s blog post picked up on that too.