The chip-only trend is more like 1.35x/year from [my] understanding.
If I’m reading the relevant post correctly, it’s 1.35x FP32 FLOP/s per GPU per year (2x in 2.3 years), which is not price-performance[1]. The latter is estimated to be 1.4x FP32 FLOP/s per inflation-adjusted dollar (2x in 2.1 years).
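As a sanity check on the doubling-time conversions (my own arithmetic, not figures from the post), a growth rate of $r$ per year doubles in $\ln 2 / \ln r$ years:
$$\frac{\ln 2}{\ln 1.35} \approx 2.3 \text{ years}, \qquad \frac{\ln 2}{\ln 1.4} \approx 2.1 \text{ years}.$$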
Price-performance of 2.2x per year feels aggressive to me.
It’s 2.2x per 2 years, which is about 1.5x per year, though that’s still more than 1.4x per year. I’m guessing packaging is part of this, and also Nvidia is still charging a giant margin on the chips, so chip manufacturing cost is far from dominating the all-in datacenter cost. This might be enough to sustain 1.5x per year a bit beyond 2030 (the 1.5/1.4 discrepancy only compounds to 2x after about 10 years). But even if we do fall back to 1.4x per year, that only turns the 3.3x reduction in the speed of pretraining scaling into a 3.9x reduction, so the point stands.
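Spelling out the rate arithmetic above (my own working, not from the post):
$$\sqrt{2.2} \approx 1.48 \approx 1.5 \text{ per year}, \qquad \left(\frac{1.5}{1.4}\right)^{n} = 2 \;\Rightarrow\; n = \frac{\ln 2}{\ln(1.5/1.4)} \approx 10 \text{ years}.$$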
Incidentally, the word “GPU” has recently lost all meaning, since Nvidia has started referring variously to packages containing multiple compute dies (in Blackwell) or to individual compute dies (in Rubin) as GPUs. Packaging will be breaking trends for FLOP/s per package, but also for FLOP/s per compute die: for example, Rubin seems to derive a significant per-compute-die advantage from introducing separate, smaller I/O dies, so that the reticle-sized compute dies become more specialized and their performance, considered in isolation, might improve above trend.
Oh oops, I just misread you, didn’t realize you said 2.2x every 2 years, nvm.