I am quite uninformed, but when I read about compute multipliers, I took them to obviously include data-related improvements. To quip: FineWeb-Edu was algorithmically filtered; it obviously wasn’t manually curated. As evidence that it is not just my misunderstanding, I quote Dean W. Ball (my point is that it may well be my misunderstanding, but if so, the misunderstanding is common):
… Amodei describes this as a “compute multiplier”: … These gains come from all sorts of places: … improvements to training datasets that allow the model to learn more quickly …
Well, I don’t think either Epoch or Dario was talking about data improvements (Epoch because they used perplexity rather than benchmarks, and perplexity on a fixed corpus is only slightly helped by training data improvements; Dario based on the wording he used, see §2.2 excerpt).
If Epoch and Dario are making claims that are crazy (6-8 month halving time excluding the data category), and lots of people misunderstood those claims as asserting something directionally less crazy (6-8 month halving time including the data category) … umm, I guess that’s a good thing for public understanding of LLMs, and I should be happy about it?
But it still matters for other reasons. E.g. I think people cite and use the specific exponential halving times proposed by Dario and Epoch, in the context of forecasts and such (e.g. maybe AI-2027?). If the specific Dario & Epoch numbers are based on bad / confused / confusing methodology, then those specific numbers should not be used. We would need a different method—and no comment on whether that method would give a number that’s higher, or lower, or the same by coincidence.