Fwiw it’s unclear to me whether Epoch’s methodology would make this mistake.
After 2018 they have many data points at different times and compute scales. In principle, that should allow them to disentangle alg progress from the effect of compute. If models trained with the same compute but later in time are no better, they shouldn’t find any alg progress. (But I haven’t thought about this in a while, and the strong correlation between time and compute in their data makes their results super noisy.)
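To make the "in principle" point concrete, here is a toy sketch in Python. It assumes a simple linear model in which a score depends on log-compute plus a per-year term, and it uses made-up synthetic data. This is not Epoch's actual procedure or data, and every name in it (`fit_compute_and_time`, `make_models`, `alg_rate`) is hypothetical. It shows that a two-variable fit recovers a zero time coefficient when later models at the same compute are no better, and it reports a variance inflation factor as one rough measure of how much the time-compute correlation would amplify noise.

```python
# Toy illustration only: synthetic numbers and hypothetical names,
# not Epoch's data, model, or code.

def fit_compute_and_time(log_compute, year, score):
    """Least-squares fit of score ~ a + b_compute*log_compute + b_time*year."""
    n = len(score)
    mean_c = sum(log_compute) / n
    mean_t = sum(year) / n
    mean_s = sum(score) / n
    # Center everything so the intercept drops out of the normal equations.
    c = [x - mean_c for x in log_compute]
    t = [x - mean_t for x in year]
    s = [y - mean_s for y in score]
    s_cc = sum(x * x for x in c)
    s_tt = sum(x * x for x in t)
    s_ct = sum(x * y for x, y in zip(c, t))
    s_cs = sum(x * y for x, y in zip(c, s))
    s_ts = sum(x * y for x, y in zip(t, s))
    # det is zero when compute and time are perfectly collinear,
    # in which case the two effects cannot be separated at all.
    det = s_cc * s_tt - s_ct ** 2
    b_compute = (s_tt * s_cs - s_ct * s_ts) / det
    b_time = (s_cc * s_ts - s_ct * s_cs) / det
    # Variance inflation factor: how much the regressor correlation
    # inflates the variance of each slope estimate once noise is present.
    r_squared = s_ct ** 2 / (s_cc * s_tt)
    vif = 1.0 / (1.0 - r_squared)
    return b_compute, b_time, vif


def make_models(alg_rate):
    """Synthetic models: compute grows with time, with a few scales per year."""
    log_compute, year, score = [], [], []
    for t in range(6):              # six "years"
        for offset in (0, 1, 2):    # three compute scales per year
            lc = 20 + t + offset    # later years use more compute
            log_compute.append(lc)
            year.append(t)
            score.append(0.5 * lc + alg_rate * t)  # noise-free toy score
    return log_compute, year, score


# Case 1: later models at the same compute are no better (alg_rate = 0).
print(fit_compute_and_time(*make_models(0.0)))
# Case 2: a genuine per-year improvement at fixed compute.
print(fit_compute_and_time(*make_models(0.3)))
```

Because the toy data are noise-free, the fit separates the two effects exactly. With real, noisy data the variance inflation factor grows as time and compute become more correlated, which is the "super noisy" caveat.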
Fwiw it’s unclear to me whether Epoch’s methodology would make this mistake.
I don’t think I accused Epoch of making any “mistakes”. I think they came up with a procedure, described it, followed it, and found a result. I only suggested that the procedure has some funny properties and hence the result is easily misunderstood.
FWIW, Anson Ho (the first author of the Epoch paper in question) wrote a blog post, “The Least Understood Driver of AI”, shortly after I published this. He talks about Gundlach et al. 2025b. He has some nitpicks but mostly endorses the paper, and in particular he endorses its main point about how his own paper’s methodology interacts with “scale-dependent” algorithmic improvements.
If models trained with the same compute but later in time are no better, they shouldn’t find any alg progress.
Absolutely everyone including me agrees that (1) models trained with the same compute later in time are way better, and (2) this is partly because of better data, and (3) this is also partly because of (non-data) algorithmic progress. You only get into controversial territory when you start quantifying these.