I bet that we will not see a model released in the future that equals or surpasses the general performance of Chinchilla while reducing the compute (in training FLOPs) required for such performance by an equivalent of 3.5x per year.
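To make the threshold in this bet concrete, here is a minimal sketch of the arithmetic. It assumes Chinchilla's training compute can be approximated with the common 6·N·D rule (70B parameters, 1.4T tokens) and that "3.5x per year" compounds continuously from Chinchilla's release. The function name and the compounding convention are my assumptions for illustration, not terms of the bet.

```python
# Rough sketch: the largest training budget a future model could use and still
# beat the "3.5x per year" compute-efficiency trend relative to Chinchilla.

CHINCHILLA_PARAMS = 70e9      # parameters
CHINCHILLA_TOKENS = 1.4e12    # training tokens
# Assumption: the standard 6 * N * D approximation for training FLOPs.
CHINCHILLA_FLOPS = 6 * CHINCHILLA_PARAMS * CHINCHILLA_TOKENS


def max_training_flops(years_since_chinchilla: float,
                       baseline_flops: float = CHINCHILLA_FLOPS,
                       annual_gain: float = 3.5) -> float:
    """Compute budget implied by an `annual_gain`-per-year efficiency trend.

    A model released `years_since_chinchilla` years after Chinchilla matches
    the trend only if it reaches Chinchilla-level performance with at most
    this many training FLOPs.
    """
    if annual_gain <= 0:
        raise ValueError("annual_gain must be positive")
    return baseline_flops / (annual_gain ** years_since_chinchilla)


if __name__ == "__main__":
    for years in (0, 1, 2, 3, 4):
        print(f"{years} yr after Chinchilla: "
              f"{max_training_flops(years):.3e} FLOPs or fewer")
```

Under these assumptions the required budget shrinks by a factor of 3.5 for each year that passes, so the bar gets steep quickly.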
FWIW I think much of software progress comes from achieving better performance at a fixed or increased compute budget rather than making a fixed performance level more efficient, so I think this underestimates software progress.
The main justification given in the timeline supplement and main dropdown for treating compute efficiency as roughly equal to compute in its contribution to progress is the Epoch AI measurements, which are specifically about reaching fixed performance with lower compute. At the very least this concedes that the estimates are not based on trend-extrapolation and are conjecture.
I agree that it’s harder to quantify software improvements at the same or higher levels of compute in a way that can be easily compared against compute increases, but we can totally measure some part of it by looking at performance increasing given the same compute budget (it’s quite hard to measure the metric of “how much compute would it have taken 2015 algorithms/data to reach 2025 performance” though, for obvious reasons).
Something being harder to measure is not an excuse for ignoring it.
Something being unfalsifiable forward-looking and unmeasurable backwards-looking is a justification for not treating it with high credence, so I think this is also a core disagreement.
To be clear, I agree that there will be some slowdown due to complementarity of software and hardware, and ideally this would be measured in the model. One can think that there will be multiple effects in different directions. I think that at the levels of research speedup observed in the timelines supplement, the magnitude is likely to be low enough to not change the overall takeaways from the model, but maybe you disagree. I might get around to adding this in as it would be nice.
Here are two charts demonstrating that small changes in estimates of current R&D contribution and changes in R&D speedup change the model massively in the absence of a singularity. I know we’re just going to go straight back to “well the real model is the even-more-unfalsifiable benchmarks and gaps model,” but I think that is unreasonable.
EDIT: THESE FIGURES OVERESTIMATE THE IMPACT OF REDUCING CURRENT ALGORITHMIC PROGRESS. THE SECOND IS WRONG, AND THE REAL IMPACT IS MORE CONTAINED.
Figure 1: R&D is 50% of current progress, with and without speedups, exponential only
Figure 2: R&D is 33% of current progress, with and without speedups, exponential only
I do not understand how “I think this variable doesn’t matter (without checking)” is a good defense of questionably implemented variables that do overdetermine the model, but “this variable doesn’t matter to outcomes” is not a valid critique w.r.t. things like “what are current capabilities/time horizon”.
THIS SECOND ONE IS WRONG, MEDIAN HORIZON CHANGES BY CLOSER TO HALF A YEAR AT 33% (TO FEB 2029) THAN ALMOST 2 YEARS (TO APR 2031 AS INCORRECTLY SHOWN)
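As a rough illustration of why the share of progress attributed to R&D matters when speedups are applied, here is a toy calculation. It is not the timelines model and does not reproduce either figure. It assumes overall progress is a weighted sum of a hardware part that runs at baseline speed and an R&D part that is multiplied by a constant speedup; `years_to_target`, `rd_share`, and `rd_speedup` are hypothetical names.

```python
# Toy sensitivity check (assumption: progress rate is a simple weighted sum,
# with no complementarity between hardware and software and no
# superexponential term).

def years_to_target(baseline_years: float,
                    rd_share: float,
                    rd_speedup: float) -> float:
    """Years needed to cover `baseline_years` worth of baseline-rate progress.

    rd_share:   fraction of current progress attributed to R&D (0..1).
    rd_speedup: constant multiplier applied to the R&D part only.
    """
    if not 0.0 <= rd_share <= 1.0:
        raise ValueError("rd_share must be between 0 and 1")
    if rd_speedup <= 0:
        raise ValueError("rd_speedup must be positive")
    # The hardware part is unaffected; only the R&D part is accelerated.
    rate = (1.0 - rd_share) + rd_share * rd_speedup
    return baseline_years / rate


if __name__ == "__main__":
    for share in (0.5, 1 / 3):
        for speedup in (1.0, 2.0, 5.0):
            years = years_to_target(10.0, share, speedup)
            print(f"share={share:.2f} speedup={speedup:.0f}x "
                  f"-> {years:.2f} years")
```

In this toy setup, lowering the R&D share dampens the effect of any given speedup, which is the direction of the sensitivity discussed above; the magnitude in the real model depends on details this sketch leaves out.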
At the very least this concedes that the estimates are not based on trend-extrapolation and are conjecture.
Yes, as I told you verbally, I will edit the relevant expandable to make this more clear. I agree that the way it’s presented currently is poor.
Here are two charts demonstrating that small changes in estimates of current R&D contribution and changes in R&D speedup change the model massively in the absence of a singularity.
These are great, this parameter is indeed at least a bit more important than I expected. I will make this more clear in the writeup, and will think a little more about what the median value should be (it’s very relevant for looking into the original bet offer anyway :)).
but “this variable doesn’t matter to outcomes” is not a valid critique w.r.t. things like “what are current capabilities/time horizon”
Where did I say it isn’t a valid critique? I’ve said both over text and verbally that the behavior in cases where superexponentiality is true isn’t ideal (which makes a bigger difference in the time horizon extension model than benchmarks and gaps).
Perhaps you are saying I said it’s invalid because I also said that it can be compensated some by lowering the p_superexponential at lower time horizons? Saying this doesn’t imply that I think the critique is completely invalid, I still think there is a real issue there. We probably disagree about the magnitude, but again that doesn’t mean I think it’s invalid.