There are more speedups hidden across parameters, e.g. “Doubling time at RE-Bench saturation toward our time horizon milestone, on a hypothetical task suite like HCAST but starting with only RE-Bench’s task distribution,” which likewise simply drops the doubling time.
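To make concrete what I mean by “dropping the doubling time,” here is a minimal toy sketch, not the actual benchmarks and gaps code. The function name, the 6-month doubling time, the 0.85 shrink factor, and the 1h→2048h horizon range are all hypothetical placeholders I made up for illustration:

```python
# Minimal sketch (hypothetical numbers, not the model's actual parameters)
# of what "dropping the doubling time" does to a time-horizon extrapolation.

def months_to_horizon(start_h, target_h, doubling_months, shrink=1.0):
    """Calendar months for the time horizon to grow from start_h to
    target_h hours, if each successive doubling takes `shrink` times
    as long as the previous one (shrink=1.0 is a plain exponential trend)."""
    months, h, d = 0.0, start_h, doubling_months
    while h < target_h:
        months += d   # time spent on this doubling
        h *= 2
        d *= shrink   # "dropping the doubling time" each doubling
    return months

# Plain trend extrapolation: constant 6-month doubling time, 1h -> 2048h.
print(months_to_horizon(1, 2048, 6))               # 66.0 months
# Same trend, but each successive doubling is 15% faster than the last.
print(months_to_horizon(1, 2048, 6, shrink=0.85))  # ~33.3 months
```

Even a modest per-doubling shrink roughly halves the forecast here, which is why stacking several such speedups across parameters matters.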
Could you argue against dropping the expected doubling time on the object level, if you don’t find the reasons compelling? I acknowledge that the explanations may not be super clear; let me know if you have questions. I don’t think this would change the overall outputs much, though, since most of the time in the benchmarks and gaps model does not come from the time horizon extrapolation.
I can’t argue against a handful of different speedups, all on the object level, without reference to each other. The justifications generally rest on the same intuition: that AI R&D is strongly enhanced by AI in a virtuous cycle. The only mechanical cause claimed for the speedup is compute efficiency (i.e. less compute for the same performance), and it’s hard for me to imagine what other mechanical cause could be claimed that isn’t already contained in compute or compute efficiency.
Finally, if I understand the gaps model, it is not a trend extrapolation model at all! It is purely guesses about calendar time, put into a form in which they are hard to disentangle or validate.
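Here is a toy contrast, in my own framing rather than the authors’, of why the two kinds of forecast differ in how they can be validated. The gap names and all durations below are hypothetical:

```python
# Toy contrast (hypothetical names and numbers) between a trend-extrapolation
# forecast and a gaps-style forecast.

# Trend extrapolation: one fitted parameter checkable against new data.
fitted_doubling_months = 6.0  # would come from regressing on measured horizons
# Falsifiable: next quarter's measured time horizon either fits the fitted
# doubling time or it doesn't.

# Gaps-style forecast: a sum of guessed calendar durations, one per "gap".
gap_guesses_months = {
    "saturate RE-Bench": 12,                 # hypothetical guess
    "cross remaining horizon gap": 18,       # hypothetical guess
    "close engineering-complexity gap": 9,   # hypothetical guess
}
total_months = sum(gap_guesses_months.values())
print(total_months)  # 39
# Hard to validate: if the total turns out wrong, no individual guess is
# refuted, and the guesses can trade off against each other.
```

The first forecast exposes a parameter the world can contradict; the second bundles judgment calls so that no single observation pins down which one failed.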
To make effective bets we need a relatively high-probability, falsifiable, and quickly-resolving metric that is unlikely to be gamed. METR benchmarks (like every benchmark ever) can be gamed or reacted to, and gaming is essentially the argument made for most of that handful of distinct speedups. However, if the model relies on a core assumption that is falsifiable, we should focus on that metric. If compute efficiency gains are not core to the model, I am confused about how its claim that we will reach SC differs from a bare assertion that we reach SC soon, with no reference to anything falsifiable!