(Yes, sorry, edited my original comment to clarify.)
I don’t think the “AI assisted AI R&D” speedups along the way to superhuman coder make a huge difference to the bottom line?
I will leave it to the author to confirm or deny your second point.
@elifland Here is evidence for my assertion that a generally engaged reader does not come close to appreciating how dominant AI R&D speedups are in either model we talked about today.
For context, my preregistered guess would be that AI R&D speedups along the way to superhuman coder make it arrive around 1.5x faster, though anything between 1.25x and 2x is consistent with my best guess. (So, e.g., rather than the ~2029.0 median in Eli’s model, without intermediate AI R&D speedups we’d see around 2031.0 or so. I’d expect a bigger effect on the 10th percentile due to uncertainty.)
Just ran the code, and it looks like I’m spot on: the median goes to Mar 2031.
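For readers who don’t want to dig into the full codebase, here is a toy sketch of that check. It is not the actual timelines code; every parameter is hypothetical, tuned only so the medians land near the ~2029 vs. ~2031 figures above:

```python
import numpy as np

# Toy Monte Carlo, NOT the actual timelines model: extrapolate the
# time-horizon doubling trend out to a superhuman-coder (SC) threshold,
# with and without a constant intermediate AI R&D speedup.
rng = np.random.default_rng(0)
n_samples = 100_000

current_horizon = 30.0     # hypothetical current time horizon, in minutes
sc_horizon = 160 * 60.0    # hypothetical SC threshold: 160 work-hours, in minutes
doublings = np.log2(sc_horizon / current_horizon)   # ~8.3 doublings

# Hypothetical uncertainty over the horizon doubling time, in months.
doubling_time = rng.lognormal(mean=np.log(8.3), sigma=0.4, size=n_samples)

years_without = doublings * doubling_time / 12.0    # no intermediate speedup
years_with = years_without / 1.5                    # best-guess 1.5x speedup

start = 2025.25  # assumed "now"
print(f"median without intermediate speedups: {start + np.median(years_without):.1f}")
print(f"median with a 1.5x speedup:           {start + np.median(years_with):.1f}")
```

Under these made-up numbers the two medians come out roughly two years apart, matching the shape of the claim, though the real model’s distributions differ.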
There are more speedups hidden across parameters, e.g. “Doubling time at RE-Bench saturation toward our time horizon milestone, on a hypothetical task suite like HCAST but starting with only RE-Bench’s task distribution,” which also just lowers the doubling time.
Unless you are in the simpler model, in which case the singularity is hiding the importance of the R&D speedup.
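To spell out the “hiding” (my gloss, with made-up symbols, not the model’s own notation): in the super-exponential variant, each successive doubling is assumed to take a fraction $r < 1$ of the time of the previous one, so the total time through $n$ doublings is a geometric sum that stays bounded no matter how many doublings remain:

$$T_{\text{total}} = \sum_{i=1}^{n} T_1 \, r^{\,i-1} = T_1 \, \frac{1 - r^{n}}{1 - r} < \frac{T_1}{1 - r}.$$

With, say, $T_1 = 8$ months and $r = 0.8$, everything after the first doubling adds at most 32 more months. The virtuous-cycle acceleration is already baked into $r$, so toggling the explicit R&D-speedup parameter barely moves the output date, even though both rest on the same intuition.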
Could you argue against dropping the expected doubling time on the object level, if you don’t find the reasons compelling? I acknowledge that the explanations may not be super clear; let me know if you have questions. I don’t think this would change the overall outputs that much, though, since most of the time in the benchmarks-and-gaps model does not come from the time-horizon extrapolation.
I can’t argue against a handful of different speedups, each on the object level, without reference to each other. The justifications generally rest on basically the same intuition: that AI R&D is strongly enhanced by AI in a virtuous cycle. The only mechanical cause claimed for the speedup is compute efficiency (i.e., less compute for the same performance), and it’s hard for me to imagine what other mechanical cause could be claimed that isn’t contained in compute or compute efficiency.
Finally, if I understand the gaps model correctly, it is not a trend-extrapolation model at all! It is purely guesses about calendar time, put into a form in which they are hard to disentangle or validate.
To make effective bets we need a relatively high-probability, falsifiable, quickly-resolving metric that is unlikely to be gamed. METR benchmarks (like every benchmark ever) can be gamed or reacted to, and that gaming is precisely the argument made for most of that handful of distinct speedups. However, if the model relies on a core assumption that is falsifiable, we should focus on that metric. If compute-efficiency gains are not core to the model, I am confused about how its claim that we will reach SC differs from a bare assertion that we will reach SC soon, with no reference to anything falsifiable!
For the record, here is the simple model without a super-exponential singularity, both with and without the R&D speedups:
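A toy stand-in for that comparison (hypothetical numbers chosen only to illustrate the size of the gap, not the model’s actual parameters):

```python
# Plain-exponential ("simple") model, with and without a constant
# intermediate R&D speedup. All numbers are hypothetical.
doublings_to_sc = 8.3     # assumed doublings from today's horizon to SC
doubling_time = 8.3       # assumed months per doubling, held constant
start = 2025.25           # assumed "now"

for speedup in (1.0, 1.5):  # without vs. with intermediate R&D speedups
    arrival = start + doublings_to_sc * doubling_time / (12 * speedup)
    print(f"R&D speedup {speedup:.1f}x -> SC around {arrival:.1f}")
```

One parameter toggled, roughly a two-year swing in the arrival date.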
I can hardly call this anything but extremely determinative of the results.
Is your issue that it shouldn’t be this determinative, or that it should be more clearly explained? I’m guessing both? As I’ve said, I’m happy to make the importance of the various parameters more salient in non-high-effort ways.