Here are the median estimates of the “human-only, software-only” time needed to reach each milestone:
Saturating RE-Bench → Superhuman coder: three sets of estimates are presented, with medians summing to between 30 and 75 months[6]. The reasoning is presented here.
I think you’re looking at the calendar time between now and superhuman coder, rather than the human-only, software-only time between RE-Bench and superhuman coder? At least your numbers are quite similar to our overall bottom line, which is the former.
I added up the median “Predictions for gap size” in the “How fast can the task difficulty gaps be crossed?” table, summing each set of predictions separately (“Eli”, “Nikola”, “FutureSearch”) to get three totals ranging from 30 to 75 months.
Does this table cover the time between now and superhuman coder? I thought it started at RE-Bench, because:
I took all of this to be in the context of the phrase, about one page back, “For each gap after RE-Bench saturation”.
The earlier explanation that Method 2 is “a more complex model starting from a forecast saturation of an AI R&D benchmark (RE-Bench), and then how long it will take to go from that system to one that can handle real-world tasks at the best AGI company” [emphasis added]
The first entry in the table (“Time horizon: Achieving tasks that take humans lots of time”) sounds more difficult than saturating RE-Bench.
Earlier, there’s a separate discussion forecasting time to RE-Bench saturation.
But it sounds like I was misinterpreting?
Those estimates do start at RE-Bench, but they are all estimates of how long things would take at the “default” pace of progress, rather than the actual calendar time required. Adding them together therefore yields a result that doesn’t account for the speedup from AI R&D automation or the slowdown in compute and algorithmic labor growth after 2028.
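To make this distinction concrete, here is a minimal sketch of the arithmetic (all numbers and schedules are hypothetical, not the report’s actual parameters): each calendar month completes multiplier(t) months of default-pace work, so AI R&D automation compresses a fixed stock of gap-months into fewer calendar months, while a slowdown stretches it over more.

```python
# Toy model (all numbers hypothetical): convert a stock of "default-pace"
# months of work into calendar months under a time-varying progress multiplier.

def calendar_months(default_pace_months: float, multiplier) -> int:
    """Step forward one calendar month at a time; in calendar month t the
    field completes multiplier(t) months' worth of default-pace progress."""
    done, t = 0.0, 0
    while done < default_pace_months:
        done += multiplier(t)
        t += 1
    return t

def speedup(t):   # hypothetical AI R&D automation: 1.0x now, +0.05x per month
    return 1.0 + 0.05 * t

def slowdown(t):  # hypothetical tail-off in compute and algorithmic labor growth
    return max(0.5, 1.0 - 0.02 * t)

total_gap = 50.0  # hypothetical sum of median gap sizes, in default-pace months
print(calendar_months(total_gap, speedup))   # 30 calendar months (< 50)
print(calendar_months(total_gap, slowdown))  # 87 calendar months (> 50)
```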
Sure – I was presenting these as “human-only, software-only” estimates, exactly as in the excerpt quoted above. So it doesn’t seem like there’s a problem here?
Ah right, my bad, I was confused. This is right, except that these estimates aren’t software-only: they include recent levels of compute scaling.
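The caveat in this last reply can be made concrete with the same toy numbers (the software/compute split below is entirely hypothetical, not from the report): if the “default” pace bundles software progress with recent levels of compute scaling, a truly software-only pace is slower, and the same gap takes longer to cross.

```python
# Hypothetical illustration of the correction: the "default pace" (1.0)
# bundles software progress with recent compute scaling, so the gap estimates
# are not purely software-only. Counting only the software share slows the
# pace and stretches the same gap over more calendar time.

software_share = 0.6  # hypothetical fraction of default progress from software
total_gap = 50.0      # hypothetical sum of median gap sizes, default-pace months

print(total_gap / 1.0)             # 50 months at the full default pace
print(total_gap / software_share)  # ~83 months if only software progress counted
```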
Thanks, I’ve edited the post to note this.