Thanks for the reply!
I’m not sure I exactly understand your point.
It is a very confusing point and I didn’t explain it well, sorry. I also might just be fundamentally confused and wrong. Hopefully this comment can explain it well enough so you can either shoot it down as incorrect or accept it.
First of all, it might be easier to understand if we replace “effective compute” with “general capabilities” in my original comment. Effective compute causally affects capabilities which causally affects both the METR time horizon measurement and the AI R&D speedup variable. So we can screen off the effective compute node and replace it with a general capabilities node.
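To make the screening-off claim concrete, here is a toy structural model (the functional forms and noise levels are entirely my own assumptions, purely for illustration): both the METR time-horizon measurement and AI R&D speedup depend on compute only through capabilities, so once you condition on capabilities, the two become (nearly) independent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy causal chain: compute -> capabilities -> {time horizon, R&D speedup}.
compute = rng.normal(size=n)
capabilities = compute + 0.5 * rng.normal(size=n)
horizon = 2.0 * capabilities + rng.normal(size=n)   # METR-style measurement
speedup = 1.0 * capabilities + rng.normal(size=n)   # AI R&D speedup

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def residualize(y, x):
    # Remove the linear component of x from y (least-squares residual).
    beta = np.cov(y, x)[0, 1] / np.var(x)
    return y - beta * x

# Unconditionally, horizon and speedup are strongly correlated...
print(corr(horizon, speedup))
# ...but conditioning on capabilities screens them off from each other:
print(corr(residualize(horizon, capabilities),
           residualize(speedup, capabilities)))
```

The first correlation comes out large and the second is approximately zero, which is all "screening off the effective compute node" is meant to convey.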
If a model has achieved very high time horizons, then we think that this is direct evidence for them being an AC.
I mostly agree with this. However, I think a model having very high time horizons is direct evidence for capabilities being high, which is then direct evidence for AI R&D speedup being high (and thus more likely to be past the AC (automated coder) threshold).
This leads to an important distinction. If you want to argue that AI R&D speedup will be higher (or certain thresholds of AI R&D speedup like AC being reached faster) using the METR graph, your argument fundamentally has to be an argument about capabilities being higher (arriving sooner) or an argument about the relationship between capabilities and AI R&D speedup.
I don’t think that your first argument about the superexponential nature of the METR graph (that infinite time horizons in finite time implies it must be superexponential) is either of these. It seems to be an argument purely about the relationship between capabilities and the METR graph.
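For concreteness on the "infinite time horizons in finite time" point (the curves below are my own toy examples, not anything from the model): a trajectory that blows up at a finite date must have shrinking doubling times, whereas an exponential's doubling time is constant, which is why finite-time divergence forces superexponentiality.

```python
T = 10.0   # hypothetical date at which the toy curve blows up
h0 = 1.0

def hyperbolic(t):
    # h(t) = h0 / (1 - t/T): superexponential, diverges as t -> T.
    return h0 / (1.0 - t / T)

def doubling_time(h, t, dt=1e-4):
    # Numerically find how long until h doubles, starting from time t.
    target, s = 2.0 * h(t), t
    while h(s) < target:
        s += dt
    return s - t

# Each successive doubling of the hyperbolic curve takes half as long:
t = 0.0
times = []
for _ in range(4):
    d = doubling_time(hyperbolic, t)
    times.append(d)
    t += d
print(times)  # roughly [5.0, 2.5, 1.25, 0.625]
```

Note this is purely a statement about the shape of the measured curve — it says nothing by itself about when any given capability level arrives.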
I also think your second argument for superexponential growth (that subsequent doublings should get easier because they require fewer new capabilities than earlier doublings) is mostly an argument about the relationship between capabilities and the METR graph, although I could maybe see it being an argument about the relationship between capabilities and AI R&D speedup (the last few capabilities to be unlocked provide huge boosts to AI R&D speedup?).
Basically, my core concern is this: if neither of these arguments is truly an argument about capabilities arriving faster, or about the relationship between capabilities and AI R&D speedup, then why are they being used to update the estimate of time to ACs?
In contrast, arguments about AI R&D feeding back into itself, or about compute investment slowing down, are arguments directly about how fast capabilities will advance. So these validly affect your estimate of time to ACs.
If you disagree with me and think that the arguments for superexponential growth are not just arguments about the relationship between capabilities and the METR graph, then we can focus our discussion there. To the extent that these are also arguments about the things we truly care about, you should adjust time to ACs based on them; this is what I was trying to capture with the correlation stuff in my first comment.
If you do end up agreeing, I also don’t really know how to incorporate this into the model. It seems hard and messy. Using the METR graph probably is the best thing we can do right now; I just don’t think these arguments about its superexponential nature should affect the estimate of time to ACs. But I do think the growth is superexponential...and I do think it is the best way to estimate time to ACs...so again, messy :(. Hope this makes more sense!
When you improve the capabilities of an LLM (or any AI agent with enough generality, really), you also probably improve its ability to cooperate with copies of itself, where cooperation ability is some loose collection of abilities such as checking other copies’ work, planning work for other copies to carry out, course-correcting other copies when they get sidetracked, integrating a copy’s work into the main project, etc.
Now that LLMs are good enough to create useful multi-agent systems such as Claude Code, Cursor, Gas Town, etc., I expect these systems to get better at a faster rate than the underlying models (and not just because people are putting work into improving the structure of these systems, though I suspect that will also help).
I’m not sure if these systems are currently more or less than the sum of their parts (if Claude is able to do a task with X difficulty, are 10 Claude agents able to do a task of >10X difficulty, 10X difficulty, or <10X difficulty?). But I expect that as cooperation gains accumulate, they will become less and less like “less than the sum of their parts” (as they stop bungling communication with each other) and more and more like “greater than the sum of their parts” (as they learn how to get gains from specialization or something?).
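One crude way to phrase the "sum of the parts" question (the exponent here is entirely made up, just to name the quantity being debated): model a team of n agents as handling tasks of difficulty n^alpha times what one agent can handle, where alpha < 1 means coordination overhead dominates, alpha = 1 means exactly additive, and alpha > 1 means specialization gains dominate. Accumulating cooperation gains then corresponds to alpha drifting upward over time.

```python
def team_difficulty(n_agents: int, solo_difficulty: float, alpha: float) -> float:
    """Toy model: max task difficulty a team of agents can handle.

    alpha < 1: less than the sum of its parts (coordination overhead).
    alpha = 1: exactly the sum of its parts.
    alpha > 1: greater than the sum of its parts (specialization gains).
    """
    return solo_difficulty * n_agents ** alpha

# 10 agents, each able to do difficulty-X tasks alone:
X = 1.0
print(team_difficulty(10, X, 0.7))   # ~5.0: "bungled communication" regime
print(team_difficulty(10, X, 1.0))   # 10.0: exactly additive
print(team_difficulty(10, X, 1.2))   # ~15.8: specialization regime
```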
This has updated me significantly towards shorter timelines, especially for time to automated coders. I really have no idea what happens after that though.
Prediction: 80% chance of fully automated coding “teams” by the end of 2026.