Thanks for the thoughts! I’m not sure I exactly understand your point. I do think that we should think about the relationship between the time horizon and the AC effective compute requirement directly, which is why we chose to use the time horizon to set the effective compute requirement. If a model has achieved very high time horizons, then we think that this is direct evidence for them being an AC. Note that we also optionally add an effective compute gap on top, after the time horizon requirement is reached.
I’m also not sure what you mean about the relationship being 1-1, or why we should increase rather than decrease the effective compute requirement if we had instead decided to use AI R&D speedup to anchor it. Why would we think that setting the effective compute requirement via the AI R&D speedup would predictably give a higher effective compute requirement? I don’t think the METR trend being superexponential implies anything one way or the other. They are just different metrics, and thus we would use a totally different method if we had instead tried to set the requirement using AI R&D speedup. I’m not immediately sure what method would be best, though, given that we don’t have a trend for it. If we had more high-quality data on coding uplift over time, then that could help us out, and I think that would be a reasonable alternative thing to extrapolate (Daniel discusses this a bit in the post), but I don’t have a prior on whether it would lead to a lower or higher requirement than extrapolating time horizons (in fact a quick guess would be that it would lead to a lower requirement, given Opus 4.5’s reported much higher uplift than Sonnet 4.5).
Thanks for the reply! It is a very confusing point and I didn’t explain it well, sorry. I also might just be fundamentally confused and wrong. Hopefully this comment can explain it well enough so you can either shoot it down as incorrect or accept it.
First of all, it might be easier to understand if we replace “effective compute” with “general capabilities” in my original comment. Effective compute causally affects capabilities, which in turn causally affect both the METR time horizon measurement and the AI R&D speedup variable. So capabilities screen off effective compute, and we can replace the effective compute node with a general capabilities node.
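To make the screening-off claim concrete, here is a toy simulation. Everything in it is an assumption for illustration: the linear-Gaussian structure, the unit noise variances, and the names `simulate`, `pearson`, and `partial_corr` are all hypothetical and not taken from the post's model. It only encodes the causal chain described above: effective compute feeds capabilities, and capabilities feed both the time horizon and the AI R&D speedup.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after linearly controlling for zs."""
    rxy = pearson(xs, ys)
    rxz = pearson(xs, zs)
    ryz = pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=20000, seed=0):
    """Toy chain (assumed): compute -> capabilities -> {horizon, speedup}."""
    rng = random.Random(seed)
    compute, caps, horizon, speedup = [], [], [], []
    for _ in range(n):
        c = rng.gauss(0, 1)        # effective compute (arbitrary units)
        k = c + rng.gauss(0, 1)    # general capabilities, caused by compute
        h = k + rng.gauss(0, 1)    # time horizon, caused only by capabilities
        s = k + rng.gauss(0, 1)    # AI R&D speedup, caused only by capabilities
        compute.append(c)
        caps.append(k)
        horizon.append(h)
        speedup.append(s)
    return compute, caps, horizon, speedup


if __name__ == "__main__":
    compute, caps, horizon, speedup = simulate()
    print("corr(compute, horizon):       ", round(pearson(compute, horizon), 3))
    print("corr(compute, horizon | caps):", round(partial_corr(compute, horizon, caps), 3))
    print("corr(horizon, speedup):       ", round(pearson(horizon, speedup), 3))
    print("corr(horizon, speedup | caps):", round(partial_corr(horizon, speedup, caps), 3))
```

In this toy setup the marginal correlations are clearly positive, while the correlations conditional on capabilities are close to zero. That is the sense in which the capabilities node carries all the information the time horizon shares with compute and with AI R&D speedup, under the assumed structure.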
“If a model has achieved very high time horizons, then we think that this is direct evidence for them being an AC.”
I mostly agree with this. However, I think a model having very high time horizons is direct evidence for capabilities being high, which is then direct evidence for AI R&D speedup being high (and thus more likely to be past the AC threshold).
This leads to an important distinction. If you want to use the METR graph to argue that AI R&D speedup will be higher (or that certain thresholds of AI R&D speedup, like AC, will be reached sooner), your argument fundamentally has to be either an argument about capabilities being higher (arriving sooner) or an argument about the relationship between capabilities and AI R&D speedup.
I don’t think that your first argument about the superexponential nature of the METR graph (that infinite time horizons in finite time implies it must be superexponential) is either of these. It seems to be an argument purely about the relationship between capabilities and the METR graph.
I also think your second argument for superexponential growth (that subsequent doublings should get easier because they require fewer new capabilities than earlier doublings) is mostly an argument about the relationship between capabilities and the METR graph. Although I could maybe see it being an argument about the relationship between capabilities and AI R&D speedup (the last few capabilities to be unlocked provide huge boosts to AI R&D speedup?).
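For reference, here is a small sketch of what the two superexponential arguments amount to numerically. The parameterization is my own assumption for illustration, not the model from the post: each doubling of the time horizon takes `ratio` times as long as the previous one, so `ratio=1.0` is a pure exponential trend and `ratio<1.0` means doublings get easier. The function names and units are hypothetical.

```python
import math


def time_to_reach(target_horizon, start_horizon=1.0, first_doubling=1.0, ratio=1.0):
    """Time until the horizon reaches target_horizon.

    Assumes (for illustration only) that the k-th doubling takes
    first_doubling * ratio**k units of time.
    """
    doublings = max(0, math.ceil(math.log2(target_horizon / start_horizon)))
    return sum(first_doubling * ratio ** k for k in range(doublings))


def limit_time(first_doubling=1.0, ratio=0.9):
    """Upper bound on total time when doubling times shrink geometrically.

    This is the geometric series limit first_doubling / (1 - ratio),
    which only exists for 0 < ratio < 1.
    """
    if not 0 < ratio < 1:
        raise ValueError("a finite limit requires 0 < ratio < 1")
    return first_doubling / (1 - ratio)


if __name__ == "__main__":
    for target in (2 ** 10, 2 ** 50, 2 ** 200):
        exp_time = time_to_reach(target, ratio=1.0)
        superexp_time = time_to_reach(target, ratio=0.9)
        print(f"2^{int(math.log2(target))}: exponential={exp_time:.1f}, "
              f"shrinking doublings={superexp_time:.3f}")
    print("bound for ratio=0.9:", limit_time(ratio=0.9))
```

With constant doubling times the time needed grows without bound as the target horizon grows, so arbitrarily long horizons are never all reached by any fixed date. With geometrically shrinking doubling times the total stays below `first_doubling / (1 - ratio)`. Note that this only describes the shape of the time horizon curve; it says nothing by itself about how fast the underlying capabilities arrive.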
Basically, my core concern is this: if neither of these arguments is truly an argument about capabilities arriving faster, or about the relationship between capabilities and AI R&D speedup, then why are they being used to update the estimate of time to ACs?
In contrast, arguments about AI R&D feeding back into itself, or about compute investment slowing down, are arguments directly about how fast capabilities will advance. So these validly affect your estimation of time to ACs.
If you disagree with me and think that the arguments for superexponential growth are not just arguments about the relationship between capabilities and the METR graph, then we can focus our discussion there. To the extent that these are also arguments about the things we truly care about, you should adjust time to ACs based on them; this is what I was trying to capture with the correlation stuff in my first comment.
If you do end up agreeing, I also don’t really know how to incorporate this into the model. It seems hard and messy. I think that it is true that using the METR graph is the best thing we can do right now. I just don’t think that these arguments about the superexponential nature of the METR graph should affect the estimation of time to ACs. But I do think that it is superexponential...and I think that it is the best way to estimate time to ACs...so again, messy :(. Hope this makes more sense!