Shouldn’t we start to see the METR trend bending upwards if this were the case? Let $T$ be time horizon, $A$ algorithmic efficiency, $C$ training compute, $E$ experimental compute, $L$ labor, and $S$ speedup.
Suppose,
$$T = (AC)^{\delta}$$
$$\dot{A} = A^{1-\beta}\left[(SL)^{\alpha} E^{1-\alpha}\right]^{\lambda}$$
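Then, deriving the balanced growth path (writing $g_X$ for the growth rate of $X$, and with $r = \alpha\lambda/\beta$, which is what the derivation implies),
$$g_T = \delta r g_S + \delta\left[r g_L + \frac{\lambda(1-\alpha)}{\beta} g_E + g_C\right]$$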
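For reference, a sketch of how that expression follows from the two equations above. It assumes $r$ stands for $\alpha\lambda/\beta$ and that the $g_E$ coefficient is $\lambda(1-\alpha)/\beta$; neither is spelled out in the comment, but both fall out of requiring $g_A$ to be constant:

$$
\begin{aligned}
g_A \equiv \frac{\dot{A}}{A} &= A^{-\beta}\left[(SL)^{\alpha} E^{1-\alpha}\right]^{\lambda} \\
0 &= -\beta g_A + \lambda\left[\alpha (g_S + g_L) + (1-\alpha) g_E\right] \quad \text{(constant } g_A\text{)} \\
g_A &= \frac{\alpha\lambda}{\beta}(g_S + g_L) + \frac{\lambda(1-\alpha)}{\beta} g_E \\
g_T &= \delta (g_A + g_C) = \delta r g_S + \delta\left[r g_L + \frac{\lambda(1-\alpha)}{\beta} g_E + g_C\right], \qquad r = \frac{\alpha\lambda}{\beta}
\end{aligned}
$$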
So if $g_S$ started growing rapidly, we should see the time horizon trend steepen, potentially by a lot. Now maybe Opus 4.5 is evidence of this, but I’m skeptical so far. A better argument is that there is some delay between better research and better models because of training time, so we haven’t seen it yet, though this rules out a substantial speedup before maybe 4 months ago.
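To make the size of the effect concrete, here is a minimal Python sketch of the balanced-growth formula. The function name and every parameter value are hypothetical placeholders chosen only for illustration, not estimates, and the code assumes $r = \alpha\lambda/\beta$ as implied by the derivation.

```python
def time_horizon_growth(delta, alpha, beta, lam, g_S, g_L, g_E, g_C):
    """Balanced-growth-path growth rate of the time horizon T.

    Assumes T = (A*C)**delta and
    dA/dt = A**(1 - beta) * ((S*L)**alpha * E**(1 - alpha))**lam.
    """
    r = alpha * lam / beta  # assumed meaning of r in the comment
    # Growth rate of algorithmic efficiency on the balanced growth path.
    g_A = r * (g_S + g_L) + lam * (1 - alpha) / beta * g_E
    return delta * (g_A + g_C)


# Hypothetical parameter values (illustrative only, not estimates).
params = dict(delta=1.0, alpha=0.5, beta=1.0, lam=1.0,
              g_L=0.1, g_E=1.0, g_C=1.0)

baseline = time_horizon_growth(g_S=0.0, **params)
with_speedup = time_horizon_growth(g_S=1.0, **params)

# The gap between the two is delta * r * g_S, the first term of the formula.
print(f"g_T without speedup growth: {baseline:.3f}")
print(f"g_T with speedup growth:    {with_speedup:.3f}")
```

The point of the sketch is only that any sustained growth in $S$ adds $\delta r g_S$ to the trend; how large that is depends entirely on parameters the comment does not pin down.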
I think there are a variety of explanations consistent with there being 2x uplift already:
The METR benchmark just isn’t precise enough to detect it
One-time gains from RLVR that caused a steeper slope in 2024-2025 have petered out, but they’ve been replaced by uplift
Models have reached some time horizon threshold where they’re increasingly useful
In the past, problems like reward hacking or poor generalization limited real-world uplift, but these are now solved well enough to allow 2x uplift
My median guess would be something lower than 2x, but we just don’t have enough data.