Against superexponential fits to current time horizon measurements
I think it is unreasonable to put non-trivial weight (e.g. > 5%) on a superexponential fit to METR’s 50% time horizon measurements, or similar recently-collected measurements.
To be precise about what I am claiming and what I am not claiming:
I am not claiming that these measurements will never exhibit a superexponential trend. In fact, I think a superexponential trend is fairly likely eventually, due to feedback loops from AI speeding up AI R&D. I am claiming that current measurements provide almost no information about such an eventuality, and naively applying a superexponential fit gives a poor forecast.
I am not claiming that it is very unlikely for the trend to be faster in the near future than in the near past. I think a good forecast would use an exponential fit, but with wide error bars on the slope of the fit. After all, there are very few datapoints, they are not independent of each other, and there is measurement noise. I am claiming that extrapolating the rate at which the trend is getting faster is unreasonable.
My understanding is that AI 2027’s forecast is heavily driven by putting substantial weight on such a superexponential fit, in which case my claim may call into question the reliability of this forecast. However, I have not dug into AI 2027’s forecast, and am happy to be corrected on this point. My primary concern is with the specific claim I am making rather than how it relates to any particular aggregated forecast.
Note that my argument has significant overlap with this critique of AI 2027, but is focused on what I think is a key crux rather than being a general critique. There has also been some more recent discussion of superexponential fits since the GPT-5 release here, although my points are based on METR’s original data. I make no claims of originality and apologize if I missed similar points being made elsewhere.
The argument
METR’s data (see Figure 1) exhibits a steeper exponential trend over the last year or so (which I’ll call the “1-year trend”) than over the last 5 years or so (which I’ll call the “5-year trend”). A superexponential fit would extrapolate this to an increasingly steep trend over time. Here is why I think such an extrapolation is unwarranted:
There is a straightforward explanation for the 1-year trend that we should expect to be temporary. The most recent datapoints are all reasoning models trained with RL. This is a new technique that scales with compute, and so we should expect there to be rapid initial improvements as compute is scaled from a low starting point. But this compute growth must eventually slow down to the rate at which older methods are growing in compute, once the total cost becomes comparable. This should lead to a leveling off of the 1-year trend to something closer to the 5-year trend, all else being equal.
Of course, there could be another new technique that scales with compute, leading to another (potentially overlapping) “bump”. But the shape of the current “bump” tells us nothing about the frequency of such advances, so it is an inappropriate basis for such an extrapolation. A better basis for such an extrapolation would be the 5-year trend, which may include past “bumps”.
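To make the “bump” intuition concrete, here is a toy simulation (all numbers are made up for illustration and are not calibrated to real compute data): a baseline compute stream grows at a steady exponential rate, while an RL stream starts from a much smaller base, grows much faster, and is capped once it reaches the size of the baseline budget. The growth rate of total compute spikes and then reverts to the baseline rate.

```python
import numpy as np

# Toy model with made-up numbers (not calibrated to real compute data):
# a baseline compute stream grows 4x/year, while an RL compute stream
# starts 10,000x smaller, grows 100x/year, and is capped once it reaches
# the size of the baseline budget.
t = np.arange(0.0, 5.01, 0.25)
baseline = 4.0 ** t
rl = np.minimum(1e-4 * 100.0 ** t, baseline)
total = baseline + rl

# Instantaneous growth rate of total compute, in doublings per year:
growth = np.diff(np.log2(total)) / np.diff(t)
print(growth.round(2))  # starts near 2, spikes to ~4, then settles back to 2
```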
Superexponential explanations for the 1-year trend are uncompelling. I have seen two arguments for why we might expect the 1-year trend to be the start of a superexponential trend, and they are both uncompelling to me.
Feedback from AI speeding up AI R&D. I don’t think this effect is nearly big enough to have a substantial effect on this graph yet. The trend is most likely being driven by infrastructure scaling and new AI research ideas, neither of which AI seems to be substantially contributing to. Even in areas where AI is contributing more, such as software engineering, METR’s uplift study suggests the gains are currently minimal at best.
AI developing meta-skills. From this post:
“If we take this seriously, we might expect progress in horizon length to be superexponential, as AIs start to figure out the meta-skills that let humans do projects of arbitrary length. That is, we would expect that it requires more new skills to go from a horizon of one second to one day, than it does to go from one year to one hundred thousand years; even though these are similar order-of-magnitude increases, we expect it to be easier to cross the latter gap.”
It is a little hard to argue against this, since it is somewhat vague. But I am unconvinced there is such a thing as a “meta-skill that lets humans do projects of arbitrary length”. It seems plausible to me that a project that takes ten million human-years is meaningfully harder than 10 projects that each take a million human-years, due to the need to synthesize the 10 highly intricate million-year sub-projects. To me the argument seems very similar to the following, which is not borne out:
“We might expect progress in chess ability to be superexponential, as AIs start to figure out the meta-skills (such as tactical ability) required to fully understand how chess pieces can interact. That is, we would expect it to require more new skills to go from an ELO of 2400 to 2500, than it does to go from an ELO of 3400 to 3500.”
At the very least, this argument deserves to be spelled out more carefully if it is to be given much weight.
Theoretical considerations favor an exponential fit (added in edit). Theoretically, it should take around twice as much compute to train an AI system with twice the horizon length, since feedback is twice as sparse. (This point was made in the Biological anchors report and is spelled out in more depth in this paper.) Hence exponential compute scaling would imply an exponential fit. Algorithmic progress matters too, but that has historically followed an exponential trend of improved compute efficiency. Of course, algorithmic progress can be lumpy, so we shouldn’t expect an exponential fit to be perfect.
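To spell out the arithmetic (the growth number below is an illustrative assumption, not a measured value): if effective training compute grows by a fixed factor each year and horizon length is roughly proportional to effective compute, the horizon doubles at a fixed interval, which is exactly an exponential fit.

```python
import math

# Illustrative assumption (not a measured value): effective training
# compute grows ~10x per year, and horizon length is roughly proportional
# to effective compute (doubling the horizon needs double the compute).
compute_growth_per_year = 10.0
doublings_per_year = math.log2(compute_growth_per_year)      # ~3.3
horizon_doubling_time_months = 12 / doublings_per_year
print(round(horizon_doubling_time_months, 1))  # ~3.6 months per doubling,
# i.e. a straight line on a log plot: an exponential fit.
```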
Temporary explanations for the 1-year trend are more likely on priors. The time horizon metric has a huge variety of contributing factors, from the inputs to AI development to the details of the task distribution. For any such complex metric, the trend is likely to bounce around based on idiosyncratic factors, which can easily be disrupted and are unlikely to have a directional bias. (To get a quick sense of this, you could browse through some of the graphs in AI Impacts’ Discontinuous growth investigation, or even METR’s measurements in other domains for something more directly relevant.) So even if I wasn’t able to identify the specific idiosyncratic factor that I think is responsible for the 1-year trend, I would expect there to be one.
The measurements look more consistent with an exponential fit. I am only eyeballing this, but a straight line fit is reasonably good, and a superexponential fit doesn’t jump out as a privileged alternative. Given the complexity penalty of the additional parameters, a superexponential fit seems unjustified based on the data alone. This is not surprising given the small number of datapoints, many of which are based on similar models and are therefore dependent. (Edit: looks like METR’s analysis (Appendix D.1) supports this conclusion, but I’m happy to be corrected here if there is a more careful analysis.)
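One way to make this less of an eyeball judgment is a standard model-comparison check (a sketch on synthetic data, since I am not reproducing METR’s measurements here; the quadratic-in-log-horizon form is just one simple way to let the slope increase over time, not METR’s parameterization): fit both forms in log space and compare an information criterion that penalizes the extra parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT METR's measurements): a noisy exponential
# trend in time horizon, sampled at a handful of model release dates.
t = np.linspace(2019.0, 2025.0, 13)
log_h = -2.0 + 1.0 * (t - 2019.0) + rng.normal(0.0, 0.4, t.size)

def aic(y, y_hat, k):
    """AIC for a least-squares fit with k parameters (Gaussian errors)."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * k

# Exponential fit: log-horizon linear in time (2 parameters).
lin = np.polyfit(t, log_h, 1)
# A simple superexponential form: log-horizon quadratic in time
# (3 parameters), i.e. a slope that increases over time.
quad = np.polyfit(t, log_h, 2)

print("exponential AIC:     ", aic(log_h, np.polyval(lin, t), 2))
print("superexponential AIC:", aic(log_h, np.polyval(quad, t), 3))
```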
What do I predict?
In the spirit of sticking my neck out rather than merely criticizing, I will make the following series of point forecasts which I expect to outperform a superexponential fit: just follow an exponential trend, with an appropriate weighting based on recency. If you want to forecast 1 year out, use data from the last year. If you want to forecast 5 years out, use data from the last 5 years. (No doubt it’s better to use a decay rather than a cutoff, but you get the idea.) I obviously have very wide error bars on this, but probably not wide enough to include the superexponential fit more than a few years out.
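For concreteness, here is roughly what that recency-weighted extrapolation could look like (a sketch; tying the decay half-life to the forecast horizon is just one simple stand-in for the cutoff scheme above, and the data arrays are whatever horizon measurements you plug in):

```python
import numpy as np

def forecast_log_horizon(t, log_h, t_now, years_out, half_life=None):
    """Extrapolate an exponential trend in the time horizon, weighting
    recent observations more heavily when forecasting near-term.

    t, log_h: observation dates and log time horizons (whatever
    measurements you plug in). half_life: recency half-life in years;
    by default it is tied to the forecast horizon, an arbitrary but
    simple choice.
    """
    if half_life is None:
        half_life = years_out              # 1y forecast -> 1y half-life, etc.
    w = 0.5 ** ((t_now - t) / half_life)   # exponential recency weights
    # np.polyfit weights residuals, so pass sqrt(w) to weight squared errors by w.
    slope, intercept = np.polyfit(t, log_h, 1, w=np.sqrt(w))
    return slope * (t_now + years_out) + intercept
```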
As an important caveat, I’m not making a claim about the real-world impact of an AI that achieves a certain time horizon measurement. That is much harder to predict than the measurement itself, since you can’t just follow straight lines on graphs.
Thanks for writing this up! I actually mostly agree with everything you say about how much evidence the historical data points provide for a superexponential-given-no-automation trend. I think I place a bit more weight on it than you but I think we’re close enough that it’s not worth getting into.
The reason we have a superexponential option isn’t primarily because of the existing empirical data, it’s because we think the underlying curve is plausibly superexponential for conceptual reasons (in our original timelines forecast we had equal weight on superexponential and exponential, though after more thinking I’m considering giving more weight to superexponential). I think the current empirical evidence doesn’t distinguish much between the two hypotheses.
In our latest published model we had an option for it being exponential up until a certain point, then becoming superexponential afterward. Though this seems fairly ad hoc so we might remove that in our next version.
The main thing I disagree with is your skepticism of the meta-skills argument, which is driving much of my credence. It just seems extremely unintuitive to me to think that it would take as much effective compute to go from 1 million years to 10 million years as it takes to go from 1 hour to 10 hours, so it seems like we mainly have a difference in intuitions here. I agree it would be nice to make the argument more carefully; I won’t take the time to do that right now, but will spew some more intuitions.
“We might expect progress in chess ability to be superexponential, as AIs start to figure out the meta-skills (such as tactical ability) required to fully understand how chess pieces can interact. That is, we would expect it to require more new skills to go from an ELO of 2400 to 2500, than it does to go from an ELO of 3400 to 3500.”
I don’t think this analogy as stated makes sense. My impression is that going from 3400 to 3500 is likely starting to bump up against the limits of how good you can be at chess, or, as a weaker claim, that it is very superhuman, while we’re talking about “just” reaching the level of a top human.
To my mind, an AI that can do tasks that take top humans 1 million years feels like it’s essentially top human level. And the same for 10 million years, but very slightly better. So I’d think the equivalent of this jump is more like 2799.99 to 2799.991 ELO (2800 is roughly top human ELO), whereas the earlier 1-to-10-hour jump would be more like a 100-point jump or something.
I guess I’m just restating my intuition that at higher levels the jump represents a smaller difference in skills. I’m not sure how to further convey that. It personally feels like when I do a 1-hour vs. 10-hour programming task, the latter often (but not always) involves significantly more high-level planning, investigating subtle bugs, and consistently error correcting. Whereas if I imagine spending 1 million years on a coding task, there aren’t really any new agency skills needed to get to 10 million years; I already have the ability to consistently make steady progress on a very difficult problem.
My understanding is that AI 2027’s forecast is heavily driven by putting substantial weight on such a superexponential fit, in which case my claim may call into question the reliability of this forecast. However, I have not dug into AI 2027’s forecast, and am happy to be corrected on this point. My primary concern is with the specific claim I am making rather than how it relates to any particular aggregated forecast.
You can see a sensitivity analysis for our latest published model here, though we’re working on a new one which might change things (the benchmarks and gaps model relies a lot less on the time horizon extrapolation, which is why the difference is much smaller; also “superexponential immediately” is more aggressive than “becomes superexponential at some point” would be, due to the transition to superexponential that I mentioned above).
I always thought the best argument for superexponentiality was that (even without AI progress feeding back into AI progress) we’d expect AI to reach an infinite time horizon in a finite time once they were better than humans at everything. (And I thought the “meta skills” thing was reasonable as a story of how that could happen.) This is also mentioned in the original METR paper, iirc.
But when the AI Futures Project makes that argument, they don’t seem to want to lean very much on it, due to something about the definition. I can’t tell whether the argument is still importantly driving their belief in superexponentiality and this is just a technicality, or whether they think this actually destroys the argument (which would be a major update towards non-superexponentiality to me).
“Another argument for eventually getting superexponentiality is that it seems like superhuman AGIs should have infinite time horizons. However, under the definition of time horizon adapted from the METR report above, it’s not clear if infinite time horizons will ever be reached. This is because AIs are graded on their absolute task success rate, not whether they have a higher success rate than humans. As long as there’s a decreasing trend in ability to accomplish tasks as the time horizon gets longer, the time horizon won’t be infinite. This is something that has been observed with human baseliners (see Figure 16 here). Even if infinite horizons are never reached, the time horizons might get extremely large which would still lend some support to superexponentiality. Even so, it’s unclear how much evidence this is for superexponentiality in the regime we are forecasting in.”
Time horizon is currently defined as the human task length it takes to get a 50% success rate, fit with a logistic curve that goes from 0 to 100%. But we just did that because it was a good fit to the data. If METR finds evidence of label noise or tasks that are inherently hard to complete with more than human reliability, we can just fit a different logistic curve or switch methodologies (likely at the same time we upgrade our benchmark) so that time horizon more accurately reflects how much intellectual labor the AI can replace before needing human intervention.
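For readers unfamiliar with the construction, here is a minimal sketch of the 50% time horizon definition described above (my own reconstruction for illustration, using least squares via scipy’s curve_fit rather than whatever fitting procedure METR actually uses, and hypothetical task data):

```python
import numpy as np
from scipy.optimize import curve_fit

def success_prob(log_len, a, b):
    """Logistic curve in log task length, going from 0 to 100%."""
    return 1.0 / (1.0 + np.exp(-(a - b * log_len)))

# Hypothetical per-task results for one model: human task lengths in
# minutes, and whether the model succeeded on each task.
task_len = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240], dtype=float)
succeeded = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0], dtype=float)

(a, b), _ = curve_fit(success_prob, np.log(task_len), succeeded, p0=(3.0, 1.0))

# The 50% time horizon is the task length where the fitted curve crosses
# 0.5, i.e. where a - b * log(len) = 0.
horizon_minutes = np.exp(a / b)
print(horizon_minutes)
```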
We’re working on updating our timelines in a bunch of ways, and I’ve thought a bit more about this.
My current best guess, which isn’t confident, is that if we took a version of the METR HCAST suite which didn’t have any ambiguities or bugs, then the AGIs would have infinite time horizons. And that we should discuss this theoretical task suite rather than the literal HCAST in our timelines forecast, and therefore we should expect a finite-time singularity. If we kept the ambiguities/bugs instead, we’d have an asymptote and a non-infinite value, as would humans with infinite time. In the paragraph you’re quoting, I think that the main thing driving non-infinite values is that longer tasks are more likely to have ambiguities/bugs that make them unsolvable with full reliability.
I’m happy to talk about a theoretical HCAST suite with no bugs and infinitely many tasks of arbitrarily long time horizons, for the sake of argument (even though it is a little tricky to reason about and measuring human performance would be impractical).
I think the notion of an “infinite time horizon” system is a poor abstraction, because it implicitly assumes 100% reliability. Almost any practical, complex system has a small probability of error, even if this probability is too small to measure in practice. Once you stop using this abstraction, the argument doesn’t seem to hold up: surely a system that has 99% reliability at million-year tasks has lower than 99% reliability at 10 million-year tasks? This seems true even if a 10 million-year task is nothing more than 10 consecutive million-year tasks, and that seems strictly easier than an average 10 million-year task.
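The arithmetic behind that last sentence, using the paragraph’s own numbers:

```python
# If a 10-million-year task were merely 10 consecutive million-year tasks,
# each completed with 99% reliability (and failures were independent),
# the overall reliability would already be below 99%:
print(0.99 ** 10)  # ~0.904
```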
Yeah, this is the primary argument pushing me toward thinking there shouldn’t be a finite-time singularity; as I mentioned, I’m not confident. It does feel pretty crazy that a limits-of-intelligence ASI would have a (very large) time horizon at which it has 0.00001% reliability though, which I think is unavoidable if we accept the trend.
I think how things behave might depend to some extent on how you define an achieved time horizon; if there is a cost/speed requirement, then it becomes more plausible that longer horizon lengths would have ~the same or lower reliability/success rate as shorter ones, once the AI surpasses humans in long-horizon agency. Similar to how, if we created a version of HCAST but flipped based on AI times, then at a fixed speed budget human “reliability” might increase at higher time horizons, because our advantage is in long-horizon agency and not speed.
In general things seem potentially sensitive to definitional choices and I don’t feel like I’ve got things fully figured out in terms of what the behavior in the limit should be.