Hyperbolic model fits METR capabilities estimate worse than exponential model
This is a response to https://www.lesswrong.com/posts/mXa66dPR8hmHgndP5/hyperbolic-trend-with-upcoming-singularity-fits-metr which claims that a hyperbolic model, complete with an actual singularity in the near future, is a better fit for the METR time-horizon data than a simple exponential model.
I think that post has a serious error in it and its conclusions are the reverse of correct. Hence this one.
(An important remark: although I think Valentin2026 made an important mistake that invalidates his conclusions, I think he did an excellent thing in (1) considering an alternative model, (2) testing it, (3) showing all his working, and (4) writing it up clearly enough that others could check his work. Please do not take any part of this post as saying that Valentin2026 is bad or stupid or any nonsense like that. Anyone can make a mistake; I have made plenty of equally bad ones myself.)
[EDITED to add:] Valentin2026 has now (5) agreed that the thing I think was an error was an error, and (6) edited his post to say so. Thanks, Valentin!
The models
Valentin2026′s post compares the results of fitting two different models to METR’s time-horizon benchmark data (METR’s original blogpost is at https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ but the data have been updated since then, most recently, as of this writing, to account for the GPT-5 launch).
One model, or more accurately one family of models, is hyperbolic: if we let H be the time horizon estimate produced by METR, the model is H(t) = A/(t_c − t)^q, where t_c is the time at which the estimate shoots off to infinity (“the singularity”, though note that divergence here does not necessarily imply e.g. superintelligence in any very strong sense), A is an overall scale factor, and q is larger when the divergence happens more abruptly. (If you start with a Moore’s-law sort of model of technological progress, dH/dt = kH, and then suppose that actually progress gets faster in proportion to H, so that dH/dt = kH², then you get a model of this sort with q=1. If progress gets faster in proportion to √H then you get q=2, and similarly for other values of q; if it gets faster like H^r then you get something like q = 1/r.)
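(To spell out that correspondence: write k for the proportionality constant and integrate once,

\[
\frac{dH}{dt} = k\,H^{1+1/q}
\;\Longrightarrow\;
-q\,H^{-1/q} = k\,t + \text{const}
\;\Longrightarrow\;
H(t) = \frac{A}{(t_c - t)^{q}},
\qquad A = (q/k)^{q},
\]

with t_c determined by the constant of integration.)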
The other model is a simple Moore’s law sort of model: H(t) = A·e^(bt), i.e. dH/dt = bH. Lots of technological things behave kinda like this (but, note, lots of them actually turn out to be sigmoidal: they grow exponentially for a while, then slow down and eventually plateau) and it’s a nice simple default model.
[EDITED to add:] As dr_s points out in a comment, in the absence of exciting new physics nothing can actually grow exponentially for ever. “Lots” should really be “all”, or at least “almost all”, the exceptions being where the end of exponential growth happens in another way like a technology being abandoned in favour of something new and better before its growth slows much. This is probably also a good point at which to say explicitly that a “singularity” in this context should not actually be understood as “the thing becomes infinite” but as “the thing becomes very large in such a way that our model breaks down”.
Valentin2026′s fits
Valentin2026 did some fits of both sorts of models, showed graphs in which the hyperbolic fits looked much better and had smaller mean squared error, and declared that we should maybe shorten our timelines and maybe expect something singularity-like in the near future.
Valentin did something extremely meritorious: he provided both the data and the code that he used to do the fitting. And … looking at the code, it appears that he did something that looks very fishy to me. Here’s the code that fits the hyperbolic model:
import numpy as np
from scipy.optimize import curve_fit

q = 2
def model(t, A, t_c):
    return A / (np.power(t_c - t, q))

# Fit model to noisy data
p0 = [1.0, 2027]  # initial guess [A, t_c] (q is fixed at 2 above)
bounds = ([0, max(t_50)+0.1], [np.inf, np.inf])
# (make sure t_c > max(t))
popt, pcov = curve_fit(model, t_50, y_50, p0=p0, bounds=bounds, maxfev=20000)
(Documentation for scipy.optimize.curve_fit begins: “Use non-linear least squares to fit a function, f, to data.”)
And here’s the code that fits the exponential model:
logy = np.log(y_50)
# Linear regression: log(y) = log(A) + b * t
coeffs = np.polyfit(t_red, logy, 1)
b = coeffs[0]
logA = coeffs[1]
A = np.exp(logA)
This does a linear least-squares fit to the logarithms of the data points we’re trying to fit.
The problem
This is a perfectly legitimate thing to do, in the abstract. It is an excellent thing to do, if what you care about is the proportional errors in fitting the data. But it is not the same thing as is being done for the hyperbolic model.
When you fit the logs rather than the original data, you will get a fit that when plotted along with the original data looks particularly good when the values are small, and particularly bad when the values are large. A bit like this:
(That’s the exponential fit from Valentin2026′s post.) That certainly looks worse than this hyperbolic fit with q=2:
(As you’d expect, the log-fitted exponential fits better on the left where the numbers are smaller, and worse on the right where they’re bigger.)
And Valentin2026 reports that the total squared error for the exponential fit is much bigger than for the hyperbolic fit—exactly as you would expect when what you’ve fitted is the logarithms and therefore you have larger errors for the larger values. (When I run the code, I get a “total error” of 0.2088 for the hyperbolic fit with q=2, versus 1.2915 for the exponential fit.)
In other words, the apparently worse fit of the exponential model may simply be an artefact of the fact that the hyperbolic and exponential models were fit in fundamentally different ways.
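To see the artefact in isolation, here is a small self-contained demonstration using synthetic data rather than the METR numbers (the model, noise level and starting values are all just illustrative): the same exponential family is fitted once to the logarithms, as in the polyfit snippet above, and once to the raw values with curve_fit, and the two procedures disagree because they weight the errors differently.

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Synthetic exponential-ish data with multiplicative noise
t = np.linspace(0, 6, 30)
y = np.exp(0.8 * t) * rng.lognormal(sigma=0.3, size=t.size)

def expmodel(t, A, b):
    return A * np.exp(b * t)

# Fit 1: linear least squares on log(y), as in the polyfit snippet above
b_log, logA = np.polyfit(t, np.log(y), 1)
pars_log = (np.exp(logA), b_log)

# Fit 2: non-linear least squares on y itself, which is what curve_fit does
pars_lin, _ = curve_fit(expmodel, t, y, p0=[1.0, 1.0], maxfev=20000)

def sse(pars):
    # squared error measured in linear space
    return np.sum((y - expmodel(t, *pars)) ** 2)

print("linear-space SSE, log-space fit:   ", sse(pars_log))
print("linear-space SSE, linear-space fit:", sse(pars_lin))
# The second number is smaller: curve_fit minimises exactly this quantity,
# whereas the log-space fit minimises squared errors in log space instead.

Comparing the linear-space errors of the two fits therefore tells you more about which error measure each fitting procedure was optimising than about which model is better.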
Fixing the problem
If you fit an exponential without doing the logarithm thing, using the same curve_fit function as for the hyperbolic models …
def model(t, b, t0):
    return np.exp(b*(t-t0))

# Fit model to noisy data
p0 = [1.0, 2025]  # initial guess [b, t0]
bounds = ([0, 2000], [np.inf, np.inf])
# (keep b >= 0 and t0 >= 2000)
popt, pcov = curve_fit(model, t_50, y_50, p0=p0, bounds=bounds, maxfev=20000)
… then you get this graph
and a “total error” of 0.1067, about 2x better than for the q=2 hyperbolic fit.
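(If you want to compute such a figure yourself, it is just the squared residuals of the fitted curve accumulated over the data points, along these lines, where model and popt are the exponential fit from the snippet above; whether you sum or average them doesn't change which fit wins.)

# "Total error" of the fit above: squared residuals in linear space
residuals = y_50 - model(t_50, *popt)
total_error = np.sum(residuals ** 2)
print("total error:", total_error)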
I wondered what happens to the hyperbolic model if we fit it in log space. Answer: it’s not great. We get this graph (slightly modified x-axis because otherwise all you get to see is how the model explodes as it approaches its singularity):
and a “total error” of about 6.9, much worse than for either version of the exponential fit.
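(For anyone who wants to reproduce this, one way to do such a log-space fit is to hand curve_fit the logarithm of the model and the logarithm of the data, along these lines, reusing the variable names from the snippets above; the details here are a sketch rather than a canonical recipe.)

q = 2
def log_model(t, A, t_c):
    # log of the hyperbolic model A / (t_c - t)**q
    return np.log(A) - q * np.log(t_c - t)

p0 = [1.0, 2027]  # initial guess [A, t_c]
bounds = ([1e-12, max(t_50) + 0.1], [np.inf, np.inf])  # A > 0, t_c > max(t)
popt_log, pcov_log = curve_fit(log_model, t_50, np.log(y_50), p0=p0,
                               bounds=bounds, maxfev=20000)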
Conclusion
The hyperbolic model seems (by eye, and by total squared error) to give a worse fit to the METR data than the exponential model, whether we measure errors in linear space or in log space. This doesn’t necessarily mean that the hyperbolic model is wrong, or that the exponential model is right, or that there is nothing singularity-like in the near future, but I think it does mean that whatever updates might have been suggested by Valentin2026′s findings are probably in the wrong direction.
The exponential model does, in fact, seem like a pretty good fit to the data. If you’re going to do short-term extrapolations of AI model time horizons, I think fitting an exponential curve is a decent way to do it.
I would advise extreme caution in extrapolating beyond the short term. A double-exponential fit produces a curve almost indistinguishable by eye from the exponential fit, and a very similar total-error figure, but it predicts much faster growth in the future. A sigmoid fit, H(t) = C/(1 + e^(−b(t−t₀))), produces an equally good-by-eye fit to the data and (unsurprisingly, given that this model has more parameters) a lower total-error figure, but predicts much slower growth in the future. Extrapolation is a dangerous game unless you have a really good grasp of the underlying mechanisms, which in this case we very much don’t.
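(For anyone who wants to experiment with that comparison, the sigmoid fit can be done with the same curve_fit machinery as the fits above, along these lines; the parametrisation matches the logistic form written just above, and the initial guesses are only illustrative.)

def sigmoid_model(t, C, b, t0):
    # logistic curve with ceiling C, growth rate b and midpoint t0
    return C / (1.0 + np.exp(-b * (t - t0)))

p0 = [2 * max(y_50), 1.0, 2026.0]  # initial guess [C, b, t0]: ceiling ~2x current max
popt_sig, pcov_sig = curve_fit(sigmoid_model, t_50, y_50, p0=p0, maxfev=20000)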
High quality post that exemplifies the attention to detail, proper reasoning, and usefulness to the community that LW contributions are meant to embody. Great work!
Thanks for the kind words!
Don’t forget intellectual charity, which might actually be the most LW distinguishing feature relative to other smart online communities.
This one sentence could also be a fantastic encapsulation of the entire problem with AI and alignment.
Is it lots, or is it all? To me it seems like the classic equation dH/dt = αH(1−H), or even a generalization of it in which H appears with some higher exponent, makes a lot of physical sense IRL to embody the notion that at some point the untapped potential of that technology sort of runs out—and that’s where you get all the sigmoids from. I can’t think of a single technology that can truly be expected to grow exponentially forever, let alone diverge to infinity. The question is usually just how high the ceiling is.
For sure nothing in the real world can really grow exponentially for ever. I don’t know how consistently the failure to grow exponentially for ever looks like a sigmoid rather than, say, an exponential that abruptly runs into a wall, or an exponential that gets “abandoned” before it turns sigmoid because some other thing comes along to take its place.
I’ll tweak the wording in the OP to be clearer about this. [EDITED to add:] Now done.
Well, this applies generally to all these models—why should it look like an exponential or a power law at all to begin with? These are simplifications that are born out of the fact that we can write out these very simple ODE models that reasonably approximate the dynamics and produce meaningful trajectories.
However I think “sigmoid” is definitely the most likely pattern, if we broaden that term to mean not strictly just the logistic function (which is the solution of y′=y(1−y) ) but also any other kind of similar function that has an S shape, possibly not even symmetric. “Running into a wall” is much more unphysical—it implies a discontinuity in the derivative that real processes never exhibit.
Also you could see it as this: all these are special cases of a more general y′=P(y), where P is any polynomial. And that means virtually any analytical function, since those can be Taylor-expanded into polynomials reaching arbitrary accuracy in the neighbourhood of a specific point. So really the only assumptions baked in there are:
the rate of growth is analytical (no weird discontinuities or jumps; reasonable)
the rate of growth does not feature an explicit time dependence (also sensible, as these phenomena should happen equally regardless of which year they were kickstarted in)
Within this framework, the exponential growth is the result of a first order expansion, and the logistic is a second order expansion (under certain conditions for the coefficients). Higher orders, if present, could give rise to more complex models, but generally speaking as far as I can tell they’ll all tend to either converge to a given value (a root of the polynomial) or diverge to infinity. It would be interesting to consider the conditions under which convergence occurs I guess; it should depend on the spectrum of the polynomial but it might have a more physical interpretation.
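(Spelling out the second-order case: with a linear growth term and a quadratic saturation term, say y′ = ay − by² with a, b > 0, the standard solution is the logistic

\[
y(t) = \frac{a/b}{1 + C\,e^{-a t}},
\]

with C fixed by the initial condition and a/b the ceiling; the first-order truncation y′ = ay is just the exponential.)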
Would have made much more sense (visually and otherwise) to show graphs in log space. Example: https://www.openphilanthropy.org/research/modeling-the-human-trajectory/
I agree. (Or perhaps we should plot in log space when measuring/minimizing errors in log space, and plot in linear space when measuring/minimizing errors in linear space. The former is nearer to the Right Thing here than the latter.)
There are several things I find unsatisfactory in my own analysis here and either I will write a followup or Valentin2026 and I will write one together. (We’re still discussing.)