A simple “null hypothesis” mechanism for the steady exponential rise in METR task horizon: shorter-horizon failure modes outcompete longer-horizon failure modes for researcher attention.
That is, with each model release, researchers solve the biggest failure modes of the previous system. But longer-horizon failure modes are inherently rarer, so it is not rational to focus on them until the shorter-horizon ones are fixed. If the distribution of failure horizon lengths is steady, and every model release fixes the X% most common failures, you will see steady exponential progress.
It’s interesting to speculate about how the recent possible acceleration in progress could be explained under this framework. A simple formal model:
There is a sequence of error types e_1, e_2, e_3, etc.
The first s error types have already been solved, such that errorrate(e_1) = … = errorrate(e_s) = 0. The long tail of errors e_{s+1}, e_{s+2}, … has error frequencies that decay exponentially.
With each model release (t → t+1), researchers can afford to fix n error types.
METR time horizon is inversely proportional to the total error rate, sum(errorrate(e_i) for all i).
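A minimal simulation sketch of this model (the decay rate r and the number of fixes per release n are illustrative assumptions, not values from the post):

```python
# Toy model sketch: error type i occurs with frequency r**i (exponential tail),
# the first s types are fully fixed, and the METR horizon is 1 / (total error rate).
# The decay rate r and fixes-per-release n are illustrative assumptions.

def horizon(s, r=0.9):
    """Time horizon after the first s error types have been fixed."""
    remaining_error = r ** (s + 1) / (1 - r)  # geometric tail: sum of r**i for i > s
    return 1.0 / remaining_error

n = 5  # error types fixed per model release
for release in range(6):
    print(f"release {release}: horizon ∝ {horizon(n * release):.2f}")
# Each release multiplies the horizon by r**(-n), i.e. steady exponential growth.
```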
Under this model, there are only two ways progress can speed up: the distribution becomes shorter-tailed (maybe AI systems have become inherently better at generalizing, such that solving the most frequent failure modes now generalizes to many more failures), or the time it takes to fix a failure mode has decreased (perhaps because RLVR offers a more systematic way to solve any reliably measurable failure mode).
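Under the same sketch, both mechanisms amount to raising the per-release growth factor r^{-n}; with purely illustrative numbers:

```python
# A shorter tail means a smaller r (each fix generalizes further);
# cheaper fixes mean a larger n per unit time. Numbers are illustrative, not estimates.
scenarios = [("baseline", 0.9, 5), ("shorter tail", 0.8, 5), ("faster fixing", 0.9, 8)]
for label, r, n in scenarios:
    print(f"{label:13s}: horizon grows ×{r ** (-n):.2f} per release")
```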
(Based on a tweet I posted a few weeks ago)
Looking at the first graph in this Epoch AI publication, I find the “time to fix a failure mode has decreased” hypothesis particularly plausible.
Nice connection! I’d totally overlooked this.