Yeah, I mean, the task distribution is just hugely different. When METR measures software development tasks, they mean things in the reference class of well-specified tasks with tests basically already written.
As a concrete example, if you use some other distribution of tasks as your base for horizon length, like forecasting performance per unit of time, or writing per unit of time, or graphic design per unit of time, you get drastically different time-horizon curves.
This doesn’t make METR’s curves unreasonable as a basis, but you need a lot of assumptions to get from “these curves cross a given milestone in year X” to “in year X we will get ~fully automated AI R&D” (and indeed I would not currently believe the latter).
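To make the sensitivity concrete, here is a minimal sketch in the spirit of METR’s log-linear extrapolation. Every number in it is invented for illustration (these are not METR’s measurements); the point is only how much the crossing year moves when the base distribution changes.

```python
import math

# Hypothetical, illustrative numbers only -- not real measurements.
# Per task distribution: (current 50%-success time horizon in minutes,
# doubling time of that horizon in months).
distributions = {
    "well-specified software": (60.0, 7.0),
    "forecasting":             (10.0, 14.0),
    "writing":                 (30.0, 10.0),
    "graphic design":          (5.0, 18.0),
}

TARGET_MINUTES = 8 * 60 * 21  # roughly one work-month of human labor

for name, (horizon_now, doubling_months) in distributions.items():
    # Log-linear trend: horizon(t) = horizon_now * 2**(t / doubling_months),
    # so the milestone is crossed after doubling_months * log2(target / now).
    months = doubling_months * math.log2(TARGET_MINUTES / horizon_now)
    print(f"{name:>25}: hits ~1-month horizon in {months / 12:.1f} years")
```

Under these made-up parameters the crossing year varies by more than a decade across distributions, which is the sense in which the choice of base distribution carries most of the forecast.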
There is preliminary work showing that the METR trend is approximately average across task distributions.
I don’t know the details of all of these task distributions, but clearly these are not remotely sampled uniformly from the set of all tasks necessary to automate AI R&D?
Yes, in particular the concern about benchmark tasks being well-specified remains. We’ll need both more data (probably collected from AI R&D tasks in the wild) and more modeling to get a forecast for overall speedup.
However, I do think that if we have a wide enough distribution of tasks, AIs outperform humans on all of them at task lengths that should imply humans need only 1/10th the labor, and yet AI R&D has not been automated, then something strange must be happening. So looking at different benchmarks is partial progress towards understanding the gap between long time horizons on METR’s task set and actual AI R&D uplift.
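One reason the “more modeling” matters: even if AIs dominate the benchmark-like slice of work, the overall R&D speedup is bottlenecked by the least-accelerated categories, an Amdahl’s-law-style effect. A minimal sketch, with invented time-budget weights and speedups (none of these numbers are measured):

```python
# Illustrative only: invented time shares and per-category speedups
# for a hypothetical AI R&D workflow.
categories = {
    # name: (fraction of human time, speedup factor with AI)
    "well-specified coding": (0.30, 10.0),
    "experiment design":     (0.25, 2.0),
    "debugging / taste":     (0.25, 1.5),
    "coordination / review": (0.20, 1.2),
}

# Amdahl-style aggregation: total time is each category's time share
# divided by its speedup, so the overall speedup is the weighted
# harmonic mean of the per-category speedups.
remaining = sum(share / speedup for share, speedup in categories.values())
print(f"overall labor speedup: {1 / remaining:.2f}x")  # ~2.0x here
```

Under these invented numbers, a 10x speedup on the well-specified slice yields only ~2x overall, which is one way long horizons on METR’s task set can coexist with modest real-world AI R&D uplift.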
(agree, didn’t intend to imply that they were)