No77e comments on METR: Measuring AI Ability to Complete Long Tasks

No77e 19 Mar 2025 17:46 UTC
3 points
0
Naively extrapolating this trend gets you to 50% reliability on 256-hour tasks in 4 years, which is a lot but still short of the years-long reliability humans have. So I must be missing something. Is it that you expect most remote jobs not to require more autonomy than that?
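
For readers who want to check the arithmetic, here is a minimal sketch of a constant-doubling extrapolation. The starting horizon (1 hour) and doubling time (6 months) are illustrative assumptions, chosen only because they reproduce the 256-hour figure; the comment does not state which values it used. The 2,000-hour "work-year" is likewise a rough assumption.

```python
import math


def extrapolate_horizon(start_hours: float, doubling_months: float, months_ahead: float) -> float:
    """Time horizon after `months_ahead` months under a constant doubling time."""
    return start_hours * 2 ** (months_ahead / doubling_months)


def months_until(target_hours: float, start_hours: float, doubling_months: float) -> float:
    """Months needed to reach `target_hours` under the same constant doubling time."""
    return doubling_months * math.log2(target_hours / start_hours)


# Assumed inputs (hypothetical, not taken from the comment or the paper).
START_HOURS = 1.0        # assumed current 50%-reliability time horizon
DOUBLING_MONTHS = 6.0    # assumed doubling time
WORK_YEAR_HOURS = 2000   # assumed hours in one year of full-time work

horizon_in_4_years = extrapolate_horizon(START_HOURS, DOUBLING_MONTHS, 48)
print(horizon_in_4_years)  # 256.0, i.e. 8 doublings in 48 months

months_to_work_year = months_until(WORK_YEAR_HOURS, START_HOURS, DOUBLING_MONTHS)
print(months_to_work_year / 12)  # years until a work-year-long horizon
```

Under these assumptions, a year-long horizon is about three further doublings past 256 hours, so the gap to "years-long" tasks depends heavily on whether the doubling time stays constant.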
- Zach Stein-Perlman 19 Mar 2025 19:06 UTC
  9 points
  7
  Parent
  I think doing 1-week or 1-month tasks reliably would suffice to mostly automate lots of work.
- Nikola Jurkovic 19 Mar 2025 19:05 UTC
  5 points
  3
  Parent
  I expect the trend to speed up before 2029 for a few reasons:
  1. AI accelerating AI progress once we reach tens of hours of time horizon.
  2. The trend might be “inherently” superexponential. It might be that unlocking some planning capability generalizes very well from 1-week to 1-year tasks and we just go through those doublings very quickly.
  - Daniel Kokotajlo 19 Mar 2025 20:34 UTC
    5 points
    2
    Parent
    Indeed I would argue that the trend pretty much has to be inherently superexponential. My argument is still kinda fuzzy, I’d appreciate help in making it more clear. At some point I’ll find time to try to improve it.
- Thomas Kwa 21 Mar 2025 0:43 UTC
  4 points
  2
  Parent
  The trend probably sped up in 2024. If the future trend follows the 2024–2025 trend, we get 50% reliability at 167 hours in 2027.