I found this tweet helpful: it does the same regression on another dataset (chess) and arrives at an absurd conclusion. For me, the takeaway is that LLMs may soon be able to handle very large software engineering tasks, but that this will likely not generalize to arbitrary tasks. Longer, more general tasks might still follow soon after, but you can’t reliably predict that from this single dataset alone.
I don’t think I get it. If I read this graph correctly, it says that if you let a human play chess against an engine and want the human to achieve equal performance, the amount of time the human needs to think grows exponentially as the engine gets stronger. That doesn’t make sense if extrapolated downward, but upward it’s about what I would expect: you can compensate for skill by applying more brute force, but it becomes exponentially costly, which fits the exponential graph.
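To make the shape of that argument concrete, here is a minimal sketch of the kind of fit I have in mind: an exponential relationship between engine strength and the human thinking time needed to match it, fitted in log-space and then extrapolated. Every number below is made up purely for illustration; none of it comes from the tweet or from the METR paper.

```python
import numpy as np

# Hypothetical, invented data points, only to illustrate the shape of the fit:
# engine strength (Elo) vs. the thinking time (hours) a strong human would
# supposedly need to play at that level.
elo = np.array([2200, 2400, 2600, 2800, 3000])
human_hours = np.array([0.1, 0.5, 3.0, 20.0, 150.0])

# An exponential relationship t = a * exp(b * elo) is linear in log-space,
# so fit log(t) against Elo with ordinary least squares.
b, log_a = np.polyfit(elo, np.log(human_hours), deg=1)

# Extrapolating upward: predicted thinking time at a higher rating.
elo_new = 3400
predicted_hours = np.exp(log_a + b * elo_new)
print(f"predicted ~{predicted_hours:.0f} hours at Elo {elo_new}")

# Extrapolating downward quickly gives near-zero "thinking times",
# which is the end of the curve that stops making sense.
```

The asymmetry is the point: downward the fit is nonsense, upward it just says that matching a stronger engine by brute force gets exponentially more expensive.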
It’s probably not perfect—I’d worry a lot about strategic mistakes in the opening—but it seems pretty good. So I don’t get how this is an argument against the metric.
It is a decent metric for chess, but a) it doesn’t generalize to other tasks (which is how people seem to interpret the METR paper), and, less importantly, b) I’m quite confident that people wouldn’t beat the chess engines by thinking for years.
What is the absurd conclusion?
That we would have had AIs performing year-long tasks in 2005. Chess is not the same as software engineering, but it is still a limited domain.
I mean, beating a chess engine in 2005 might be a “years-long task” for a human? The time METR is measuring is how long it would hypothetically take a human to do the task, not how long it takes the AI.
Yes, but it didn’t mean that AIs could do all kinds of long tasks in 2005. And that is the conclusion many people seem to draw from the METR paper.
No? It means you can’t beat the chess engine.
And even granting that, they try to argue in the other direction: from how long a task takes a human, they predict when an AI will be able to do it. That didn’t work for chess either.
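For contrast, here is a sketch of that other direction of argument, the METR-style extrapolation: fit the trend of how long a task (measured in human time) an AI can complete against the calendar year, then solve for when the trend crosses a “years-long” task. Again, all numbers are invented for illustration and are not METR’s data.

```python
import numpy as np

# Hypothetical, invented data: for each calendar year, the longest task
# (in human work hours) that an AI of that year could complete.
year = np.array([2019, 2021, 2023, 2025])
ai_task_hours = np.array([0.01, 0.1, 1.0, 8.0])

# Fit log(task length) against calendar year (an exponential trend in time).
slope, intercept = np.polyfit(year, np.log(ai_task_hours), deg=1)

# Invert the fit: in which year does the trend reach roughly a year of
# human work (~2000 hours)?
target_hours = 2000.0
predicted_year = (np.log(target_hours) - intercept) / slope
print(f"trend crosses {target_hours:.0f}h around {predicted_year:.0f}")

# Applied to the chess data, this same style of extrapolation is what
# yields the "year-long tasks in 2005" conclusion discussed above.
```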