Rafael Harth comments on Reactions to METR task length paper are insane

Rafael Harth 11 Apr 2025 10:34 UTC
11 points
4

I’m glad METR did this work, and I think their approach is sane and we should keep adding data points to this plot.

It sounds like you also think the current points on the plot are accurate? I would strongly dispute this, for all the reasons discussed here and here. I think you can find sets of tasks where the points fit on an exponential curve, but I don’t think AI can do 1 hour’s worth of thinking on all, or even most, practically relevant questions.
- Cole Wyeth 11 Apr 2025 13:53 UTC
  7 points
  2
  I remember enjoying that post (perhaps I even linked it somewhere?) and I think it’s probably the case that the inefficiency in task length scaling has to do with LLMs having only a subset of cognitive abilities available. I’m not really committed to a view on that here though.
  The links don’t seem to prove that the points are “inaccurate.”