Knight Lee comments on METR: Measuring AI Ability to Complete Long Tasks

Knight Lee 20 Mar 2025 9:25 UTC
2 points
1
Wow, this beautifully illustrates the problem with current AIs (they are very smart at short tasks and poor at long tasks) and the trend of improvement on this problem.
However, I want to point out that the inability to do long tasks isn’t the only weakness AIs have. There are plenty of 5-minute tasks that are common sense to humans but that AIs fail at (and many benchmarks catch these weaknesses). It’s not just the length of the task that matters but also the type of task.
I think AIs are also bad at inventing new ideas and concepts that are too far from their training data.
- Asta7k 12 Apr 2025 20:14 UTC
  2 points
  0
  Yes, they used a 50% success rate, and even then some sub-10-minute tasks are still troublesome for LLMs, as seen in the graph. But I think this will improve as well if we make the algorithms better.