Asta7k comments on METR: Measuring AI Ability to Complete Long Tasks

Asta7k 12 Apr 2025 20:14 UTC
2 points
Yes, they used a 50% success rate, and even then some sub-10-minute tasks are still troublesome for LLMs, as seen in the graph. But I think this will improve as well if we make the algorithms better.