Others have argued that the line can’t stay straight forever because eventually AI systems will “unlock” all the abilities necessary for long-horizon tasks.
AI systems tend to fail longer tasks by getting stuck, and not managing to get unstuck. Humans get stuck too — but sometimes (particularly after a good night’s sleep), we notice that we are stuck, take a step back, think about it, and on occasion figure out our mistake and unstick ourselves. Sometimes we go to a coworker, or our manager, or a mentor, talk to them, and they may point out the error that got us stuck. And of course, sometimes people stay stuck.
As a builder of agents, I’d love to try implementing each of these, in a single agent or between multiple of them as appropriate, and see if we can help agents get unstuck. Some of this might actually work. Or we can just keep increasing training set sizes and capabilities and hope they will continue to get stuck less often, Bitter Lesson style.
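To make the idea concrete, here is a minimal sketch of how that escalation might be wired up. Everything here is hypothetical: the stuck detector is a crude "same action repeated N times" heuristic, and `reflect` and `ask_coworker` stand in for a self-review pass and a second agent's critique, which in a real system would each be model calls.

```python
def is_stuck(history, window=3):
    # Crude stuck detector: the same action repeated `window` times in a row.
    return len(history) >= window and len(set(history[-window:])) == 1

def run_agent(propose, interventions, max_steps=20):
    """Toy agent loop. `propose(history, hints)` picks the next action; each
    time the agent looks stuck, the next intervention adds a hint."""
    history, hints = [], []
    pending = list(interventions)
    for _ in range(max_steps):
        action = propose(history, hints)
        history.append(action)
        if action == "done":
            break
        if is_stuck(history) and pending:
            # Escalate: first self-reflection, then ask another agent.
            hints.append(pending.pop(0)(history))
    return history

# Toy policy: loops on "retry" until any hint arrives, then finishes.
def toy_propose(history, hints):
    return "done" if hints else "retry"

reflect = lambda h: "self-review: step back and re-read the error"
ask_coworker = lambda h: "coworker: you forgot to rebuild first"

trace = run_agent(toy_propose, [reflect, ask_coworker])
```

The interesting design question is the detector: repetition of identical actions is the easy case, and real stuckness (plausible-looking actions that make no progress) is much harder to notice from the inside — which is part of why the coworker-style second perspective seems worth trying.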
Probably it will be a bit of all of these things. So, like you, I wouldn’t expect there to be a threshold beyond which agents never get stuck or are always able to unstick themselves. But we haven’t been thinking seriously about this problem for very long, and I suspect we might make quite rapid progress on it.