Seth Herd comments on METR: Measuring AI Ability to Complete Long Tasks

Seth Herd 8 Apr 2025 0:22 UTC
9 points
I think you’re right that online learning/memory here is an important consideration. I expect an increase in the rate of improvement in time horizons as memory systems are integrated with agents.

Noosphere pointed me to this comment in relation to my recent post on memory in LLM agents. I briefly argued there that memory is so useful for long time-horizon tasks that we should expect LLM agents to have nontrivial memory capabilities as soon as they’re competent enough to do anything useful or dangerous. Humans without episodic memory are very limited in what they can accomplish, so I’m actually surprised that LLMs can do tasks even beyond the 15-minute equivalent, and even that might only be a subset of tasks that suits their strengths.