I agree with most of Dwarkesh’s post, with essentially the same exceptions you’ve listed.
I wrote about this recently in "LLM AGI will have memory, and memory changes alignment," drawing essentially the conclusions you’ve given above. Continuous learning is a critical strength of humans, and job substitution will be limited (but real) until LLM agents can do effective self-directed learning. It’s quite hard to say how fast that will happen, for the reasons you’ve given. Fine-tuning is a good deal like human habit/skill learning; the question is how well agents can select what to learn.
One nontrivial disagreement is on the barrier to long time-horizon task performance. Humans don’t learn long time-horizon task performance primarily from RL. We learn in several ways at different scales, including learning new strategies that can be captured in language. All of those types of learning do rely on self-assessment and decisions about what’s worth learning, and those judgments will be challenging to get out of LLM agents. Still, I don’t think there’s any fundamental barrier to squeezing workable judgments out of them, just some schlep in scaffolding and training to do it better.
Based on this logic, my median timelines are getting longer (although rapid progress is still quite possible and we are far from prepared).
But I’m getting somewhat less pessimistic about the prospect of having incompetent autonomous agents with self-directed learning. These would probably both take over some jobs and display egregious misalignment. I think they’ll be a visceral wakeup call with decent odds of getting society properly freaked out about human-plus AI while there’s still a little time left to prepare.