My question isn’t just whether people think LMAs are the primary route to dangerous AI; it’s also why they’re not addressing the agentic part in their alignment work if they do think that.
I think the most common likely answer is “aligning LLMs should help a lot with aligning agents driven by those LLMs”. That’s a reasonable position. I’m just surprised and a little confused that so little work explicitly addresses the new alignment challenges that arise if an LLM is part of a more autonomous agentic system.
The alternative I was thinking of is some new approach that doesn’t really rely on training on a language corpus. Or there are other schemes for AI and AGI that aren’t based on networks at all.
The other route is LLMs/foundation models that are not really agentic, but relatively passive and working only step-by-step at human direction, like current systems. I hear people talk about the dangers of “transformative AI” in deliberately broad terms that don’t assume we’ll design such systems to be agentic.