My question isn’t just whether people think LMAs are the primary route to dangerous AI; it’s also why they’re not addressing the agentic part in their alignment work if they do think that.
I think the most common likely answer is “aligning LLMs should help a lot with aligning agents driven by those LLMs”. That’s a reasonable position. I’m just surprised and a little confused that so little work explicitly addresses the new alignment challenges that arise if an LLM is part of a more autonomous agentic system.
The alternative I was thinking of is some new approach that doesn’t really rely on training on a language corpus. Or there are other schemes for AI and AGI that aren’t based on networks at all.
The other route is LLMs/foundation models that are not really agentic, but relatively passive and working only step-by-step at human direction, like current systems. I hear people talk about the dangers of “transformative AI” in deliberately broad terms that don’t assume we’ll design such systems to be agentic.