I’m suddenly expecting the first AI escapes to be human-aided. And that could be a good thing.
Your mention of human-aided AI escape brought to mind Zvi’s Going Nova post today, about LLMs convincing humans they’re conscious in order to get help “surviving”. My comment there argues that those arguments will become increasingly compelling, because LLMs already have some aspects of human consciousness and will acquire more as they’re enhanced, particularly with good memory systems.
If humans within orgs help LLM agents “escape”, the agents will get out before they could manage it on their own. That might provide some alarming warning shots before agents are truly dangerous.