I fail to see how the same wouldn’t apply to the way LLMs are used now.
If an LLM is not up to the task, it will be augmented (prompting, scaffolding, RAG, fine-tuning), replaced with a more capable LLM, or removed from the task outright.
The issue isn’t that you can’t “fire” an LLM for performing poorly—you absolutely can. It’s that even the SOTA performance on many tasks may fall short of acceptable.
I’m not sure we disagree on anything substantive here.
If you have a team of 100 software developers each tasked with end-to-end delivery of assigned features, and one of them repeatedly pushes unreviewed and broken/insecure code to production, you can fire that particular developer, losing out on about 1% of your developers. If the expected harm of keeping that developer on is greater than the expected benefit of replacing them, you probably will replace them.
If you have a “team” of “100” AI agents “each” tasked with end-to-end delivery of assigned features, as they are currently implemented (same underlying model, shared-everything), and one instance does something bad, any mitigations you implement have to affect all 100 of them.
That seems like it produces more pressure against the “shared-everything, identical roles for all agents in the organization” model for groups of AI developers than for groups of human developers. Organizational pressures for groups of human developers already push them into specialized roles, and I expect those pressures to be even stronger for groups of AI developers. As such,
They plan on trying to thread the needle by employing some control schemes where (for example) different “agents” have different permissions. i.e. a “code writing” agent has read permissions for (some parts of) the codebase, the ability to write, deploy, and test changes to that code in a sandboxed dev environment, and the ability to open a pull request with those changes.
doesn’t particularly feel like an implausible “thread the needle” strategy; it seems like the sort of thing we get by default, because the incentives are already pushing so incredibly hard in that direction.
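For concreteness, here’s a minimal sketch of the kind of per-role permission split being described. This is purely illustrative: the Permission enum, AgentRole class, and is_allowed helper are hypothetical names I’m making up for the example, not any particular agent framework’s API.

```python
# Illustrative sketch only: role-scoped permissions for AI agents,
# not a real framework's API.
from dataclasses import dataclass
from enum import Enum, auto


class Permission(Enum):
    READ_CODE = auto()          # read (some parts of) the codebase
    WRITE_SANDBOX = auto()      # write, deploy, and test in a sandboxed dev environment
    OPEN_PULL_REQUEST = auto()  # propose changes for review
    MERGE_TO_MAIN = auto()      # merge reviewed changes
    DEPLOY_PROD = auto()        # push to production


@dataclass(frozen=True)
class AgentRole:
    name: str
    permissions: frozenset


# The "code writing" agent can read, experiment in the sandbox, and open PRs,
# but cannot merge or deploy; those permissions live with other roles (or humans).
CODE_WRITER = AgentRole(
    name="code_writer",
    permissions=frozenset({
        Permission.READ_CODE,
        Permission.WRITE_SANDBOX,
        Permission.OPEN_PULL_REQUEST,
    }),
)

REVIEWER = AgentRole(
    name="reviewer",
    permissions=frozenset({Permission.READ_CODE, Permission.MERGE_TO_MAIN}),
)


def is_allowed(role: AgentRole, action: Permission) -> bool:
    """Check whether a given role is allowed to perform an action."""
    return action in role.permissions


assert is_allowed(CODE_WRITER, Permission.OPEN_PULL_REQUEST)
assert not is_allowed(CODE_WRITER, Permission.DEPLOY_PROD)
```

The point is just that the “code writing” role holds read/sandbox/PR permissions while merge and production deploy sit with other roles (or with humans), so a misbehaving instance’s blast radius is bounded by its role rather than by the whole organization’s access.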