Zac Hatfield-Dodds comments on Re: Anthropic Chinese Cyber-Attack. How Do We Protect Open-source Models?

Zac Hatfield-Dodds 3 Jan 2026 10:28 UTC
5 points
6
Bluntly, this cannot possibly work.

Open-weights models will remain useful for general-purpose tasks, including in the common case where earlier context on the situation was not produced by the same model. Breaking the evidence chain is therefore sufficient, and is also easy for the attacker.

Do not confuse desirability with possibility.
- Mayowa Osibodu 3 Jan 2026 22:01 UTC
  1 point
  −2
  This is a straw man argument. The standard MO of coding agents is that they use one consistent LLM in their agentic flow. The approach I outlined addresses that default case, and there’s obvious utility in that.
  
  You might as well say there’s no point in Anthropic tracking malicious usage of Claude Code in their telemetry data, because attackers are free to switch coding agents (e.g. between Codex and Gemini) within the course of a multi-step task.