Dave Banerjee comments on Phantom Transfer and the Basic Science of Data Poisoning

Dave Banerjee 18 Feb 2026 21:13 UTC
The key theory-of-change question here is: Do more sophisticated attacks require a larger mass of poison?
I find the framing of attack “sophistication” confusing because it conflates two distinct things: the sophistication of the poisoning method and the sophistication of the resulting backdoored behavior. The latter is bounded by the model’s capabilities, not by the backdoor methodology.
For example, if a model is incapable of scheming, no quantity or cleverness of poisoned data will make it scheme on behalf of an attacker. So “sophistication” is, at minimum, a function of both the poison and the model’s capabilities.