Dave Banerjee comments on ML research directions for preventing catastrophic data poisoning

Dave Banerjee 13 Jan 2026 17:42 UTC
(I tried making inline “typo” reactions but it wasn’t working for some reason)
Plan B is the same as plan A, but without tracking the data that goes into training. In particular, AI companies:
- Prevent anyone “swapping out” a widely deployed model for one with different weights or a different prompt.
- Run alignment audits on the trained model.
- Track all the data that goes into training a model in a tamper-proof way.
Typo: last bullet should be removed?
Plan C is the same as Plan B, but the AI company doesn’t reliably prevent widely deployed models from being swapped out. In particular, AI companies:
- Prevent anyone “swapping out” a widely deployed model for one with different weights or a different prompt.
- Run alignment audits on the trained model.
- Track all the data that goes into training a model in a tamper-proof way.
Typo: first and last bullets should be removed?
These same typos also appear in the text on the Forethought website, in the newsletter, and in the EAF post.