It’s hard for research output to mitigate risk if no one will implement it. You can try to generate more political will, or I suppose you can just add things to the pretraining corpus in the hope that this helps.
You can imagine regimes where external actors are allowed to implement things, but that would move you closer to a Plan D or D/E scenario. (These are points along a spectrum rather than an exhaustive breakdown.)
Try to make deals with the AIs (maybe this counts as ‘add things to corpus’)
Try to make deals with the lab (maybe this counts as ‘Move to Plan C/D’)
Try to disrupt the compute supply chain or lab employees
Harden the external environment, d/acc stuff (probably hopeless, but maybe worthwhile on slow takeoff)
YOLO human intelligence enhancement / uploads
Yeah, E seems hard. Especially because, from the outside, E is indistinguishable from D, and many of the E strategies would be negatively polarising, hence counterproductive on Plan D.