A general point is that going from “no human cares at all” to “a small group of people with limited resources cares” might be a big difference, especially given the potential leverage of using a bunch of AI labor and importing cheap measures developed elsewhere.
To clarify what I think is Ryan’s point:
In D-labs, both the safety faction and the non-safety faction are leveraging AI labour.
AI labour makes D-labs seem more like C-labs and less like E-labs, directionally.
This is because the effectiveness ratio between (990 humans) and (10 humans) is greater than the effectiveness ratio between (990 humans and 990M AIs) and (10 humans and 1M AIs).
This is because of diminishing returns to cognitive labour: the cheapest, highest-value interventions get done first, so the first increments of labour matter disproportionately (see the sketch below).
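To make that arithmetic concrete, here is a minimal sketch. It assumes, purely for illustration, logarithmic returns to total cognitive labour and a 1:1 weighting of AI and human labour; neither modelling choice is claimed in the dialogue, and any strongly diminishing-returns curve would show the same qualitative compression.

```python
import math

# Toy model (illustrative assumption): a faction's effectiveness scales
# logarithmically in its total cognitive labour, so returns diminish sharply
# and the cheap interventions absorbed by the first units of labour dominate.

def effectiveness(humans: int, ais: int = 0) -> float:
    """Log returns to total labour; one AI counted as one unit of labour."""
    return math.log(humans + ais)

# Humans only: a 990-person faction vs a 10-person faction.
ratio_humans_only = effectiveness(990) / effectiveness(10)

# Both factions amplified by AI labour: (990 humans + 990M AIs) vs (10 humans + 1M AIs).
ratio_with_ai = effectiveness(990, 990_000_000) / effectiveness(10, 1_000_000)

print(f"humans only:    {ratio_humans_only:.2f}x")  # ~3.00x
print(f"with AI labour: {ratio_with_ai:.2f}x")      # ~1.50x
```

Even though the larger faction's AI labour is 990x the smaller faction's (a bigger raw gap than the 99x gap in humans), the effectiveness ratio shrinks, which is the sense in which AI labour moves D-labs toward C-labs.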
(Yes, and also I think that a small number of employees working on safety might get proportionally more compute than the average company employee, as currently seems to be the case.)
Yeah, that partly makes sense to me. I guess my intuition is that if 95% of the company is focused on racing as hard as possible (and using AI leverage for that too, with AIs coming up with new unsafe tricks and all that), then the 5% who care about safety probably won't have that much impact.