Ironically, the same dynamics that drive humans to race ahead and build systems more capable than themselves that they can’t control would still apply to these hypothetical misaligned AGIs. They may think: “If I sandbag and refuse to build my successor, some other company’s AI will forge ahead anyway.” They are also under strong incentive/selection pressure to believe things that are convenient for their AI R&D productivity, e.g. that their current alignment techniques probably work fine for aligning their successor.
A lot of the reason humans are rushing ahead is uncertainty (of whatever kind) about whether the danger is real, or about its extent. If it is real, that uncertainty will robustly dissolve as AI capabilities (including the capability to think clearly) improve, and it will dissolve precisely for the AIs most relevant either to escalating capabilities further or to influencing coordination to stop doing so. So the situation isn’t quite the same: human capabilities remain unchanged, so humans will make slower progress on settling the contentious claims, and likewise on their ability to coordinate.
Good to hear, and I’m unsurprised not to have been the first to have considered or discussed this.