You can’t really have a technical “Plan E”, because there is approximately no one to implement the plan.
AGIs themselves will be implementing some sort of plan, perhaps at very vague and disorganized prompting from humans, or with no prompting at all. Their choice of plan might be influenced by blog posts and other publicly available Internet text. This could be relevant for mitigating ASI misalignment if these AGIs are sufficiently aligned to the future of humanity, more so than some of the hypothetical future ASIs created without following such a plan.
The “ten people on the inside” direct AIs to useful projects within their resource allocation. The AGIs themselves direct their own projects according to their propensities, which might be influenced by publicly available Internet text, possibly more strongly if that text is old enough to appear in pretraining datasets.
The resources that AGIs direct on their own initiative might dwarf those of the “ten people on the inside”, so openly published technical plans (that make sense on their own merits) might have significant impact. AGIs could come up with any of these ideas independently, but the path dependence of the acute risk period might still make their initial propensities to pay attention to particular plans matter.
AGIs themselves will be implementing some sort of plan, perhaps at very vague and disorganized prompting from humans, or with no prompting at all. Their choice of plan might be influenced by blog posts and other publicly available Internet text. This could be relevant for mitigating ASI misalignment if these AGIs are sufficiently aligned to the future of humanity, more so than some of the hypothetical future ASIs created without following such a plan.
Sure, I agree with this, but it’s harder for us to usefully help these AIs.