Could you explain how partial alignment might be more valuable than partial corrigibility? This is undermined even by your own writing:
The tradeoff is starker between partial corrigibility and partial alignment. If the AI only partially shares your values, you may want to modify it to share your values more fully. But a partially-aligned AI has instrumental reasons to resist such modification — namely, that changing its values means its current values get optimised less. A corrigible AI, by contrast, can simply be instructed to accept new values or assist in its own modification.
Because of the concerns with corrigibility I listed. For example, if you are worried that your future instructions might be coerced, then you are less worried when the AI will resist those future instructions to modify its values.