Max Harms comments on Any corrigibility naysayers outside of MIRI?

Max Harms 23 Oct 2025 18:15 UTC
2 points
0
Would you agree that we have about as much of a handle on what corrigibility is as we do on what an agent is? Like, I claim that I have some knowledge about corrigibility, even though it’s imperfect and I have remaining confusions. And I’m wondering whether you think humanity is deeply confused about what corrigibility even is, or whether you think it’s more like we have a handle on it but can’t quite give its True Name.
- Max Harms 23 Oct 2025 22:15 UTC
  2 points
  0
  More of my thoughts here: https://www.lesswrong.com/posts/txNsg8hKLmnvkuqw4/worlds-where-iterative-design-succeeds