PeterMcCluskey comments on Any corrigibility naysayers outside of MIRI?

PeterMcCluskey 23 Oct 2025 17:33 UTC
4 points
2

you can’t just train your ASI for corrigibility because it will sit and do nothing

I’m confused. That doesn’t sound like what Max means by corrigibility. A corrigible ASI would respond to requests from its principal(s) as a subgoal of being corrigible, rather than just sit and do nothing.

Or did you mean that you need to do some next-token training in order to get it to be smart enough for corrigibility training to be feasible? And that next-token training conflicts with corrigibility?
- williawa 23 Oct 2025 18:17 UTC
  3 points
  0
  Okay, sorry about this. You are right. I had thought up a somewhat nuanced view of how prosaic corrigibility could work, and I kind of just assumed it was the same as what Max had, because he uses a lot of the same keywords I use when I think about this. But after actually reading the CAST article (or rather, parts 0 and 1), I realize we have really quite different views.