I expect that for most people, “what I mean” will converge with “what I want” given superhuman help. I expect they will give increasingly broad instructions to the CAST AI, which will eventually approach “do what I want”.
I guess I should replace “without need for the principal to issue instructions” with: without a need for a continuing set of instructions.
This is true to some extent, but to exactly that extent, I believe it ceases to be safe and value-aligned by default, acquiring all the problems of trying to align it explicitly. I believe this is one of the key ways the approach does not scale.