I expect that for most people, “what I mean” will converge with “what I want” given superhuman help. I expect they will give increasingly broad instructions to the CAST AI, which will eventually approach “do what I want”.
I guess I should replace “without need for the principal to issue instructions” with: without a need for a continuing set of instructions.
This is true to some extent, but to exactly that extent, I believe it ceases to be safe and value-aligned by default, acquiring all the problems of trying to align it explicitly. I believe this is one of the key ways the approach does not scale.