First, I think “corrigibility to a human” is underdefined. A human is not, themselves, a coherent agent with a specific value/goal-slot to which an AI can be corrigible.
I mean, yes, but you wrote a lot of stuff after this that seems weird to me / misses the point. A “corrigible AGI” should do at least as well as (really, much better than) you would if you had a huge team of researchers under you and your full-time, 100,000x-speed job were to do a really good job of “being corrigible, whatever that means” to the human in the driver’s seat. (In this hypothetical, you’re on board with the arrangement for some reason.)