I would very much like to see proposals for AI alignment that escape completely from the assumption that we are going to hand off agency to AI.
Microscope AI (see here and here) is an AI alignment proposal that attempts to avoid agency hand-off entirely.
I also agree with Rohin’s comment that Paul-style corrigibility is at least trying to avoid a full agency hand-off, though it still has significantly more of an agency hand-off than something like microscope AI.
Thanks for this!
Np! Also, just going through the rest of the proposals in my 11 proposals paper, I’m realizing that a lot of the other proposals also try to avoid a full agency hand-off. STEM AI restricts the AI’s agency to just STEM problems, narrow reward modeling restricts individual AIs to only apply their agency to narrow domains, and the amplification and debate proposals are trying to build corrigible question-answering systems rather than do a full agency hand-off.