Agreed; the alignment plan sketched here skips over why alignment for a merely-human agent should be a lot easier than for a superhuman one, or why instruction-following should be easier than value alignment. I think both are probably true, but to a limited and uncertain degree. See my other comment here for more.