If we can solve enough of the alignment problem, the rest gets solved for us.
If we can get a half-assed approximate solution to the alignment problem, sufficient to semi-align a STEM-capable AGI value learner of about smart-human level well enough to not kill everyone, then it will be strongly motivated to solve the rest of the alignment problem for us, just as the ‘sharp left turn’ is happening, especially if it’s also going Foom. So with value learning, there is a region of convergence around alignment.
Or, to reuse one of Eliezer’s metaphors: if we can point the rocket on approximately the right trajectory, it will automatically lock on and course-correct from there.
If we solve the alignment problem, then we solve the alignment problem.
I agree with this true statement.