Rohin Shah’s position seems perfectly clear to me.
More research is good. AI will speed up research along the whole range of (1)-(4).
>> If our most scalable current approaches don’t scale all the way through takeoff, then we (humans) will be able to be keeping up and being critically intellectually involved with developing new alignment techniques when our existing ones stop scaling, at least at the level of granularity of (3) so that we can continue to leverage AIs in ways (1-3) above. We need to keep being able to do this indefinitely, or until we come up with an approach to alignment that does scale as far as capabilities will.
This seems a weird way to phrase things, or seems to be predicated on some assumptions I find surprising. It seems to suggest not only that alignment is a continual, never-ending process [likely], but that finding alignment techniques is itself a continual, never-ending process..!
If we analogize 'alignment' to 'steering rockets so they hit their target', this would suggest not only that rockets need to be continually steered and course-corrected to hit their target, but that the entire field of rocket steering needs continual updating and course-correcting. There is a perspective from which this is true, since humanity keeps inventing faster and better rockets; there is another perspective from which it is somewhat misleading: future rockets will still be governed by Newton's laws, and steering comes down to some variant on Kalman filters and the like.
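To make the analogy concrete: the point is that the *mathematical core* of steering stays fixed even as rockets change. A minimal sketch of that core, a one-dimensional Kalman measurement update, is below. This is purely illustrative (the state, noise values, and target are made up for the example); real guidance systems track multidimensional state with dynamics models, but they reuse this same update rule.

```python
# Minimal 1-D Kalman filter measurement update: course-correct a position
# estimate from noisy readings. Illustrative sketch only; real guidance
# systems use multidimensional state (position, velocity, attitude).

def kalman_update(x_est, p_est, measurement, meas_var):
    """One measurement-update step of a 1-D Kalman filter.

    x_est:       prior state estimate
    p_est:       variance (uncertainty) of the prior estimate
    measurement: noisy observation of the state
    meas_var:    variance of the measurement noise
    """
    k = p_est / (p_est + meas_var)             # Kalman gain: how much to trust the measurement
    x_new = x_est + k * (measurement - x_est)  # blend prior estimate with observation
    p_new = (1 - k) * p_est                    # uncertainty shrinks after each update
    return x_new, p_new

# Steer toward a (hypothetical) true position of 10.0 from noisy readings.
x, p = 0.0, 1000.0   # start with a vague prior
for z in [9.8, 10.3, 9.9, 10.1, 10.0]:
    x, p = kalman_update(x, p, z, meas_var=0.25)
# x converges near 10.0 and p shrinks: the same fixed rule handles every reading.
```

The update rule itself never changes from rocket to rocket; only the state vector and noise models do. That is the sense in which "the field of rocket steering" does not need perpetual reinvention.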