> By the time you have AIs capable of doing substantial work on AI R&D, they will also be able to contribute effectively to alignment research (including, presumably, secret self-alignment).
Humans do substantial work on AI R&D, but we haven't been very effective at alignment research. (At least, according to the view that says alignment is very hard, which typically also holds that basically none of our current "alignment" techniques will scale.)
> Even if takeoff is harder than alignment, that problem becomes apparent at the point where the amount of AI labor available to work on those problems begins to explode, so it might still happen quickly from a calendar perspective.
Yup, this is very possible.