I think if human-level AIs were going to be capable of making great strides in scalable alignment work, we would have seen more progress from human-level humans. The fact that a large chunk of the field has converged on strategies like “Get another person to do the work” (e.g. fieldbuilding, organizing mentorships), “Get an AI to do the work” (e.g. AI control, superalignment), or “Stop or slow the building of AGI and/or make its builders more responsible” (e.g. policy work) is a very bad sign.
The total progress being made on the real meat of alignment is very low compared to the progress being made in capabilities. I don’t see why we should expect this, or the distribution of resources, to suddenly flip in favour of alignment in the middle of the singularity, once human-level AIs have been developed, everything is a thousand times more stressful, and the race dynamics are a thousand times worse.