Is your view closer to:
there are two hard steps (instruction following, value alignment), and of the two instruction following is much more pressing
instruction following is the only hard step; if you get that, value alignment is almost certain to follow
The first. Value alignment is much harder. But it will be vastly easier with smarter-than-human help. So there are two difficult steps, and it’s clear which one should be tackled first.
The difficulty with value alignment lies both in figuring out what we actually want and in making those values stable in a mind that changes as it learns new things.