I’d recommend checking out this post critiquing this view, if you haven’t read it already. Summary of the counterpoints:
(Intent) alignment doesn’t seem sufficient to ensure an AI makes safe decisions about subtle bargaining problems under high competitive pressure with other AIs. I don’t expect the kinds of capabilities progress that are incentivized by default to suffice for us to be able to defer these decisions to the AI, especially given path-dependence on feedback from humans who’d be pretty naïve about this stuff. (Cf. this post—you need the human feedback at the bottom to be sufficiently high quality to avoid garbage-in, garbage-out problems even if you’ve solved the hard parts of alignment.)
To the extent that solving all of intent alignment is too intractable, focusing on subsets of alignment that are especially likely to avoid s-risks—e.g. preventing AIs from intrinsically valuing frustrating others’ preferences—might be promising. I don’t think mainstream alignment research prioritizes these.