I’d recommend checking out this post critiquing this view, if you haven’t read it already. Summary of the counterpoints:
(Intent) alignment doesn’t seem sufficient to ensure an AI makes safe decisions about subtle bargaining problems under high competitive pressure with other AIs. I don’t expect the kinds of capabilities progress that are incentivized by default to suffice for us to be able to defer these decisions to the AI, especially given path-dependence on feedback from humans who’d be pretty naïve about this stuff. (Cf. this post—you need the human feedback at the bottom to be sufficiently high quality to avoid garbage-in, garbage-out problems even if you’ve solved the hard parts of alignment.)
To the extent that solving all of intent alignment is too intractable, focusing on subsets of alignment that are especially likely to avoid s-risks—e.g. preventing AIs from intrinsically valuing frustrating others’ preferences—might be promising. I don’t think mainstream alignment research prioritizes these.