Unclear without a fuller reference, but taking a guess. I don’t think we either have or need anything to offer other than being moral patients. And moral patienthood doesn’t need to be symmetric: I expect it’s correct to leave sapient crocodiles (who by stipulation won’t consider me a moral patient) to their own devices, as long as their global influence and ability to commit atrocities are appropriately bounded.
Because of LLM human imitations, the main issue with alignment appears to be transitive alignment: the ability of LLM AGIs to set up existential risk governance so that they don’t build unaligned successor AGIs. This is not a matter of morality or of deeper principles that generate morality’s alignment-relevant aspects. It’s a matter of competence, and LLM AGIs won’t by default be much more competent at this sort of coordination and caution than humans, even as they are at least as competent at building unaligned AGIs. Even if there is a selection principle whereby most unaligned AGIs eventually grow to recognize humans as moral patients, that doesn’t save us from the initial ignorant nanotech-powered flailing.
So I don’t think non-extinction alignment is reliably feasible, even if LLM AGIs are the first to take off and are fine alignment-wise. But I think it’s not a problem whose outcome humans will have an opportunity to influence. What we need to focus on is alignment of LLM characters, channeling their potential humanity instead of morally arbitrary fictional AI tropes, and maybe an unusual inclination to be wary of AI risk.