paulfchristiano comments on Counterexamples to some ELK proposals

paulfchristiano 2 Jan 2022 16:29 UTC
LW: 7 AF: 5
I would describe the overall question as “Is there a situation where an AI trained using this approach deliberately murders us?” and for ELK more specifically as “Is there a situation where an AI trained using this approach gives an unambiguously wrong answer to a straightforward question despite knowing better?” I generally don’t think that much about the complexity of human values.