Kudos for talking about learning empathy in a way that seems meaningfully different and less immediately broken than adjacent proposals.
I think what you should expect from this approach, should it in fact succeed, is not nothing, but still something more alien than the way we empathize with lower animals, let alone higher animals. Consider the empathy we have towards cats, and the way it is complicated by their desire to be predators, and specifically to enjoy causing fear and suffering. Our empathy for cats doesn't lead us to abandon our empathy for their prey, and so we are inclined to make compromises with that empathy.
Given better technology, we could make non-sentient artificial mice that are indistinguishable to the cats (though the cats' extrapolated volition would, to some degree, feel deceived and betrayed by this), or we could simply ensure that cats no longer seek to cause fear and suffering.
I hope that humans' extrapolated volitions aren't cruel (though maybe they are when judged by Superhappy standards). Regardless, an AI that's guaranteed to have empathy for us is not guaranteed, and in general is quite unlikely, to have no other conflicts with our volitions; and the compromises it makes by analogy will probably be larger and stranger than those in the cat example.
Better than paperclips, but perhaps missing many dimensions we care about.