Any policy that contains a state-action pair that brings a human closer to harm is discarded.
If at least one policy contains a state-action pair that brings a human further away from harm, then all policies that are ambivalent towards humans should be discarded. (That is, if the agent is aware of a nearby human in immediate danger, it should drop whatever task it is doing in order to prioritize the human's life.)
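Made concrete, the two rules might look like the minimal sketch below. It assumes policies are finite sets of (state, action) pairs and uses a hypothetical harm_delta predicate (not part of the original proposal) that scores whether a pair moves a human toward or away from harm:

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

StateAction = Tuple[Hashable, Hashable]
Policy = Set[StateAction]

def filter_policies(
    policies: Iterable[Policy],
    harm_delta: Callable[[StateAction], int],
) -> List[Policy]:
    """Apply both filtering rules.

    harm_delta(sa) > 0  means the pair moves a human closer to harm,
    harm_delta(sa) < 0  means it moves a human further from harm,
    harm_delta(sa) == 0 means it is ambivalent toward humans.
    """
    # Rule 1: discard any policy containing a pair that increases harm.
    safe = [p for p in policies if all(harm_delta(sa) <= 0 for sa in p)]

    # Rule 2: if at least one surviving policy actively reduces harm,
    # discard the purely ambivalent ones.
    rescuing = [p for p in safe if any(harm_delta(sa) < 0 for sa in p)]
    return rescuing if rescuing else safe
```

Of course, everything interesting is hidden inside harm_delta: whether this filter behaves sanely depends entirely on how “harm” is operationalized, which is exactly the problem below.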
This policy optimizes for safety. You’ll end up living in a rubber-padded prison of some sort, depending on how you define “harm”. E.g. maybe you’ll be cryopreserved for all eternity. There are many things people care about besides safety, and writing down the full list, with priorities, in a machine-understandable way would solve the whole outer alignment problem.
When it comes to your criticism of utilitarianism, I don’t feel that killing people is always wrong, at any time, for any reason, under any circumstance. E.g. if someone is about to start a mass shooting at a school, or a foreign army is invading your country and there is no non-lethal way to stop them, I’d say killing them is acceptable. If the options are that 49% of the population dies or 51% of the population dies, I think the AI should choose the former.
However, I agree that utilitarianism doesn’t capture the whole of human morality, because our morality isn’t completely consequentialist. If you give me a gift of $10 and forget about it, that’s good, but if I steal $10 from you without anyone noticing, that’s bad, even though the end result is the same. Jonathan Haidt in “The Righteous Mind” identifies six foundations of morality: Care, Fairness, Liberty, Loyalty, Purity, and Obedience to Authority. Utilitarian calculations are only concerned with Care: how many people are helped and by how much. They ignore the other moral considerations. E.g. having sex with dead people is wrong because it’s disgusting, even if it harms no one.