[Question] Can indifference methods redeem person-affecting views?
My uninformed paraphrase/summary of “the person-affecting view” is: “classical utilitarianism + indifference to creating/destroying people”.
These views seem problematic (e.g. see Hilary Greaves's interview on the 80,000 Hours podcast) and difficult to support.
Indifference methods (e.g. see Stuart Armstrong's paper) seem like they might offer a way to formalize person-affecting views rigorously.
If we have a policy, we can always reverse-engineer a corresponding reward function for which that policy is optimal (see our Reward Modelling agenda, bottom of page 6).
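To illustrate the claim, here is a minimal sketch (my own construction, not taken from the Reward Modelling agenda): any deterministic policy is optimal for the reward function that pays 1 for taking the policy's action and 0 otherwise, since following the policy then earns the maximum reward at every step.

```python
# Sketch: reverse-engineering a reward function from a policy.
# Assumption (hypothetical toy setup): a finite set of states and actions,
# and an arbitrary fixed deterministic policy pi.

states = ["s0", "s1", "s2"]
actions = ["left", "right"]

# An arbitrary fixed policy.
pi = {"s0": "right", "s1": "left", "s2": "right"}

# Reverse-engineered reward: 1 for the policy's chosen action, 0 otherwise.
def reward(state, action):
    return 1.0 if pi[state] == action else 0.0

# The greedy policy with respect to this reward recovers pi exactly.
recovered = {s: max(actions, key=lambda a: reward(s, a)) for s in states}
assert recovered == pi
```

The construction is trivial, which is part of the point: a reward function always exists, but it need not be informative about the agent's "values" beyond restating the policy.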
So while there might still be highly counter-intuitive bullets to bite, this might provide a way of cashing out person-affecting views that is mathematically coherent and consistent.
What do you think? Does it work?
And is that even an open problem, or an interesting result to people in ethics?