One cheap and easy method (with surprisingly good properties) is to take the maximal possible expected utility (the expected utility that person would get if the AI did exactly what they wanted) as 1, and the minimal possible expected utility (if the AI were to work completely against them) as 0.
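In symbols (one way of writing it; the notation is mine, with $\pi$ ranging over the policies the AI could follow and $U_p$ being person $p$'s utility function):

$$\hat U_p(\pi) \;=\; \frac{\mathbb{E}[U_p \mid \pi] \;-\; \min_{\pi'}\mathbb{E}[U_p \mid \pi']}{\max_{\pi'}\mathbb{E}[U_p \mid \pi'] \;-\; \min_{\pi'}\mathbb{E}[U_p \mid \pi']},$$

so $\hat U_p$ is 1 when the AI does exactly what $p$ wants and 0 when it works completely against them.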
If Alice likes cookies, and Bob likes cookies but hates whippings, this method gives Alice more cookies than Bob: the possibility of whipping stretches Bob’s utility range, so each cookie moves his normalised utility by less than it moves Alice’s. Moreover, the number of bonus cookies Alice gets depends on the properties of whips that nobody ever uses.
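A toy numerical sketch of why this happens (the utility numbers are made up for illustration, and `normalise` is just the min–max rescaling above):

```python
def normalise(u, u_min, u_max):
    """Rescale a raw utility u so that u_max maps to 1 and u_min maps to 0."""
    return (u - u_min) / (u_max - u_min)

# Made-up raw utilities: Alice only cares about cookies; Bob likes cookies
# just as much, but would hate being whipped.
cookie_value = 1.0       # utility either person gets per cookie
whipping_value = -100.0  # utility Bob gets from a whipping (never actually used)
max_cookies = 10         # the most cookies the AI could hand out

# Best and worst possible expected utilities for each person.
alice_max, alice_min = cookie_value * max_cookies, 0.0
bob_max, bob_min = cookie_value * max_cookies, whipping_value  # the whip sets Bob's floor

# How much one cookie is worth on each person's normalised [0, 1] scale.
alice_per_cookie = normalise(cookie_value, alice_min, alice_max) - normalise(0.0, alice_min, alice_max)
bob_per_cookie = normalise(cookie_value, bob_min, bob_max) - normalise(0.0, bob_min, bob_max)

print(alice_per_cookie)  # 0.1    -> each cookie moves Alice 10% of the way up her scale
print(bob_per_cookie)    # ~0.009 -> each cookie barely moves Bob

# An AI maximising the sum of normalised utilities therefore gets far more
# "credit" for giving cookies to Alice, and how much more depends on how
# bad the never-used whip is.
```

Make the hypothetical whip nastier (say −1000 instead of −100) and Bob’s cookies count for even less, which is exactly the “bonus cookies depend on the whip” problem.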
(In general, it’s legitimate for the properties of counterfactuals to affect which decisions are correct in reality, so this consideration alone isn’t sufficient to demonstrate that there’s a problem.)
Still, it intuitively feels like a problem in this specific case.
You can restrict to the Pareto boundary before normalising: it’s not as mathematically elegant, but it is indifferent to options “that nobody ever wants/uses”.
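A minimal sketch of that variant (the outcomes and utility numbers are again made up): discard Pareto-dominated outcomes first, then take each person’s min and max over what remains.

```python
# Each outcome is (Alice's raw utility, Bob's raw utility); numbers are illustrative.
outcomes = [
    (10.0, 0.0),    # all cookies to Alice
    (0.0, 10.0),    # all cookies to Bob
    (5.0, 5.0),     # split the cookies
    (5.0, -95.0),   # split the cookies, then whip Bob (Pareto-dominated)
]

def pareto_frontier(points):
    """Keep only points that no other point matches or beats for everyone."""
    return [p for p in points
            if not any(q != p and all(q[i] >= p[i] for i in range(len(p)))
                       for q in points)]

frontier = pareto_frontier(outcomes)
print(frontier)  # [(10.0, 0.0), (0.0, 10.0), (5.0, 5.0)] -- the whipping outcome drops out

# Normalising over the frontier only, Bob's minimum is now 0 rather than -95,
# so the never-used whip no longer inflates Alice's share of the cookies.
bob_min = min(b for _, b in frontier)
bob_max = max(b for _, b in frontier)
print(bob_min, bob_max)  # 0.0 10.0
```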