One cheap and easy method (with surprisingly good properties) is to take the maximal possible expected utility (the expected utility that person would get if the AI did exactly what they wanted) as 1, and the minimal possible expected utility (if the AI were to work completely against them) as 0.
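A minimal sketch of this min-max normalisation, assuming we can estimate each agent's expected utility under a set of candidate AI policies (all names and the toy numbers here are hypothetical):

```python
def normalise(expected_utilities):
    """Rescale a dict {policy: expected utility} into [0, 1].

    The policy best for the agent (AI does exactly what they want)
    maps to 1; the worst (AI works completely against them) maps to 0.
    """
    u_max = max(expected_utilities.values())
    u_min = min(expected_utilities.values())
    if u_max == u_min:
        # Agent is indifferent between all policies; pick a constant.
        return {p: 0.0 for p in expected_utilities}
    return {p: (u - u_min) / (u_max - u_min)
            for p, u in expected_utilities.items()}

# Toy example: three candidate AI policies for one agent.
utilities = {"help": 10.0, "ignore": 2.0, "oppose": -5.0}
print(normalise(utilities))
```

Note that the rescaling only works if both the max and min are finite, which is exactly where the unbounded-utility worry below comes in.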
“if the AI did exactly what they wanted” as opposed to “if the universe went exactly as they wanted” — to avoid issues with unbounded utility functions? This seems like it might not be enough if the universe itself were unbounded in the relevant sense.
For example, suppose my utility function is U(Universe) = #paperclips, which is unbounded in a big universe. Then you’re going to normalise me as assigning U(AI becomes clippy) = 1, with any individual paperclip contributing ≈ 0.
Yep.
So most likely a certain proportion of the universe will become paperclips.