Then there would still be a hundred billion galaxies devoted to the maximization of pleasure. But we would have one galaxy within which to create wonderful civilizations that could last for billions of years and in which humans and nonhuman animals could survive and thrive, and have the opportunity to develop into beatific posthuman spirits.
I think this approach, which has been advocated before, is a massive help, especially when expanded to deal with different forms of CEV (what counts as a human? should utility functions be bounded? etc.). When there are so many galaxies to work with, it makes no sense to bet everything on a single point of failure.
The problem comes if there are things that some moral systems hold to carry negative moral weight. If one morality valued noble struggle, it could create more suffering than happiness, which would be inherently bad on a negative-utilitarian view. It seems there are certain moral systems that can’t coexist, even in different galaxies.
I wonder how many people believe that all moral good stems from their religion? I imagine the ‘extrapolated’ bit might deal with this, with the AI deciding ‘no, you actually want happiness and an ingroup’, but it’s not certain. If more than two different groups had utility functions assigning 1 to a person who believes in their religion and −1 to a person who doesn’t, because they are going to burn in hell (1), then you end up in a situation where, even with all the resources in the universe, you can’t get a positive result on the combined utility function without killing much of humanity.
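A toy calculation (my own sketch, not anything from the original post) makes the arithmetic explicit: if each group scores every person +1 for sharing its religion and −1 otherwise, and the religions are mutually exclusive, then with three or more such groups every person nets a negative score no matter how the population is divided.

```python
# Toy model (my illustration, not from the original discussion): each group's
# utility function scores every person +1 if that person follows the group's
# religion and -1 otherwise; religions are mutually exclusive.

def group_utility(population, religion):
    """One group's score for the whole population."""
    return sum(1 if person == religion else -1 for person in population)

def combined_utility(population, religions):
    """Sum of every group's utility function over the population."""
    return sum(group_utility(population, religion) for religion in religions)

religions = ["A", "B", "C"]

# Even the most favourable case, where everyone belongs to some religion,
# is negative with three groups: each person earns +1 from their own group
# and -1 from each of the other two, for a net of -1 per person.
population = ["A"] * 40 + ["B"] * 30 + ["C"] * 30
print(combined_utility(population, religions))  # -> -100

# With only two groups the best achievable total is zero, never positive.
print(combined_utility(["A"] * 50 + ["B"] * 50, ["A", "B"]))  # -> 0
```

With exactly two groups the best you can do is break even; adding a third pushes the total strictly negative, and no amount of resources changes the sign.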
I don’t think this specific example would actually happen (people don’t really believe in their religions that strongly), but I’d still be inclined to at least consider requiring that people’s utility functions can only apply to themselves, or can only apply negative weights to themselves, or can only apply to other people when the sign matches the sign that person applies to themselves. To give an example of the last proposal: you could count my suffering as bad iff I count my own suffering as bad.
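To make that last rule concrete, here is a minimal sketch (my own construction, not something proposed in the thread; the function name admissible_weight and the signed self-weights are assumptions): another person’s weight on my welfare only counts if its sign agrees with the sign I place on my own welfare.

```python
# Rough sketch (my construction) of the sign-matching rule: another agent's
# weight on my welfare is admitted only if its sign matches the weight I
# place on my own welfare; otherwise it is zeroed out rather than counted.

def admissible_weight(self_weight, others_weight):
    """Weight another agent may place on my welfare under the sign-matching rule."""
    same_sign = (self_weight >= 0) == (others_weight >= 0)
    return others_weight if same_sign else 0.0

# I count my own suffering as bad (-1), so you may also count it as bad.
print(admissible_weight(-1.0, -0.5))  # -> -0.5 (allowed)

# You assign -1 to my existence for not sharing your religion, but I value
# it at +1, so your negative weight is discarded under this rule.
print(admissible_weight(+1.0, -1.0))  # -> 0.0 (filtered out)
```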
(1) Yes, I know that not all religions believe this, but let’s just run with it for the purposes of the thought experiment.