Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making the claim that all well-designed AIs will converge to a universal ‘morality’ that we’d endorse upon reflection, even if they weren’t explicitly coded to approximate human values.
One thing they rather obviously might converge on is the “goal system zero” / “Universal Instrumental Values” thing. The other main candidates seem to be “fitness” and “pleasure”. These might well preserve humans for a while—in historical exhibits.