Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge on the universal ‘morality’ that we’d endorse upon reflection, even if they weren’t explicitly coded to approximate human values. I’m not saying your average AI programmer can make an AI that does this, though I am suggesting it is plausible.
As a “peace offering”, I’ll describe a somewhat similar argument, although it stands as an open confusion, not so much a hypothesis.
Consider a human “prototype agent”: additional data that you plug into a proto-FAI (already a human-specific thingie, of course) to refer to the precise human decision problem. Where does this human end, where are its boundaries? Why would its body be the cutoff point? Why not include all of its causal past, all the way back to the Big Bang? At that point, talking about the human in particular seems to become useless, since after all it’s a tiny portion of all that data. But clearly we need to somehow point to the human decision problem, to distinguish it from the frog decision problem and the like, even though such boundless prototype agents share all of their data. Do you point to a human finger and specify this actuator as the locus through which the universe is to be interpreted, as opposed to pointing to a frog’s leg? Possibly, but it’ll take a better understanding of how to interpret decision problems from arbitrary agents’ definitions to make progress on questions like this.