You can’t write an algorithm based on “if you don’t get it, you’re part of the problem”. You can get away with telling that to your children, sort of, but only because children are very good at synthesizing behavioral rules from contextual cues. Rorty’s advice might be useful as a practical guide to making moral humans, but it only masks the underlying issue: if the only way for morality to win in the real world is to avoid bringing amoral agents into existence, then there must already exist a well-bounded set of moral utility functions for agents to follow. It doesn’t tell us much about what such a set might contain, giving only a loose suggestion that good morality functions tend to be relatively subject-independent.
Now, to encode a member of such a set into an AI (which may or may not end up being Friendly depending on how well those functions generalize outside the human problem domain), you need a formalization of it. To teach one implicitly, you need a formalization of something analogous (but not necessarily identical) to the social intuitions that human children use to derive their morals, which is most likely a harder problem. And if you have such a formalization, explaining an instance of moral behavior to a rational sociopath is as easy as running it on particular inputs.
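A minimal sketch of what "running it on particular inputs" could look like, assuming the utility function takes the form of a simple weighted score over outcome features. Every name, feature, and weight here is invented purely for illustration; nothing about real moral utility functions is being claimed.

```python
# Toy illustration of the point above: once a moral utility function is
# formalized, "explaining" a judgment to a rational agent reduces to
# evaluating the function on concrete inputs and showing the result.
# All feature names and weights are hypothetical.

def toy_moral_utility(outcome):
    """Score an outcome dict; higher is judged better.

    Hypothetical terms: harm caused, benefit produced, and whether
    those affected consented.
    """
    return (
        -10.0 * outcome["harm"]
        + 5.0 * outcome["benefit"]
        + 2.0 * (1.0 if outcome["consented"] else -1.0)
    )

def explain(action_name, outcome):
    """The 'explanation' a rational sociopath can follow: the score itself."""
    score = toy_moral_utility(outcome)
    verdict = "permissible" if score >= 0 else "impermissible"
    return f"{action_name}: utility {score:+.1f} -> {verdict}"

print(explain("lie to spare feelings",
              {"harm": 0.2, "benefit": 0.5, "consented": False}))
print(explain("help a stranger",
              {"harm": 0.0, "benefit": 1.0, "consented": True}))
```

The point is only structural: the explanation carries force for a rational agent because it is a computation anyone can check, not an appeal to intuitions the agent may lack.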
Presented with an irrational sociopath, you're out of luck, but I can't think of any ethical system that doesn't have that problem.