If Moral Realism is true, then the Orthogonality Thesis is false.

Claim: if moral realism is true, then the Orthogonality Thesis is false, and superintelligent agents are very likely to be moral.

I’m arguing against Armstrong’s version of Orthogonality[1]:

The fact of being of high intelligence provides extremely little constraint on what final goals an agent could have (as long as these goals are of feasible complexity, and do not refer intrinsically to the agent’s intelligence).

Argument

1. Assume moral realism; there are true facts about morality.

2. Intelligence is causally[2] correlated with having true beliefs.[3]

3. Intelligence is causally correlated with having true moral beliefs.[4]

4. Moral beliefs constrain final goals; believing “X is morally wrong” is a very good reason and motivator for not doing X.[5]

5. Superintelligent agents will likely have final goals that cohere with their (likely true) moral beliefs.

6. Superintelligent agents are likely to be moral.
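
To make the inferential structure explicit, here is a rough schematic of steps 1–6. The predicate shorthand (Int, TrueBeliefs, TrueMoralBeliefs, GoalsCohere, Moral) and the informal “likely” qualifier are just illustrative glosses of the premises above, flattening the degree claims about intelligence into a single predicate:

$$
\begin{array}{ll}
\text{P1.} & \text{Moral realism: there are true moral facts.}\\
\text{P2.} & \mathrm{Int}(a) \Rightarrow \text{likely } \mathrm{TrueBeliefs}(a)\\
\text{C1.} & \mathrm{Int}(a) \Rightarrow \text{likely } \mathrm{TrueMoralBeliefs}(a) \quad \text{(from P1, P2)}\\
\text{P3.} & \mathrm{TrueMoralBeliefs}(a) \Rightarrow \text{likely } \mathrm{GoalsCohere}(a)\\
\text{C2.} & \mathrm{Int}(a) \Rightarrow \text{likely } \mathrm{GoalsCohere}(a) \quad \text{(from C1, P3)}\\
\text{C3.} & \mathrm{Int}(a) \Rightarrow \text{likely } \mathrm{Moral}(a) \quad \text{(from C2)}
\end{array}
$$

Here P1, P2, C1, P3, C2, and C3 correspond to steps 1–6 respectively; C1, C2, and C3 are inferred from the earlier lines rather than assumed.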

  1. ^

    Though this argument basically applies to most other versions, including Yudkowsky’s strong form: “There can exist arbitrarily intelligent agents pursuing any kind of goal [and] there’s no extra difficulty or complication in the existence of an intelligent agent that pursues a goal.”
    This argument, as it stands, does not work against Yudkowsky’s weak form: “Since the goal of making paperclips is tractable, somewhere in the design space is an agent that optimizes that goal.”

  2. ^

    Correction: there was a typo in the original post here. Instead of ‘causally’, it read ‘casually’.

  3. ^

    A different way of saying this: Intelligent agents tend towards a comprehensive understanding of reality; towards having true beliefs. As intelligence increases, agents will (in general, on average) be less wrong.

  4. ^

    A different way of saying this: Intelligent agents tend toward having true moral beliefs.
    To spell this out a bit more:

    A. Intelligent agents tend toward having true beliefs.
    B. Moral facts (under moral realism) are (an important!) part of reality.
    C. Therefore, intelligent agents tend toward having true moral beliefs.

  5. ^

    One could reject this proposition by taking a strong moral externalist stance. If moral claims are not intrinsically motivating and there is no general connection between moral beliefs and motivation, then this proposition does not follow. See here for discussion of moral internalism and orthogonality and here for discussion of moral motivation.

    As for the positive case for this proposition, there are at least two arguments:
    A. For any ideally rational agent, judging “X is morally wrong” entails a decisive, undefeated reason not to adopt X as a terminal end. [This supports a proposition even stronger than the one here, but gets us into the weeds of ‘ideally rational agents’.]

    B. For sufficiently rational agents, believing “X is morally wrong” generates a pro-tanto motivation not to do X. Other motivations could in principle outweigh this motivation. [This supports the proposition here. Notably, it does not guarantee that perfectly rational, intelligent agents will be moral.]