Because it means you can’t get AI to do good things “for free”; it has to be something you intentionally designed it to do.
Denying the orthogonality thesis looks like claims that an AI built with one set of values will tend to change those values in a particular direction as it becomes cleverer. Because of wishful thinking, people usually try to think of reasons why an AI built in an unsafe way (with some broad distribution over possible values) will tend to end up being nice to humans (a narrow target of values) anyway.
(Although there’s at least one case where someone has argued “the orthogonality thesis is false, therefore even AIs built with good values will end up not valuing humans.”)
“Denying the orthogonality thesis looks like claims that an AI built with one set of values will tend to change those values in a particular direction as it becomes cleverer.”
You can also argue that not all value-capacity pairs are stable or compatible with self-improvement.
Yeah, I was a bit fast and loose. There are plenty of other ways to deny the orthogonality thesis; I just focused on the one I think is most common in the wild.