The problem of whether the goals and values of an artificially intelligent agent will align with human goals and values can be reduced to this problem: Will the goals and values of different human agents ever align with each other?
We can only program AI “in our own image”, so both the features and bugs of humanity will reappear in AI.
Why do you think these statements are true?
Yes, great question. Looking at programming in general, there seem to be many obvious counterexamples: computers have certain capabilities (‘features’) that humans don’t (e.g. doing millions of arithmetic operations extremely fast with zero clumsy errors), and likewise they have certain problems (‘bugs’) that we don’t (e.g. adversarial examples for image classifiers, which don’t trip humans up at all but entirely ruin the neural net’s classification).
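For anyone unfamiliar with adversarial examples, they take only a few lines of code to generate. Here is a minimal sketch of the classic fast gradient sign method in PyTorch; `model`, `image`, and `label` are assumed placeholders for a trained classifier, a normalized input tensor, and its true class:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Fast Gradient Sign Method: nudge every pixel slightly in the
    direction that most increases the classifier's loss. The change is
    imperceptible to a human, yet it can flip the model's prediction."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel by +/- epsilon along the sign of the gradient,
    # then clamp back into the valid [0, 1] pixel range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

The point being: a human looking at `adversarial` sees the same picture, while the network's output can change completely. That failure mode has no obvious human analogue, which is exactly why it counts against the "in our own image" claim.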