Help Understanding Preferences And Evil

I’m having a problem understanding why Stuart Russell thinks that AI learning human preferences is a good idea. I think it’s a bad idea, so I assume I’m wrong and that I’m missing something. Help me out here, please; I’m not looking for an argument but rather to understand. Let me explain.

I have watched Stuart’s four-hour series of Reith Lectures on the BBC. Highly recommended. I have watched several other videos featuring him as well, and I am currently reading his book, Human Compatible. I am not an academic; I am retired and write science fiction about advanced social robots as a hobby.

Reading the chapter “AI: A Different Approach” in Stuart’s book, I am still bothered by something about the preferences issue. My understanding of Stuart’s “new model for AI” is that it would learn what our preferences are from observing our behavior. I understand why he thinks “preferences” is a better word than “values” for what drives these behaviors but, at the risk of confusing things, let me use the word values to explain my confusion.

As I understand it, humans have different kinds of values:

1) Those that are evolved and shared by the whole species, like finding sugar tasty or glossy hair attractive.
2) Those that reflect our own individuality and make each of us unique, including the ones some twin studies reveal.
3) Those that our culture, family, society, or what have you impose on us.

I believe the first two kinds are genetic and the third kind is learned. Let me classify the first two as biological values and the third as social values. It would appear that this third, social category accounts for the majority of the recent evolution of our physical brains.

Let’s consider three values of each type as simple examples. Biological values might be greed, selfishness, and competition, while social values might be trust, altruism, and cooperation. Humans are a blend of all six of these values and will exhibit preferences based on them in different situations. Much of the time they will choose behaviors based on biological values, as the nightly news makes clear.

If AI learns our preferences based on our behaviors, it’s going to learn a lot of “bad” things like lying, stealing, and cheating, and other much worse things. From a biological point of view, these behaviors are “good” because they maximize the return on calories invested by getting others to do the work while we reap the benefits. Parasites and cuckoo birds are examples.
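To make my worry concrete, here is a minimal sketch of one common way “learning preferences from behavior” is formalized: Bayesian inverse reinforcement learning with a noisily rational choice model. Everything in it (the two candidate preferences, the options, the utility numbers) is made up for illustration and is not Stuart’s actual proposal; it just shows that if most of the observed choices are self-interested, this kind of learner concludes that self-interested preferences best explain the data.

```python
# Toy sketch: inferring a hidden preference from observed choices,
# assuming a Boltzmann-rational (noisily rational) choice model.
# All names and numbers below are hypothetical, for illustration only.
import math

# How much each candidate preference values each available option.
utilities = {
    "altruistic": {"share": 1.0, "keep": 0.2},
    "selfish":    {"share": 0.1, "keep": 1.0},
}

prior = {"altruistic": 0.5, "selfish": 0.5}  # start undecided
beta = 2.0  # how consistently rational we assume the human is

def choice_likelihood(preference, observed_choice):
    """P(choice | preference): higher-utility options are exponentially
    more likely to be chosen, but mistakes are allowed."""
    u = utilities[preference]
    exp_u = {opt: math.exp(beta * val) for opt, val in u.items()}
    return exp_u[observed_choice] / sum(exp_u.values())

def update(posterior, observed_choice):
    """Bayes' rule: reweight each candidate preference by how well it
    explains the observed behavior, then renormalize."""
    unnorm = {p: posterior[p] * choice_likelihood(p, observed_choice)
              for p in posterior}
    z = sum(unnorm.values())
    return {p: w / z for p, w in unnorm.items()}

# Observe a few choices and watch the belief shift toward "selfish".
posterior = prior
for choice in ["keep", "keep", "share", "keep"]:
    posterior = update(posterior, choice)
    print(choice, {p: round(w, 3) for p, w in posterior.items()})
```

Under these made-up numbers, the learner ends up fairly confident the human is “selfish,” which is exactly the kind of conclusion I’m worried about.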

In his Reith Lectures, Stuart states that an AI trained on preferences will not turn out evil, but he never explains why not. So far I have found no mention in his book of human preferences for anything we would consider negative, bad, or evil. I simply don’t understand how an AI observing our behavior is going to end up being exclusively benevolent, or “provably beneficial” to use Stuart’s term.

I think an AI learning our preferences from our behavior would be a terrible idea. What am I not understanding?