Value learning for moral essentialists

Many people—people we want to persuade—are essentialists about morality and ethics. They give weight to the idea that, just as knowing facts about my shoes is possible because my shoes exist and interact with me, knowing facts about what is right or good is possible because there is some essence of rightness or goodness “out there” that we somehow interact with.

This isn’t totally wrong. But I think it reflects the usual human heuristic of assuming that every category has an essence that makes it what it is. Put humans in a world with water, air, and fire, and the first thing they’ll think is that the basic categories of water, air, and fire must somehow be reflected in the fundamental order of the world. And why does heat behave the way it does? The flow of heat substance. Why does opium make you drowsy? It has sleepiness essence.

This sort of heuristic works well enough for everyday use, but if you assume that goodness is a fundamental part of the world, the idea of an AI learning to do the right thing is going to sound a bit weird. How can we talk about value learning without even once checking whether the AI interacts with the goodness-essence?

This post outlines the sort of strategy I’d use to try to get people following essentialist intuitions on board with value learning. I’ll indirectly rely on some properties of morality that behave more like a pattern and less like an essence. I’d definitely be interested in feedback, and if you have moral essentialist intuitions, please forgive me for framing this post as talking about you, not to you.

1: Why bother?

AI is going to be the pivotal technology of the coming century. AI that does good things can help us manage existential risks like climate change, bio-engineered diseases, asteroids, or rogue AIs built by less careful people.

In a future where very clever AI does good things, everything is probably pretty great. In a future where very clever AI does things without any regard to their goodness, even if humanity doesn’t get swept away to extinction, we’re certainly not realizing the potential boon that AI represents.

Therefore, it would be really handy if you could program a computer to figure out what the right thing to do was, and then do it—or at least to make an honest try on both counts. The status quo of humanity is not necessarily sustainable (see above about climate change, disease, asteroids, other AIs), so the point is not to design an AI that we’re 100% certain will do the right thing; the point is just to design an AI that we’re more confident in than the status quo.

2: It goes from God, to Jerry, to me.

Suppose that humans have knowledge of morality. Then we just need to have the AI learn that knowledge. Just like how you can know that lithium is the third element without having experimentally verified it yourself—you learn it from a trustworthy source. Hence, “value learning.”

Problem solved, post over, everyone go home and work on value learning? Well...

The basic question someone might have about this is “what about moral progress?” For example, if one models the transition from slavery being legal to illegal as moral progress made by interacting with the external goodness-essence, what if there are further such transitions in our future that the AI can’t learn about? Or what if futuristic technology will present us with new dilemmas, moral terra incognita, which can only be resolved correctly by consultation of the goodness-essence?

This rather depends on how the AI works. If what the AI learns from humans is some list of rules that humans follow, then absolutely it can become morally outdated. But what if the AI learns the human intuitions and dispositions that lead humans to make moral judgments? Then maybe if you sent this AI back to the 1700s, it would become an abolitionist.
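To make that contrast concrete, here is a toy sketch in Python (my own illustration, not a description of any real value-learning system): a fixed rule table versus a scoring function fit to pairwise human judgments in the Bradley-Terry style. The feature names and numbers are invented for the example; the point is only that the learned model can score situations the rule table never mentioned.

```python
# Toy contrast between "learn a list of rules" and "learn the dispositions
# behind human judgments." Everything here is a hypothetical illustration.
import numpy as np

# Approach 1: a hard-coded rule list. It can only score actions it was told about.
RULES = {"keep promises": +1.0, "take what isn't yours": -1.0}

def rule_based_score(actions):
    return sum(RULES.get(a, 0.0) for a in actions)

# Approach 2: fit a scoring function to human pairwise judgments
# ("situation A was judged better than situation B"), Bradley-Terry style.
def fit_preference_model(pairs, n_features, lr=0.1, steps=2000):
    w = np.zeros(n_features)
    for _ in range(steps):
        for better, worse in pairs:
            p = 1.0 / (1.0 + np.exp(-(w @ better - w @ worse)))
            w += lr * (1.0 - p) * (better - worse)  # gradient of log-likelihood
    return w

# Hypothetical situation features: [harm caused, honesty, fairness]
judged_better = np.array([[0.0, 1.0, 1.0], [0.1, 0.9, 0.8]])
judged_worse  = np.array([[0.8, 0.2, 0.1], [0.9, 0.0, 0.3]])
w = fit_preference_model(list(zip(judged_better, judged_worse)), n_features=3)

novel_situation = np.array([0.2, 0.7, 0.9])  # nothing on the rule list applies here
print("learned weights:", w, "score for novel case:", w @ novel_situation)
```

The rule table freezes the moral opinions it was given, while the fitted model at least generalizes from the pattern of judgments it saw, which is the (very simplified) sense in which learning dispositions rather than rules leaves room for something like moral progress.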

In other words, the real goal of value learning isn’t just to regurgitate human opinions, it’s to become connected to moral judgments in the same way humans are. If you think that the human connection to morality is supernatural, even then there are conceptions of the supernatural that would allow an AI to reach correct moral conclusions. But I think that even people who say they think morality is supernatural still share a lot of the same intuitions that would let an AI learn morality. Like “moral reasoning can be correct or incorrect independently of the person who thinks it.”

If the human connection to the goodness-essence is something that depends on the details of how humans reason about morality, then I think the success of value learning is very much still on the table, and we should be looking for ways to achieve it. And if you think human morality isn’t supernatural, but you don’t think that merely copying human moral reasoning well enough to have been an early abolitionist is sufficient, don’t use that as an excuse to give up! Try to figure out how an AI could learn to do the right thing, because it might be important!