Heading Toward Morality

Followup to: Ghosts in the Machine, Fake Fake Utility Functions, Fake Utility Functions

As people were complaining before about not seeing where the quantum physics sequence was going, I shall go ahead and tell you where I’m heading now.

Having dissolved the confusion surrounding the word “could”, the trajectory is now heading toward should.

In fact, I’ve been heading there for a while. Remember the whole sequence on fake utility functions? Back in… well… November 2007?

I sometimes think of there being a train that goes to the Friendly AI station; but it makes several stops before it gets there; and at each stop, a large fraction of the remaining passengers get off.

One of those stops is the one I spent a month leading up to in November 2007, the sequence chronicled in Fake Fake Utility Functions and concluded in Fake Utility Functions.

That’s the stop where someone thinks of the One Great Moral Principle That Is All We Need To Give AIs.

To deliver that one warning, I had to go through all sorts of topics—which topics one might find useful even if not working on Friendly AI. I warned against Affective Death Spirals, which required recursing on the affect heuristic and the halo effect, so that your good feeling about one particular moral principle wouldn’t spiral out of control. I did that whole sequence on evolution; and discoursed on the human ability to make almost any goal appear to support almost any policy; I went into evolutionary psychology to argue for why we shouldn’t expect human terminal values to reduce to any simple principle, even happiness, explaining the concept of “expected utility” along the way...

...and talked about genies and more; but you can read the Fake Utility sequence for that.

So that’s just the warning against trying to oversimplify human morality into One Great Moral Principle.

If you want to actually dissolve the confusion that surrounds the word “should”—which is the next stop on the train—then that takes a much longer introduction. Not just one November.

I went through the sequence on words and definitions so that I would be able to later say things like “The next project is to Taboo the word ‘should’ and replace it with its substance”, or “Sorry, saying that morality is self-interest ‘by definition’ isn’t going to cut it here”.

The words-and-definitions sequence was also the simplest example I knew to introduce the notion of How An Algorithm Feels From Inside, which is one of the great master keys to dissolving wrong questions. Though it seems to us that our cognitive representations are the very substance of the world, they have a character that comes from cognition and often cuts crosswise to a universe made of quarks. E.g., probability: if we are uncertain of a phenomenon, that is a fact about our state of mind, not an intrinsic character of the phenomenon.

Then the reductionism sequence: that a universe made only of quarks does not mean that things of value are lost or even degraded to mundanity. And the notion of how the sum can seem unlike the parts, and yet be as much the parts as our hands are fingers.

Followed by a new example, one step up in difficulty from words and their seemingly intrinsic meanings: “Free will” and seemingly intrinsic could-ness.

But before that point, it was useful to introduce quantum physics. Not just to get to timeless physics and dissolve the “determinism” part of the “free will” confusion. But also, more fundamentally, to break belief in an intuitive universe that looks just like our brain’s cognitive representations. And present examples of the dissolution of even such fundamental intuitions as those concerning personal identity. And to illustrate the idea that you are within physics, within causality, and that strange things will go wrong in your mind if ever you forget it.

Lately we have begun to approach the final precautions, with warnings against such notions as Author control: every mind which computes a morality must do so within a chain of lawful causality; it cannot arise from the free will of a ghost in the machine.

And the warning against Passing the Recursive Buck to some meta-morality that is not itself computably specified, or some meta-morality that is chosen by a ghost without it being programmed in, or to a notion of “moral truth” just as confusing as “should” itself...

And the warning on the difficulty of grasping slippery things like “should”—demonstrating how very easy it will be to just invent another black box equivalent to should-ness, to sweep should-ness under a slightly different rug—or to bounce off into mere modal logics of primitive should-ness...

We aren’t yet at the point where I can explain morality.

But I think—though I could be mistaken—that we are finally getting close to the final sequence.

And if you don’t care about my goal of explanatorily transforming Friendly AI from a Confusing Problem into a merely Extremely Difficult Problem, then stick around anyway. I tend to go through interesting intermediates along my way.

It might seem like confronting “the nature of morality” from the perspective of Friendly AI is only asking for additional trouble.

Artificial Intelligence melts people’s brains. Metamorality melts people’s brains. Trying to think about AI and metamorality at the same time can cause people’s brains to spontaneously combust and burn for years, emitting toxic smoke—don’t laugh, I’ve seen it happen multiple times.

But the discipline imposed by Artificial Intelligence is this: you cannot escape into things that are “self-evident” or “obvious”. That doesn’t stop people from trying, but the programs don’t work. Every thought has to be computed somehow, by transistors made of mere quarks, and not by moral self-evidence to some ghost in the machine.

If what you care about is rescuing children from burning orphanages, I don’t think you will find many moral surprises here; my metamorality adds up to moral normality, as it should. You do not need to worry about metamorality when you are personally trying to rescue children from a burning orphanage. The point at which metamoral issues per se have high stakes in the real world is when you try to compute morality in an AI standing in front of a burning orphanage.

Yet there is also a good deal of needless despair and misguided fear of science, stemming from notions such as, “Science tells us the universe is empty of morality”. This is damage done by a confused metamorality that fails to add up to moral normality. For that I hope to write down a counterspell of understanding. Existential depression has always annoyed me; it is one of the world’s most pointless forms of suffering.

Don’t expect the final post on this topic to come tomorrow, but at least you know where we’re heading.

Part of The Metaethics Sequence

Next post: “No Universally Compelling Arguments”

(start of sequence)