When we last left Eliezer1997, he believed that any superintelligence would automatically do what was “right”, and indeed would understand that better than we could; even though, he modestly confessed, he did not understand the ultimate nature of morality. Or rather, after some debate had passed, Eliezer1997 had evolved an elaborate argument, which he fondly claimed to be “formal”, that we could always condition upon the belief that life has meaning; and so cases where superintelligences did not feel compelled to do anything in particular, would fall out of consideration. (The flaw being the unconsidered and unjustified equation of “universally compelling argument” with “right”.)
So far, the young Eliezer is well on the way toward joining the “smart people who are stupid because they’re skilled at defending beliefs they arrived at for unskilled reasons”. All his dedication to “rationality” has not saved him from this mistake, and you might be tempted to conclude that it is useless to strive for rationality.
But while many people dig holes for themselves, not everyone succeeds in clawing their way back out.
And from this I learn my lesson: That it all began—
—with a small, small question; a single discordant note; one tiny lonely thought...
As our story starts, we advance three years to Eliezer2000, who in most respects resembles his self of 1997. He currently thinks he’s proven that building a superintelligence is the right thing to do if there is any right thing at all. From which it follows that there is no justifiable conflict of interest over the Singularity, among the peoples and persons of Earth.
This is an important conclusion for Eliezer2000, because he finds the notion of fighting over the Singularity to be unbearably stupid. (Sort of like the notion of God intervening in fights between tribes of bickering barbarians, only in reverse.) Eliezer2000's self-concept does not permit him—he doesn't even want—to shrug and say, "Well, my side got here first, so we're going to seize the banana before anyone else gets it." It's a thought too painful to think.
And yet then the notion occurs to him:
Maybe some people would prefer an AI do particular things, such as not kill them, even if life is meaningless?
His immediately following thought is the obvious one, given his premises:
In the event that life is meaningless, nothing is the “right” thing to do; therefore it wouldn’t be particularly right to respect people’s preferences in this event.
This is the obvious dodge. The thing is, though, Eliezer2000 doesn’t think of himself as a villain. He doesn’t go around saying, “What bullets shall I dodge today?” He thinks of himself as a dutiful rationalist who tenaciously follows lines of inquiry. Later, he’s going to look back and see a whole lot of inquiries that his mind somehow managed to not follow—but that’s not his current self-concept.
So Eliezer2000 doesn't just grab the obvious out. He keeps thinking.
But if people believe they have preferences in the event that life is meaningless, then they have a motive to dispute my Singularity project and go with a project that respects their wish in the event life is meaningless. This creates a present conflict of interest over the Singularity, and prevents right things from getting done in the mainline event that life is meaningful.
Now, there’s a lot of excuses Eliezer2000 could have potentially used to toss this problem out the window. I know, because I’ve heard plenty of excuses for dismissing Friendly AI. “The problem is too hard to solve” is one I get from AGI wannabes who imagine themselves smart enough to create true Artificial Intelligence, but not smart enough to solve a really difficult problem like Friendly AI. Or “worrying about this possibility would be a poor use of resources, what with the incredible urgency of creating AI before humanity wipes itself out—you’ve got to go with what you have”, this being uttered by people who just basically aren’t interested in the problem.
But Eliezer2000 is a perfectionist. He’s not perfect, obviously, and he doesn’t attach as much importance as I do to the virtue of precision, but he is most certainly a perfectionist. The idea of metaethics that Eliezer2000 espouses, in which superintelligences know what’s right better than we do, previously seemed to wrap up all the problems of justice and morality in an airtight wrapper.
The new objection seems to poke a minor hole in the airtight wrapper. This is worth patching. If you have something that’s perfect, are you really going to let one little possibility compromise it?
So Eliezer2000 doesn’t even want to drop the issue; he wants to patch the problem and restore perfection. How can he justify spending the time? By thinking thoughts like:
What about Brian Atkins? [Brian Atkins being the startup funder of the Singularity Institute.] He would probably prefer not to die, even if life were meaningless. He’s paying for the Singularity Institute right now; I don’t want to taint the ethics of our cooperation.
Eliezer2000's sentiment doesn't translate very well—English doesn't have a simple description for it, nor does any other culture I know of. Maybe the passage in the Old Testament, "Thou shalt not boil a young goat in its mother's milk". Someone who helps you out of altruism shouldn't regret helping you; you owe them, not so much fealty, but rather, that they're actually doing what they think they're doing by helping you.
Well, but how would Brian Atkins find out, if I don't tell him? Eliezer2000 doesn't even think this except in quotation marks, as the obvious thought that a villain would think in the same situation. And Eliezer2000 has a standard counter-thought ready too, a ward against temptations to dishonesty—an argument that justifies honesty in terms of expected utility, not just a personal love of virtue:
Human beings aren’t perfect deceivers; it’s likely that I’ll be found out. Or what if genuine lie detectors are invented before the Singularity, sometime over the next thirty years? I wouldn’t be able to pass a lie detector test.
Eliezer2000 lives by the rule that you should always be ready to have your thoughts broadcast to the whole world at any time, without embarrassment. Otherwise, clearly, you’ve fallen from grace: either you’re thinking something you shouldn’t be thinking, or you’re embarrassed by something that shouldn’t embarrass you.
(These days, I don't espouse quite such an extreme viewpoint, mostly for reasons of Fun Theory. I see a role for continued social competition between intelligent life-forms, at least as far as my near-term vision stretches. I admit, these days, that it might be all right for human beings to have a self; as John McCarthy put it, "If everyone were to live for others all the time, life would be like a procession of ants following each other around in a circle." If you're going to have a self, you may as well have secrets, and maybe even conspiracies. But I do still try to abide by the principle of being able to pass a future lie detector test, with anyone else who's also willing to go under the lie detector, if the topic is a professional one. Fun Theory needs a commonsense exception for global catastrophic risk management.)
Even taking honesty for granted, there are other excuses Eliezer2000 could use to flush the question down the toilet. "The world doesn't have the time" or "It's unsolvable" would still work. But Eliezer2000 doesn't know that this problem, the "backup" morality problem, is going to be particularly difficult or time-consuming. He's just now thought of the whole issue.
And so Eliezer2000 begins to really consider the question: Supposing that "life is meaningless" (that superintelligences don't produce their own motivations from pure logic), then how would you go about specifying a fallback morality? Synthesizing it, inscribing it into the AI?
There’s a lot that Eliezer2000 doesn’t know, at this point. But he has been thinking about self-improving AI for three years, and he’s been a Traditional Rationalist for longer than that. There are techniques of rationality that he has practiced, methodological safeguards he’s already devised. He already knows better than to think that all an AI needs is the One Great Moral Principle. Eliezer2000 already knows that it is wiser to think technologically than politically. He already knows the saying that AI programmers are supposed to think in code, to use concepts that can be inscribed in a computer. Eliezer2000 already has a concept that there is something called “technical thinking” and it is good, though he hasn’t yet formulated a Bayesian view of it. And he’s long since noticed that suggestively named LISP tokens don’t really mean anything, etcetera. These injunctions prevent him from falling into some of the initial traps, the ones that I’ve seen consume other novices on their own first steps into the Friendly AI problem… though technically this was my second step; I well and truly failed on my first.
But in the end, what it comes down to is this: For the first time, Eliezer2000 is trying to think technically about inscribing a morality into an AI, without the escape-hatch of the mysterious essence of rightness.
That’s the only thing that matters, in the end. His previous philosophizing wasn’t enough to force his brain to confront the details. This new standard is strict enough to require actual work. Morality slowly starts being less mysterious to him—Eliezer2000 is starting to think inside the black box.
His reasons for pursuing this course of action—those don’t matter at all.
Oh, there’s a lesson in his being a perfectionist. There’s a lesson in the part about how Eliezer2000 initially thought this was a tiny flaw, and could have dismissed it out-of-mind if that had been his impulse.
But in the end, the chain of cause and effect goes like this: Eliezer2000 investigated in more detail, therefore he got better with practice. Actions screen off justifications. If your arguments happen to justify not working things out in detail, like Eliezer1996, then you won’t get good at thinking about the problem. If your arguments call for you to work things out in detail, then you have an opportunity to start accumulating expertise.
That was the only choice that mattered, in the end—not the reasons for doing anything.
I say all this, as you may well guess, because of the AI wannabes I sometimes run into, who have their own clever reasons for not thinking about the Friendly AI problem. Our clever reasons for doing what we do, tend to matter a lot less to Nature than they do to ourselves and our friends. If your actions don’t look good when they’re stripped of all their justifications and presented as mere brute facts… then maybe you should re-examine them.
A diligent effort won’t always save a person. There is such a thing as lack of ability. Even so, if you don’t try, or don’t try hard enough, you don’t get a chance to sit down at the high-stakes table—never mind the ability ante. That’s cause and effect for you.
Also, perfectionism really matters. The end of the world doesn’t always come with trumpets and thunder and the highest priority in your inbox. Sometimes the shattering truth first presents itself to you as a small, small question; a single discordant note; one tiny lonely thought, that you could dismiss with one easy effortless touch...
...and so, over succeeding years, understanding begins to dawn on that past Eliezer, slowly. That sun rose slower than it could have risen. To be continued.