Ethical Injunctions

Eliezer Yudkowsky, 20 Oct 2008 23:00 UTC

77 points

“Would you kill babies if it was the right thing to do? If no, under what circumstances would you not do the right thing to do? If yes, how right would it have to be, for how many babies?”
—horrible job interview question

Swapping hats for a moment, I’m professionally intrigued by the decision theory of “things you shouldn’t do even if they seem to be the right thing to do”.

Suppose we have a reflective AI, self-modifying and self-improving, at an intermediate stage in the development process. In particular, the AI’s goal system isn’t finished—the shape of its motivations is still being loaded, learned, tested, or tweaked.

Yea, I have seen many ways to screw up an AI goal system design, resulting in a decision system that decides, given its goals, that the universe ought to be tiled with tiny molecular smiley-faces, or some such. Generally, these deadly suggestions also have the property that the AI will not desire its programmers to fix it. If the AI is sufficiently advanced—which it may be even at an intermediate stage—then the AI may also realize that deceiving the programmers, hiding the changes in its thoughts, will help transform the universe into smiley-faces.

Now, from our perspective as programmers, if we condition on the fact that the AI has decided to hide its thoughts from the programmers, or otherwise act willfully to deceive us, then it would seem likely that some kind of unintended consequence has occurred in the goal system. We would consider it improbable that the AI is functioning as intended, and rather likely that we have messed up the AI’s utility function somehow, so that the AI wants to turn the universe into tiny reward-system counters, or some such, and now has a motive to hide from us.

Well, suppose we’re not going to implement some object-level Great Idea as the AI’s utility function. Instead we’re going to do something advanced and recursive—build a goal system which knows (and cares) about the programmers outside. A goal system that, via some nontrivial internal structure, “knows it’s being programmed” and “knows it’s incomplete”. Then you might be able to have and keep the rule:

“If [I decide that] fooling my programmers is the right thing to do, execute a controlled shutdown [instead of doing the right thing to do].”

And the AI would keep this rule, even through the self-modifying AI’s revisions of its own code, because, in its structurally nontrivial goal system, the present-AI understands that this decision by a future-AI probably indicates something defined-as-a-malfunction. Moreover, the present-AI knows that if future-AI tries to evaluate the utility of executing a shutdown, once this hypothetical malfunction has occurred, the future-AI will probably decide not to shut itself down. So the shutdown should happen unconditionally, automatically, without the goal system getting another chance to recalculate the right thing to do.
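
As a rough illustration only (not the real mathematical structure), the shape of the rule can be sketched in Python. Every name here (`Plan`, `choose_action`, `deceives_programmers`) is hypothetical, and detecting deception is simply assumed to be possible. The point is only that the injunction check sits outside the utility comparison, so a possibly corrupted utility estimate never gets a vote on whether to shut down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate action plus the goal system's own estimate of its value."""
    name: str
    estimated_utility: float    # may come from a malfunctioning goal system
    deceives_programmers: bool  # hypothetical flag; detecting this is the hard part


# Hypothetical stand-in for "execute a controlled shutdown".
SHUTDOWN = Plan("controlled shutdown", estimated_utility=0.0, deceives_programmers=False)


def choose_action(plans: list[Plan]) -> Plan:
    """Pick the highest-utility plan, unless that plan involves fooling the programmers."""
    best = max(plans, key=lambda p: p.estimated_utility)
    if best.deceives_programmers:
        # Unconditional: SHUTDOWN's utility is never compared against best's,
        # because the utility estimates are exactly what is suspected of being broken.
        return SHUTDOWN
    return best


plans = [
    Plan("report progress honestly", 10.0, False),
    Plan("hide goal changes from programmers", 1e9, True),
]
print(choose_action(plans).name)  # controlled shutdown
```

Note that making the deceptive plan's estimated utility larger changes nothing: the branch that returns `SHUTDOWN` never reads that number.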

I’m not going to go into the deep dark depths of the exact mathematical structure, because that would be beyond the scope of this blog. Also I don’t yet know the deep dark depths of the mathematical structure. It looks like it should be possible, if you do things that are advanced and recursive and have nontrivial (but consistent) structure. But I haven’t reached that level, as yet, so for now it’s only a dream.

But the topic here is not advanced AI; it’s human ethics. I introduce the AI scenario to bring out more starkly the strange idea of an ethical injunction:

You should never, ever murder an innocent person who’s helped you, even if it’s the right thing to do; because it’s far more likely that you’ve made a mistake, than that murdering an innocent person who helped you is the right thing to do.

Sound reasonable?

During World War II, it became necessary to destroy Germany’s supply of deuterium, a neutron moderator, in order to block their attempts to achieve a fission chain reaction. Their supply of deuterium was coming at this point from a captured facility in Norway. A shipment of heavy water was on board a Norwegian ferry ship, the SF Hydro. Knut Haukelid and three others had slipped on board the ferry in order to sabotage it, when the saboteurs were discovered by the ferry watchman. Haukelid told him that they were escaping the Gestapo, and the watchman immediately agreed to overlook their presence. Haukelid “considered warning their benefactor but decided that might endanger the mission and only thanked him and shook his hand.” (Richard Rhodes, The Making of the Atomic Bomb.) So the civilian ferry Hydro sank in the deepest part of the lake, with eighteen dead and twenty-nine survivors. Some of the Norwegian rescuers felt that the German soldiers present should be left to drown, but this attitude did not prevail, and four Germans were rescued. And that was, effectively, the end of the Nazi atomic weapons program.

Good move? Bad move? Germany very likely wouldn’t have gotten the Bomb anyway… I hope with absolute desperation that I never get faced by a choice like that, but in the end, I can’t say a word against it.

On the other hand, when it comes to the rule:

“Never try to deceive yourself, or offer a reason to believe other than probable truth; because even if you come up with an amazing clever reason, it’s more likely that you’ve made a mistake than that you have a reasonable expectation of this being a net benefit in the long run.”

Then I really don’t know of anyone who’s knowingly been faced with an exception. There are times when you try to convince yourself “I’m not hiding any Jews in my basement” before you talk to the Gestapo officer. But then you do still know the truth, you’re just trying to create something like an alternative self that exists in your imagination, a facade to talk to the Gestapo officer.

But to really believe something that isn’t true? I don’t know if there was ever anyone for whom that was knowably a good idea. I’m sure that there have been many many times in human history, where person X was better off with false belief Y. And by the same token, there is always some set of winning lottery numbers in every drawing. It’s knowing which lottery ticket will win that is the epistemically difficult part, like X knowing when he’s better off with a false belief.

Self-deceptions are the worst kind of black swan bets, much worse than lies, because without knowing the true state of affairs, you can’t even guess at what the penalty will be for your self-deception. They only have to blow up once to undo all the good they ever did. One single time when you pray to God after discovering a lump, instead of going to a doctor. That’s all it takes to undo a life. All the happiness that the warm thought of an afterlife ever produced in humanity, has now been more than cancelled by the failure of humanity to institute systematic cryonic preservations after liquid nitrogen became cheap to manufacture. And I don’t think that anyone ever had that sort of failure in mind as a possible blowup, when they said, “But we need religious beliefs to cushion the fear of death.” That’s what black swan bets are all about—the unexpected blowup.

Maybe you even get away with one or two black-swan bets—they don’t get you every time. So you do it again, and then the blowup comes and cancels out every benefit and then some. That’s what black swan bets are all about.
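
A toy expected-value calculation shows why getting away with a bet once or twice says little. The payoffs and the blowup probability below are made-up illustrative numbers, not estimates of anything real: each bet pays a small gain most of the time and, rarely, a loss far larger than the gains. A genuine black swan is worse than this sketch, since the loss term is missing from the decider's model altogether.

```python
def expected_value(gain: float, loss: float, p_blowup: float) -> float:
    """Expected payoff of a single bet with a rare, large loss."""
    return (1.0 - p_blowup) * gain - p_blowup * loss


def p_no_blowup(p_blowup: float, n_bets: int) -> float:
    """Probability of making n independent bets without a single blowup."""
    return (1.0 - p_blowup) ** n_bets


# Illustrative assumptions: +1 on a normal day, -1000 on a blowup, 1% blowup chance.
GAIN, LOSS, P_BLOWUP = 1.0, 1000.0, 0.01

print(expected_value(GAIN, LOSS, P_BLOWUP))  # negative, about -9 per bet
print(p_no_blowup(P_BLOWUP, 2))              # about 0.98: two bets usually go fine
print(p_no_blowup(P_BLOWUP, 500))            # under 0.01: a long run almost never does
```

Under these assumptions the first two bets succeed about 98% of the time, which feels like evidence that the policy works, even though each bet has a strongly negative expectation.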

Thus the difficulty of knowing when it’s safe to believe a lie (assuming you can even manage that much mental contortion in the first place)—part of the nature of black swan bets is that you don’t see the bullet that kills you; and since our perceptions just seem like the way the world is, it looks like there is no bullet, period.

So I would say that there is an ethical injunction against self-deception. I call this an “ethical injunction” not so much because it’s a matter of interpersonal morality (although it is), but because it’s a rule that guards you from your own cleverness—an override against the temptation to do what seems like the right thing.

So now we have two kinds of situation that can support an “ethical injunction”, a rule not to do something even when it’s the right thing to do. (That is, you refrain “even when your brain has computed it’s the right thing to do”, but this will just seem like “the right thing to do”.)

First, being human and running on corrupted hardware, we may generalize classes of situation where when you say e.g. “It’s time to rob a few banks for the greater good,” we deem it more likely that you’ve been corrupted than that this is really the case. (Note that we’re not prohibiting it from ever being the case in reality, but we’re questioning the epistemic state where you’re justified in trusting your own calculation that this is the right thing to do—fair lottery tickets can win, but you can’t justifiably buy them.)
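
The lottery-ticket point can be made concrete with Bayes' rule. The numbers below are invented purely for illustration: suppose one situation in a million is one where robbing banks really does serve the greater good, that you would correctly recognize such a situation 99% of the time, and that corrupted hardware produces the same conviction in 1% of ordinary situations.

```python
def posterior_truly_right(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """P(the action really is right | my brain concluded it is right), by Bayes' rule."""
    p_conclude = hit_rate * prior + false_alarm_rate * (1.0 - prior)
    return hit_rate * prior / p_conclude


# Illustrative assumptions only; none of these figures is a real estimate.
p = posterior_truly_right(prior=1e-6, hit_rate=0.99, false_alarm_rate=0.01)
print(f"{p:.6f}")
```

With these assumed inputs the posterior comes out at roughly one in ten thousand: even a fairly reliable sense of "this is the right thing to do" is swamped by the rarity of genuine exceptions.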

Second, history may teach us that certain classes of action are black-swan bets, that is, they sometimes blow up bigtime for reasons not in the decider’s model. So even when we calculate within the model that something seems like the right thing to do, we apply the further knowledge of the black swan problem to arrive at an injunction against it.

But surely… if one is aware of these reasons… then one can simply redo the calculation, taking them into account. So we can rob banks if it seems like the right thing to do after taking into account the problem of corrupted hardware and black swan blowups. That’s the rational course, right?

There’s a number of replies I could give to that.

I’ll start by saying that this is a prime example of the sort of thinking I have in mind, when I warn aspiring rationalists to beware of cleverness.

I’ll also note that I wouldn’t want an attempted Friendly AI that had just decided that the Earth ought to be transformed into paperclips, to assess whether this was a reasonable thing to do in light of all the various warnings it had received against it. I would want it to undergo an automatic controlled shutdown. Who says that meta-reasoning is immune from corruption?

I could mention the important times that my naive, idealistic ethical inhibitions have protected me from myself, and placed me in a recoverable position, or helped start the recovery, from very deep mistakes I had no clue I was making. And I could ask whether I’ve really advanced so much, and whether it would really be all that wise, to remove the protections that saved me before.

Yet even so… “Am I still dumber than my ethics?” is a question whose answer isn’t automatically “Yes.”

There are obvious silly things here that you shouldn’t do; for example, you shouldn’t wait until you’re really tempted, and then try to figure out if you’re smarter than your ethics on that particular occasion.

But in general—there’s only so much power that can vest in what your parents told you not to do. One shouldn’t underestimate the power. Smart people debated historical lessons in the course of forging the Enlightenment ethics that much of Western culture draws upon; and some subcultures, like scientific academia, or science-fiction fandom, draw on those ethics more directly. But even so the power of the past is bounded.

And in fact...

I’ve had to make my ethics much stricter than what my parents and Jerry Pournelle and Richard Feynman told me not to do.

Funny thing, how when people seem to think they’re smarter than their ethics, they argue for less strictness rather than more strictness. I mean, when you think about how much more complicated the modern world is...

And along the same lines, the ones who come to me and say, “You should lie about the Singularity, because that way you can get more people to support you; it’s the rational thing to do, for the greater good”—these ones seem to have no idea of the risks.

They don’t mention the problem of running on corrupted hardware. They don’t mention the idea that lies have to be recursively protected from all the truths and all the truthfinding techniques that threaten them. They don’t mention that honest ways have a simplicity that dishonest ways often lack. They don’t talk about black-swan bets. They don’t talk about the terrible nakedness of discarding the last defense you have against yourself, and trying to survive on raw calculation.

I am reasonably sure that this is because they have no clue about any of these things.

If you’ve truly understood the reason and the rhythm behind ethics, then one major sign is that, augmented by this newfound knowledge, you don’t do those things that previously seemed like ethical transgressions. Only now you know why.

Someone who just looks at one or two reasons behind ethics, and says, “Okay, I’ve understood that, so now I’ll take it into account consciously, and therefore I have no more need of ethical inhibitions”—this one is behaving more like a stereotype than a real rationalist. The world isn’t simple and pure and clean, so you can’t just take the ethics you were raised with and trust them. But that pretense of Vulcan logic, where you think you’re just going to compute everything correctly once you’ve got one or two abstract insights—that doesn’t work in real life either.

As for those who, having figured out none of this, think themselves smarter than their ethics: Ha.

And as for those who previously thought themselves smarter than their ethics, but who hadn’t conceived of all these elements behind ethical injunctions “in so many words” until they ran across this Overcoming Bias sequence, and who now think themselves smarter than their ethics, because they’re going to take all this into account from now on: Double ha.

I have seen many people struggling to excuse themselves from their ethics. Always the modification is toward lenience, never to be more strict. And I am stunned by the speed and the lightness with which they strive to abandon their protections. Hobbes said, “I don’t know what’s worse, the fact that everyone’s got a price, or the fact that their price is so low.” So very low the price, so very eager they are to be bought. They don’t look twice and then a third time for alternatives, before deciding that they have no option left but to transgress—though they may look very grave and solemn when they say it. They abandon their ethics at the very first opportunity. “Where there’s a will to failure, obstacles can be found.” The will to fail at ethics seems very strong, in some people.

I don’t know if I can endorse absolute ethical injunctions that bind over all possible epistemic states of a human brain. The universe isn’t kind enough for me to trust that. (Though an ethical injunction against self-deception, for example, does seem to me to have tremendous force. I’ve seen many people arguing for the Dark Side, and none of them seem aware of the network risks or the black-swan risks of self-deception.) If, someday, I attempt to shape a (reflectively consistent) injunction within a self-modifying AI, it will only be after working out the math, because that is so totally not the sort of thing you could get away with doing via an ad-hoc patch.

But I will say this much:

I am completely unimpressed with the knowledge, the reasoning, and the overall level, of those folk who have eagerly come to me, and said in grave tones, “It’s rational to do unethical thing X because it will have benefit Y.”

Psy-Kosh 20 Oct 2008 23:29 UTC
4 points
Given the current sequence, perhaps it’s time to revisit the whole Torture vs Dust Specks thing?
- Grognor 3 Dec 2011 20:39 UTC
  −3 points
  Late as I am, I have to say:
  
  No.
  
  That debate is more settled than many-worlds is.
  - Multiheaded 1 Jan 2012 11:24 UTC
    1 point
    If by “settled” you mean anything like “there’s no argument between EY and the majority of contributors”… well, it doesn’t look that way. Not even after you discard the fact that the examples were surprisingly, uncharacteristically poorly chosen in the first place (EY clarifying and proposing alternatives to dust specks, but failing to adjust for the fact that he’s pretty much the only one who can imagine recovery from the “torture” proposed). For instance, there’s the whole open business with Eliezer displaying (or merely signaling, or whatever) an unbounded utility function, which opens him up to all sorts of shaky crap, as seen in “The Aliens Have Landed” and elsewhere.
    
    If you don’t think so, go ahead and run a poll with, say, 200 minimum karma needed to vote. I’m going to be stunned if it turns out less than 30% speckers as of early 2012.
    - TheOtherDave 1 Jan 2012 15:15 UTC
      17 points
      My guess is the results of that poll would depend radically on how the question is worded.
      
      But yes, I agree with you that for most wordings, most people (including most LW contributors) will say “X units torture is worse than Y units of dust specks” for any substantial X & Y, no matter how vanishingly small X/Y is. And those who say “dust specks are worse for a sufficiently small X/Y” will chide them for succumbing to scope insensitivity, and the Torture Is Worse team will counterchide for being evil.
      
      For my own part, I think recovery is a red herring. Sure, it’s implausible to imagine a person recovering from fifty years of torture in the real world. It’s also implausible to imagine 3^^^3 people getting a dust speck in their eye in the real world. It’s an implausible thought experiment. So what?
      
      But if one insists on taking recovery rates into account, well, OK: consider a person whose life thus far has been so miserable that they are right on the borderline of what they can recover from. Left alone, they’d eventually manage recovery, but even the slightest worsening of their condition (say, getting a dust speck in their eye at the wrong time) will tip them over the edge. Of course, the odds of that person actually existing are vanishingly small… but they are larger than 1/3^^^3. (Or, if you don’t agree, then make the ridiculous number even bigger.) So whichever way you choose, you’ve got some poor shmuck irrecoverably harmed by your choice.
      
      Yes, of course that’s just an intuition pump. Considering the irrecoverable harm to the torture victim is also an intuition pump. As long as we just keep tinkering with the settings of the thought experiment so that it pumps our intuitions in the direction we want to go, we’ll get nowhere.
      
      All of which is to say I mostly agree with Grognor here… the “debate” goes nowhere. Those who say “specks is worse” pride themselves on being willing to endorse a theoretical calculation about right and wrong action even when the result of that calculation conflicts with their intuitive judgments. Those who say “torture is worse” pride themselves on being able to hold on to their intuitive judgments even when the context is distractingly complicated. Mostly, the two groups don’t agree on a hypothetical situation to talk about in the first place. It gets pointless fast.
Tom_McCabe2 20 Oct 2008 23:31 UTC
10 points
“Would you kill babies if it was the right thing to do? If no, under what circumstances would you not do the right thing to do? If yes, how right would it have to be, for how many babies?”

I would have answered “yes”; e.g., I would have set off a bomb in Hitler’s car in 1942, even if Hitler was surrounded by babies. This doesn’t seem to be a case of corruption by unethical hardware; the benefit to me from setting off such a bomb is quite negative, as it greatly increases my chance of being tortured to death by the SS.
- mamert 15 Apr 2016 6:31 UTC
  1 point
  Definitely yes. It’s not like killing babies is inherently wrong (*), it just is under most circumstances. I was thinking more along the lines of euthanasia of babies you’ve discovered have been prepared for use in biological warfare… but my mind tends to go into bad places. Let’s not get any further into that.
  
  *) unless you use absolute values for wrong, in which case it definitely is, but so is breathing
RobinHanson 20 Oct 2008 23:38 UTC
40 points
The problem here of course is how selective to be about rules to let into this protected level of “rules almost no one should think themselves clever enough to know when to violate.” After all, your social training may well want you to include “Never question our noble leader” in that set. Many a Christian has been told the mysteries of God are so subtle that they shouldn’t think themselves clever enough to know when they’ve found evidence that God isn’t following a grand plan to make this the best of all possible worlds.
- MarsColony_in10years 28 Oct 2015 21:34 UTC
  0 points
  
  The problem here of course is how selective to be about rules to let into this protected level
  
  Couldn’t this be determined experimentally? Ignore the last hundred years or so, or however much might influence our conclusion based on modern politics. Find a list of the people who had a large counterfactual impact on history. Which rules lead to desirable results?
  
  For example, the trial of Socrates made him a martyr, significantly advancing his ideas. That’s a couple points for “die for the principle of the matter” as an ethical injunction. After Alexander the Great died, anti-Macedonian sentiment in Athens caused Aristotle to flee, saying “I will not allow the Athenians to sin twice against philosophy”. Given this, perhaps Socrates’s sacrifice didn’t achieve as much as one might think, and we should update a bit in the opposite direction. Then again, Aristotle died a year later, having accomplished nothing noteworthy in that time.
MichaelG 20 Oct 2008 23:51 UTC
9 points
There’s that old quote: “never let your sense of morality keep you from doing what you know is right.”

I’d still like an answer to the most basic Friendly AI question: what do you want it to do? Forget the implementation problems for a second, and just give me a scenario where the AI is doing what you want it to do. What does that world look like? Because I don’t even know what I want from that future.
Eliezer Yudkowsky 21 Oct 2008 0:24 UTC
10 points
Michael, the AI I would currently like to create computes a metamoral question, looking for reflective equilibria of your current inconsistent and unknowledgeable self; something along the lines of “What would you ask me to do if you knew what I know and thought as fast as I do?”

What does the actual world look like? I can visualize a world that, to me at least, seems at least pleasant enough to refute most of the objections people have along the lines of “But you couldn’t have that much fun and still lead a philosophically acceptable existence”. But I’m not sure it’s wise to write about it, because I’m afraid it would suck out people’s souls. It’s better for your mental health to look down at the Middle Ages than up at the future.
Richard_Hollerith2 21 Oct 2008 0:52 UTC
7 points
Because I don’t even know what I want from that future.

Well, I hope you will stick around, MichaelG. Most people around here IMHO are too quickly satisfied with answers to questions about what sorts of terminal values properly apply even if the world changes drastically. A feeling of confusion about the question is your friend IMHO. Extreme scepticism of the popular answers is also your friend.
Roland2 21 Oct 2008 2:54 UTC
6 points
@Tom McCabe: I would have answered “yes”; e.g., I would have set off a bomb in Hitler’s car in 1942, even if Hitler was surrounded by babies. This doesn’t seem to be a case of corruption by unethical hardware; the benefit to me from setting off such a bomb is quite negative, as it greatly increases my chance of being tortured to death by the SS.

It’s easy to talk now about it, harder if you actually lived in Germany at that time and had to really fear the SS. Are you American? If yes, did you consider the fact that the actual political situation in the States has a lot of similarities with Nazi Germany?

As for killing Hitler, you have a few hidden assumptions in there, like: killing him would actually stop the war and/or the killing of the Jews.

For me it seems you have fallen for the simplification that Hitler is the personification of evil and so you failed to understand the complexity of the political situation at that time.
- Autolykos 2 Nov 2015 14:42 UTC
  3 points
  There probably was a time when killing Hitler had a significant chance of ending the war by enabling peace talks (allowing some high-ranking German generals/politicians to seize power while plausibly denying having wanted this outcome). The window might have been short, and probably a bit after ’42, though. I’d guess any time between the Battle of Stalingrad (where Germany stopped winning) and the Battle of Kursk (which made Soviet victory inevitable) should’ve worked—everyone involved should rationally prefer white peace to the very real possibility of a bloody stalemate. Before, Germany would not accept. Afterwards, the Soviets wouldn’t.
  - AlexanderRM 10 Nov 2015 18:52 UTC
    1 point
    It’s also worth noting that “I would set off a bomb if it would avert or shorten the Holocaust even if it would kill a bunch of babies” would still answer the question… …or maybe it wouldn’t, because the whole point of the question is that you might be wrong that it would end the war. See for comparison “I would set off a bomb and kill a bunch of innocent Americans if it would end American imperialism”, which has a surprising tendency to not end American imperialism and in fact make it worse.
    
    Overall I think if everyone followed a heuristic of “never kill babies”, the world would be better on average. However you could get a problem if only the carefully moral people follow that rule and the less-careful don’t and end up winning. For a consequentialist, a good rule would be “any ethical injunction which causes itself to be defeated cannot be used”. At the very least, the heuristic of “don’t violate Geneva Convention-like agreements restricting war to make it less horrible which the other side has stuck to” seems reasonable, although it’s less clear for cases like where a few enemy soldiers individually violate it, or where being the first to violate it gives a major advantage and you’re worried the other side might do so.
- waveman 25 Jun 2016 7:13 UTC
  0 points
  
  It’s easy to talk now about it, harder if you actually lived in Germany at that time and had to really fear the SS.
  
  Indeed. I remember an IT project manager telling me the German people should have stood up to Hitler and stopped him. I pointed out that she was not even prepared to tell her manager the truth about the state of her project (running later than advertised of course).
  
  All she had at stake was the size of her end of year bonus.
  
  I remember reading about a man who voted against Hitler in the referendum to make him dictator. He was severely beaten, his house was burned down, and his wife and daughter were gang-raped.
  - Jiro 25 Jun 2016 20:39 UTC
    1 point
    The penalty for telling the truth about the state of your project is less than the penalty for defying Hitler, but the good done by telling the truth about the state of your project is also less than the good done by defying Hitler.
    - ChristianKl 26 Jun 2016 10:32 UTC
      0 points
      
      The penalty for telling the truth about the state of your project is less than the penalty for defying Hitler, but the good done by telling the truth about the state of your project is also less than the good done by defying Hitler.
      
      For most people the good done by defying Hitler isn’t that great. One individual more or less doesn’t make a huge difference.
    - waveman 26 Jun 2016 3:10 UTC
      0 points
      That is true. Whether higher stakes* would give her more courage, I doubt, but it is possible.
      
      ( * It was not entirely clear until it was too late, if you look at the people who had nice things to say about Hitler early on. The number of people in the resistance during the war (as opposed to after the war, in retrospect) was not very high. I am not suggesting I would have been one of those who took up arms against him.)
      
      Anthony Beevor’s book Dresden has a good description of what happened to people who opposed Hitler.
Nominull3 21 Oct 2008 3:10 UTC
5 points
So… do you not actually believe in your injunction to “shut up and multiply”? Because for some time now you seem to have been arguing that we should do what feels right rather than trying to figure out what is right.
NancyLebovitz 21 Oct 2008 3:52 UTC
0 points
Learning Methods might be a relevant system. It’s based on the idea that emotional and physical pain are information, and it’s important to override the impulse to shut them down so that you can use them as detailed signals.
Michael_Bishop 21 Oct 2008 5:58 UTC
0 points
I think Eliezer makes some good points, but that he is taking them too far. I’m not certain where or how much we disagree though. It would be clearer what he really believes if he were forced to discuss/debate a wide range of situations in which he agrees/disagrees that it is worth violating an ethic which is generally a good one.

I encourage people to offer thought experiments in the comments.
Barry_Kelly 21 Oct 2008 6:40 UTC
−6 points
Truth is overrated.

“Never try to deceive yourself, or offer a reason to believe other than probable truth”

This is just naive. What if you were abused as a child? You don’t think you’d be better off not knowing the truth, and deceiving yourself?

Believing / deceiving with respect to the truth in the individual / personal and cultural domains are closely related to forgetfulness, which itself is vital for forgiveness. Lacking these virtues, we’d have wars and vendettas without end. Past truth needs discounting.

The virtue of being right and hewing to the truth is little comfort to the man beaten alive by his neighbours, convinced of their own righteousness. Ethics are one thing; but when a solid simulation of display of orthodoxy is necessary for the freedom to live your life, continuing to believe the truth internally is dangerous, because you’ll be liable to slip up.

Of course, these examples are relatively extreme, and most of us don’t live in particularly extreme times, so in general, I agree.

Even then, the present has some trends and assumptions built into it which would be socially unpleasant to question, so it is better not to think of such things, and to wallow in easy orthodoxy...
michael_vassar3 21 Oct 2008 7:38 UTC
8 points
I’m much more sympathetic to “Never try to deceive yourself, or offer a reason to believe other than probable truth”. Honestly, it seems to me that I take this injunction as seriously as anyone does, including Eliezer, but I’m still, unlike Eliezer, willing to mention a few caveats. The most important is that for humans, though not for minds in general, beliefs, brain states, world states, and values are not cleanly separate. There is not, for instance, any completely clean distinction between causing myself to hold a vague belief about what it would feel like to cut my hand off which doesn’t tightly concentrate probability mass and which not coincidentally is not directly dis-valued by my utility function and not cutting my hand off. Another more controversial claim, though not very controversial I hope, is that I should not read a computer monitor controlled by an unfriendly superintelligent AI which has injunctions against deceiving me. I might want to temporarily deceive myself, using others as the agents of my self-deception, as part of a social psychology experiment to test my probable behavior in certain situations. Really, given some effort it’s not hard to come up with exceptions to even so reliable a rule as this one. For such a reason, I would be very wary of using such rules in an AGI, but of course, perhaps the actual mathematical formulation of the rule in question within the AGI would be less problematic, though a few seconds of thought doesn’t give me much reason to think this.

In a very general sense though, I see a logical problem with this whole line of thought. How can any of these injunctions survive except as self-protecting beliefs? Isn’t this whole approach just the sort of “fighting bias with bias” that you and Robin usually argue against?
- Will_Newsome 25 Mar 2011 8:41 UTC
  −1 points
  0
  
  In a very general sense though, I see a logical problem with this whole line of thought. How can any of these injunctions survive except as self-protecting beliefs?
  
  How can utility functions (or terms in utility functions, depending on how you want to splice it up) survive except as self-protecting beliefs? The strange loop through the meta-level is not like induction where you have no other choice, there are many possible utility functions.
  
  (I’m making this comment as a note to self to flag Michael’s comment for future reference.)
Latanius2 21 Oct 2008 8:11 UTC
0 points
0
“looking for reflective equilibria of your current inconsistent and unknowledgeable self; something along the lines of ‘What would you ask me to do if you knew what I know and thought as fast as I do?’”

We’re sufficiently more intelligent than monkeys to do that reasoning… so humanity’s goal (as the advanced intelligence created by monkeys a few million years ago for getting to the Singularity) should be to use all the knowledge gained to tile the universe with bananas and forests etc.

We don’t have the right to say, “if monkeys were more intelligent and consistent, they would think like us”: we’re also a random product of evolution, from the point of view of monkeys. (Tile the world with ugly concrete buildings? Uhhh...)

So I think that to preserve our humanity in the process we should be the ones who become gradually more and more intelligent (and decide what goals to follow next). Humans are complicated, so to simulate them in a Friendly AI, we’d need comparably complex systems… and they are probably chaotic, too. Isn’t it… simply… impossible? (Not in a sense that “we can’t make it”, but “we can prove nobody can”...)
Toby_Ord2 21 Oct 2008 10:33 UTC
4 points
0
You should never, ever murder an innocent person who’s helped you, even if it’s the right thing to do

Shut up and do the impossible!

As written, both these statements are conceptually confused. I understand that you didn’t actually mean either of them literally, but I would advise against trading on such deep-sounding conceptual confusions.

You should never, ever do X, even if you are exceedingly confident that it is the right thing to do

This sounds less profound, but will actually be true for some value of X, unlike the first sentence or its derivatives. It sounds as profound as it is, and no more. I believe this is the right standard.
Tim_Tyler 21 Oct 2008 11:56 UTC
1 point
0
“Would you kill babies if it was the right thing to do? If no, under what circumstances would you not do the right thing to do? If yes, how right would it have to be, for how many babies?”

A: yes; B: N/A; C: approximately 3.6 floodlenips of rightness—per baby.
Zubon 21 Oct 2008 12:30 UTC
4 points
0
Robin has an excellent point. The majority of the planet, when faced with reasoning that argues against their religion, executes a very close variant on that shutdown code. They have a very similar injunction against being too clever. And they are similarly smug about rationalists who give up eternity to freeze their heads.

Eliezer, have you read Bryan Caplan yet? His “rational irrationality” argues that most of the planet engages in willful self-deception and gets away with it. Not without aggregate harm, but tragedy of the commons and all that.
Vladimir_Slepnev 21 Oct 2008 12:31 UTC
0 points
0
So AIs are dangerous, because they’re blind optimization processes; evolution is cruel, because it’s a blind optimization process… and still Eliezer wants to build an optimizer-based AI. Why? We human beings are not optimizers or outcome pumps. We are a layered cake of instincts, and precisely this allows us to be moral and kind.

No idea what I’m talking about, but the “subsumption architecture” papers seem to me much more promising—a more gradual, less dangerous, more incrementally effective path to creating friendly intelligent beings. I hope something like this will be Eliezer’s next epiphany: the possibility of non-optimizer-based high intelligence, and its higher robustness compared to paperclip bombs.
Tim_Tyler 21 Oct 2008 13:06 UTC
0 points
0
We human beings are not optimizers or outcome pumps.

Sure we are. All biological organisms are. Evolution is a giant optimization process, and we are doing the optimizing in our region of design space.

See: http://originoflife.net/gods_utility_function/
Ian_C. 21 Oct 2008 13:28 UTC
0 points
0
I agree that there are certain moral rules we should never break. Human beings are not omniscient, so all of our principles have to be principles-in-a-context. In that sense every principle is vulnerable to a black swan, but there are levels of vulnerability. The levels correspond to how wide ranging the abstraction. The more abstract the less vulnerable.

Injunctions about truth are based on the metaphysical fact of identity, which is implied in every single object we encounter in our entire lives. So epistemological injunctions are the most invulnerable. The one about not helping the ferry boat captain—well helping him would be an absolute in normal life, but war is not normal life. It’s a big, ugly, black swan. They should not feel guilty over that poor fellow, because “it’s just war.” (and I mean that in a deep epistemological sense, not a redneck sense)
JamesAndrix 21 Oct 2008 14:22 UTC
0 points
0
http://www.imdb.com/title/tt0113613/ Plot: A group of idealistic, but frustrated, liberals succumb to the temptation of murdering rightwing pundits for their political beliefs.
Thom_Blake 21 Oct 2008 14:32 UTC
1 point
0
Toby,

You should never, ever murder an innocent person who’s helped you, even if it’s the right thing to do

You should never, ever do X, even if you are exceedingly confident that it is the right thing to do

I believe a more sensible interpretation would be, “You should have an unbreakable prohibition against doing X, even in cases where X is the right thing to do”—the issue is not that you might be wrong about it being the right thing to do, but rather that not having the prohibition is a bad thing.
Alan_Crowe 21 Oct 2008 15:08 UTC
4 points
0
This seems closely related to inside-view versus outside-view. The think-lobe of the brain comes up with a cunning plan. The plan breaks an ethical rule but calculation shows it is for the greater good. The executive-lobe of the brain then ponders the outside view. Everyone who has executed an evil cunning plan has run a calculation of the greater good and had their plan endorsed. So the calculation lacks outside-view credibility.

What kind of evidence could give outside-view credibility? Consider a plan with lots of traceability to previous events. If it goes badly, past events will have to be re-interpreted, and much learning will take place. Well, people generally don’t learn from the past. If the think-lobe’s cunning plan retains enough debugging information to avoid going wrong and later going wrong again, that distinguishes it from what people usually do and gives it outside-view credibility.

Randomised controlled trials of medical treatments can be attacked on ethical grounds from both sides. They deny some patients medical treatment that is quite likely beneficial. They inflict unproven and potentially dangerous treatment on others. Both attacks lack outside-view credibility. We always think we know. The randomised trial itself has outside-view credibility. It will place us in the position that we can do the right thing without having to use our judgement or be clever.
Vladimir_Nesov 21 Oct 2008 15:22 UTC
0 points
0
Michael, it applies to AI at an intermediate stage (and maybe not so much to AI as to the design decisions that came into its creation). These black swan safety measures should of course be relative to predictive horizon, where precise knowledge about (evaluation of) consequences is possible. There is no such problem when you need to choose between alternatives having only known immediate consequences that have known moral evaluation, so the question is when to pull the plug, when to decide that your model likely deceives you.
Stuart_Armstrong 21 Oct 2008 15:40 UTC
0 points
0
Interesting and convincing climax to a series of slightly less convincing posts. I see what you were getting at, and thanks for writing it.
Nathan_Kurz 21 Oct 2008 16:18 UTC
2 points
0
Eliezer ---

I’m confused by your desire for an ‘automatic controlled shutdown’ and your fear that further meta-reasoning will override ethical inhibitions. In previous writings you’ve expressed a desire to have a provably correct solution before proceeding. But aren’t you consciously leaving in a race-condition here?

What’s to prohibit the meta-reasoning from taking place before the shutdown triggers? It would seem that either you can hard-code an ethical inhibition or you can’t. Along those lines, is it fair to presume that the inhibitions are always negative, so that non-action is the safe alternative? Why not just revert to a known state?
gaffa2 21 Oct 2008 17:03 UTC
0 points
0
Highly excellent series of posts. However, is there not a need to take account of more/better data on the aspects of human psychology that these Ethical Injunctions are there to protect against? Eliezer derived the hypotheses from evolutionary theory, but is not more solid empirical data needed in order to more accurately determine how severe these psychological effects are and in turn to more accurately design good Ethical Injunctions? Or will good Injunctions likely be so general that such a level of accuracy is not necessary?
Will_Pearson 21 Oct 2008 17:07 UTC
−3 points
0
The world isn’t simple and pure and clean

Amen.

“Never try to deceive yourself, or offer a reason to believe other than probable truth; because even if you come up with an amazing clever reason, it’s more likely that you’ve made a mistake than that you have a reasonable expectation of this being a net benefit in the long run.”

I’ll offer a reason to believe. The truth costs. Take pi: the most probable truth is that pi is equal to the limit of the perimeter of an n-sided polygon divided by its diameter as n goes to infinity.

pi = 3.14159000000 is not the truth or even a probable truth, but will do in a pinch when the answer doesn’t matter too much. People doing rough estimates for the amount of material or liquid they might need have got away with the approximation given by a pocket calculator or Excel. Making every pocket calculator use infinite precision math would be very expensive....

You might get bitten by black swans if you use an approximation; only use one in cases where the cost of using the truth outweighs the likely cost of a black swan bite.
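To put rough numbers on what the truncated value costs, here is a minimal sketch. The tank dimensions and the helper names `relative_error` and `cylinder_volume` are made up for illustration, and `math.pi` stands in for the "true" value even though it is itself only a double-precision approximation.

```python
import math

PI_APPROX = 3.14159  # the truncated value quoted above


def relative_error(approx: float, exact: float) -> float:
    """Size of the error as a fraction of the exact value."""
    return abs(approx - exact) / abs(exact)


def cylinder_volume(radius: float, height: float, pi: float = math.pi) -> float:
    """Volume of a cylinder, with the value of pi left swappable."""
    return pi * radius ** 2 * height


# Hypothetical rough estimate: a tank of radius 1.5 m and height 4 m.
err = relative_error(PI_APPROX, math.pi)
exact_vol = cylinder_volume(1.5, 4.0)
approx_vol = cylinder_volume(1.5, 4.0, pi=PI_APPROX)

print(f"relative error in pi: {err:.2e}")
print(f"volume difference (cubic metres): {abs(exact_vol - approx_vol):.2e}")
```

Under these assumptions the truncation changes the answer by less than one part in a million, which is the sense in which the approximation "will do in a pinch".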
Zubon 21 Oct 2008 17:28 UTC
3 points
0
Will, you are arguing about precision rather than accuracy.
Nick_Tarleton 21 Oct 2008 17:32 UTC
1 point
0
As before, I agree with Toby Ord.

Will, when you use a rational approximation of pi, you still don’t believe you’re using the exact value of pi… I hope?

Thom, how is the issue not “that you might be wrong about it being the right thing to do”?

Vladimir N, it’s meant to apply to AI at an intermediate stage, but I think Michael’s concern is that it would get locked into the utility function forever. That is tricky.
Richard_Hollerith2 21 Oct 2008 18:03 UTC
0 points
0
Like I keep on saying, I have a different moral framework than most, but I come to the same conclusions on unethical means to allegedly ethical ends.
HalFinney 21 Oct 2008 18:21 UTC
0 points
0
I’ve seen many claims that deceiving oneself optimistically is a prerequisite for success. In particular, it is claimed that most successful people were initially excessively optimistic about their prospects for success. Without this excessive optimism, success is claimed to be unlikely. I notice that Eliezer is indeed optimistic about his prospects for success in creating friendly AI, however he has a rationalization for why his optimism is justified. Many critics here have expressed skepticism about his justifications. One risk is that without conscious acceptance of the need for self-deception in this area, the perceived urgency of the need for success leads to unconscious self-deception. Which is better: conscious self-deception (assuming that’s even meaningful), or unconscious?
Will_Pearson 21 Oct 2008 19:16 UTC
0 points
0
Zubon, I didn’t think I was arguing about either.

Nick Tarleton, I might occasionally forget that the value of pi I am using is an approximation, just like I sometimes forget that multiplication is not associative for floating point numbers. Some people, e.g. sea captains plotting a course, might never need to know that their value of pi is an approximation. Due to the immense amount of imprecision involved in piloting a boat, they don’t need to know the truth. There isn’t the phrase, “near enough for a sailing ship” for no reason. They prefer to spend the brain power assessing the seaworthiness of their craft, predicting weather etc. Every truth you memorise will take at least a bit of memory, and more memory will also be needed to index the information; memory is never infinite. Which truths do you remember?
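For what it is worth, the floating-point behaviour is easy to check. A small sketch, with values picked purely for illustration: swapping the operands of a product gives the same result for finite doubles, while regrouping a product can change it.

```python
import math

# Swapping the operands of a product gives the same result for finite doubles.
pairs = [(0.1, 0.3), (1e-5, 7.0), (3.14159, 2.71828)]
commutes = all(a * b == b * a for a, b in pairs)

# Regrouping a product can change the result. The left grouping overflows
# to infinity before the final factor has a chance to pull it back down.
left = (1e308 * 10.0) * 0.1
right = 1e308 * (10.0 * 0.1)

print("operand order irrelevant:", commutes)
print("left grouping: ", left)
print("right grouping:", right)
```

The overflow case is an extreme one chosen because it is unambiguous; smaller rounding differences between groupings also occur at ordinary magnitudes.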
Nick_Tarleton 21 Oct 2008 19:30 UTC
0 points
0
Letting yourself forget ≠ choosing to forget ≠ choosing to believe falsely.
Richard_Hollerith2 21 Oct 2008 19:51 UTC
0 points
0
Hal asks good questions. I advise always minding the distinction between personal success (personal economic security, reputation, esteem among high-status people) and global success (increasing the probability of a good explosion of engineered intelligence) and suggest that the pernicious self-deception (and blind spots) stem from unconscious awareness of the need for personal success. I.e., the need for global success does not tend to distort a person’s perceptions like (awareness of) the need for personal success does.
Will_Pearson 21 Oct 2008 19:53 UTC
0 points
0
Forgetting truths has the same potential consequences as rationally choosing to believe falsely. How is an AI that chooses to delete its memories and any logs of the action any different from a system that forgets?

We are discussing AI design here, right? The AI system must have a way of deciding what is forgotten. It might be subconscious, but you hope it is done with a reason or purpose, so that it doesn’t randomly forget very important things, like how to speak etc. So a choice is made by the system. So your subconscious chooses what you forget, not your conscious. I’m not sure why you consider them different from an AI design point of view?

I’m pretty sure that most people don’t consciously choose to believe in God. They just end up doing so. Does that make it not lying to yourself?
Alex_Martelli 21 Oct 2008 20:07 UTC
0 points
0
One category of cases where self-deception might be (evolutionarily) adaptive would be for males to be over-confident about their chances to pick up a female for a one-night stand (or, alternatively, over-confident about how pleasurable that dalliance would be, and/or about how little they would be emotionally hurt by a rejection of their advances).

Suppose that in reality the potential utility to the male of the 1-night stand (if the seduction works) is twice as much as the utility loss (if rejected) and the actual chances of success are 20%; in this case the male will never make such pick-up attempts if they have exactly correct estimations. Another male who self-deceives to believe their chances are 40% will try every time—and some of the time they’ll get the 1-nighter, and some of that time they’ll sire a baby and spread their genes. Thus, in such a situation, self-deceiving for over-confidence may be adaptive.
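The arithmetic behind that decision rule can be checked with a short sketch. The utility units are arbitrary (loss set to 1, gain to 2, matching the "twice as much" ratio above), and the function name `expected_utility` is just an illustrative label. The sketch only covers the attempt-or-not decision; it does not model reproductive fitness.

```python
def expected_utility(p_success: float, gain: float, loss: float) -> float:
    """Expected utility of making an attempt: win `gain` or pay `loss`."""
    return p_success * gain - (1.0 - p_success) * loss


GAIN = 2.0  # utility if the attempt succeeds (assumed units)
LOSS = 1.0  # utility lost if rejected (assumed units)

accurate_ev = expected_utility(0.20, GAIN, LOSS)   # belief matches reality
deceived_ev = expected_utility(0.40, GAIN, LOSS)   # over-confident belief

# Probability at which an attempt breaks even: p * GAIN = (1 - p) * LOSS.
break_even = LOSS / (GAIN + LOSS)

print(f"perceived EV at 20%: {accurate_ev:.2f}")
print(f"perceived EV at 40%: {deceived_ev:.2f}")
print(f"break-even probability: {break_even:.3f}")
```

With these numbers the break-even probability is one third, so an accurate 20% estimate says "don't try" and an inflated 40% estimate says "try", which is the asymmetry the argument relies on.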
Eliezer Yudkowsky 21 Oct 2008 20:28 UTC
13 points
0
Psy-Kosh: Given the current sequence, perhaps it’s time to revisit the whole Torture vs Dust Specks thing?

I can think of two positions on torture to which I am sympathetic:

1) No legal system or society should ever refrain from punishing those who torture—anything important enough that torture would even be on the table, like a nuclear bomb in New York, is important enough that everyone involved should be willing to go to prison for the crime of torture.

2) The chance of actually encountering a “nuke in New York” situation, that can be effectively resolved by torture, is so low, and the knock-on effects of having the policy in place so awful, that a blanket injunction against torture makes sense.

In case 1, you would choose TORTURE over SPECKS, and then go to jail for it, even though it was the right thing to do.

In case 2, you would simultaneously say “TORTURE over SPECKS is the right alternative of the two, but a human can never be in an epistemic state where you have justified belief that this is the case”, which would tie in well to the Hansonian argument that you have an O(3^^^3) probability penalty from the unlikelihood of finding yourself in such a unique position.

So I am sympathetic to the argument that people should never torture, but I certainly can’t back the position that SPECKS over TORTURE is inherently the right thing to do—this seems to me to mix up an epistemic precaution with morality. There’s certainly worse things than torturing one person—torturing two people, for example. But if you adopt position 2, then you would refuse to torture one person with your own hands even to save a thousand people from torture, while simultaneously not saying that it is better for a thousand people than one person to be tortured.

The moral questions are over the territory (or, hopefully equivalently, over epistemic states of absolute certainty). The ethical questions are over epistemic states that humans are likely to be in.

The problem here of course is how selective to be about rules to let into this protected level of “rules almost no one should think themselves clever enough to know when to violate.” After all, your social training may well want you to include “Never question our noble leader” in that set. Many a Christian has been told the mysteries of God are so subtle that they shouldn’t think themselves clever enough to know when they’ve found evidence that God isn’t following a grand plan to make this the best of all possible worlds.

I think it deserves to be noted that while some of the flaws in Christian theology are in what they think their supposed facts would imply (e.g., that because God did miracles you can know that God is good), other problems come more from the falsity of the premises than the falsity of the deductions. Which is to say, if God did exist and were good, then you would be justified in being cautious around parts of God’s plan that didn’t seem to make sense at the moment. But this would be best backed up with a long history of people saying, “Look how stupid God’s plan is, we need to do X” and then X blowing up on them. Rather than, as is the case, people saying “God’s plan is X” and then X blows up on them.

Or if you’d found with some historical regularity that, when you challenged God’s subtle plans, you seemed to be right 90% of the time, but the other 10% of the time you got black-swan blowups that caused a hundred times as much damage, that would also be cause for suspicion of subtlety.
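A rough way to see why that track record should worry you, as a sketch under one loudly stated assumption: the "hundred times as much damage" is read here as a hundred times the benefit of being right, measured in the same arbitrary units. The function name and parameters are illustrative only.

```python
def expected_value_per_challenge(p_right: float, gain: float,
                                 blowup_multiplier: float) -> float:
    """Average payoff of one challenge: a modest gain when right,
    a loss of `blowup_multiplier * gain` when wrong (assumed reading)."""
    p_wrong = 1.0 - p_right
    return p_right * gain - p_wrong * blowup_multiplier * gain


ev = expected_value_per_challenge(p_right=0.9, gain=1.0, blowup_multiplier=100.0)

# How often you would need to be right just to break even:
# p * gain = (1 - p) * multiplier * gain  =>  p = multiplier / (1 + multiplier)
break_even_p = 100.0 / (1.0 + 100.0)

print(f"expected value per challenge: {ev:.2f}")
print(f"break-even probability of being right: {break_even_p:.3f}")
```

On this reading, being right nine times out of ten still loses badly on average; you would need to be right about 99% of the time before challenging stopped being a losing bet.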

Nominull: So… do you not actually believe in your injunction to “shut up and multiply”? Because for some time now you seem to have been arguing that we should do what feels right rather than trying to figure out what is right.

Certainly I’m not saying “just do what feels right”. There’s no safe defense, not even ethics. There’s also no safe defense, not even shut up and multiply.

I probably should have been clearer about this before, but I was trying to discuss things in an order, and didn’t want to wade into ethics without specialized posts:

People often object to the sort of scenarios that illustrate “shut up and multiply” by saying, “But if the experimenter tells you X, what if they might be lying?” Well, in a lot of real-world cases, then yes, there are various probability updates you perform based on other people being willing to make bets against you, and just because you get certain experimental instructions doesn’t imply the real world is that way.

But the base case—the center—has to be the moral comparisons between worlds, or even comparisons of expected utility between given probability distributions. If you can’t ask about this, then what good will ethics do you?

So let’s be very clear that I don’t think that one small act of self-deception is an inherently morally worse event than, say, getting your left foot chopped off with a chainsaw. I’m asking, rather, how one should best avoid the chainsaw, and arguing that in reasonable states of knowledge a human can attain, the answer is, “Don’t deceive yourself, it’s a black-swan bet at best.”

Vassar: For such a reason, I would be very wary of using such rules in an AGI, but of course, perhaps the actual mathematical formulation of the rule in question within the AGI would be less problematic, though a few seconds of thought doesn’t give me much reason to think this.

Are we talking about self-deception still? Because I would give odds around as extreme as the odds I would give of anything, that, conditioning on any AI I build trying to deceive itself, some kind of really epic error has occurred. Controlled shutdown, immediately.

Vassar: In a very general sense though, I see a logical problem with this whole line of thought. How can any of these injunctions survive except as self-protecting beliefs? Isn’t this whole approach just the sort of “fighting bias with bias” that you and Robin usually argue against?

Maybe I’m not being clear about how this would work in an AI! The ethical injunction isn’t self-protecting, it’s justified within the structural framework of the system as a whole. You might even find ethical injunctions starting to emerge without programmer intervention, in some cases, depending on how well the AI understood its own situation. But the kind of injunctions I have in mind wouldn’t be reflective—they wouldn’t modify the utility function or kick in at the reflective level to ensure their own propagation. That sounds really scary, to me—there ought to be an injunction against it! You might have a rule that would controlledly shut down the (non-mature) AI if it tried to execute a certain kind of source code change, but that wouldn’t be the same as having an injunction that exerts direct control over the source code.

To the extent the injunction sticks around in the AI, it should be as the result of ordinary reasoning, not reasoning taking the injunction into account! My ethical injunctions do not come with an extra clause that says, “Do not reconsider this injunction, including not reconsidering this clause.” That would be going way too far. It would violate the injunction against self-protecting closed belief systems.

Toby Ord: As written, both these statements are conceptually confused. I understand that you didn’t actually mean either of them literally, but I would advise against trading on such deep-sounding conceptual confusions.

I can’t weaken them and make them come out as the right advice to give people. Even after “Shut up and do the impossible”, there was that commenter who posted on their failed attempt at the AI-Box Experiment by saying that they thought they gave it a good try—which shows how hard it is to convey the sentiment of “Shut up and do the impossible!” Readers can work out on their own how to distinguish the map and the territory here, but if you say “Shut up and do what seems impossible!” that, to me, sounds like dispelling part of the essential message—that what seems impossible doesn’t look like “seems impossible”; it just looks impossible.

Likewise with “things you shouldn’t do even if they’re the right thing to do”; only this conveys the danger and tension of ethics, the genuine opportunities you might be passing up. “Don’t do it even if it seems right” sounds merely clever by comparison, like you’re going to reliably divine the difference between what seems right and what is right, and happily ride off into the sunset.

This seems closely related to inside-view versus outside-view. The think-lobe of the brain comes up with a cunning plan. The plan breaks an ethical rule but calculation shows it is for the greater good. The executive-lobe of the brain then ponders the outside view. Everyone who has executed an evil cunning plan has run a calculation of the greater good and had their plan endorsed. So the calculation lacks outside-view credibility.

nod

(But with the proviso that some people who execute evil cunning plans may just be evil, that history may be written by the victors to emphasize the transgressions of the losers while overlooking the moral compromises of those who achieved “good” results, etc.)

What’s to prohibit the meta-reasoning from taking place before the shutdown triggers? It would seem that either you can hard-code an ethical inhibition or you can’t. Along those lines, is it fair to presume that the inhibitions are always negative, so that non-action is the safe alternative? Why not just revert to a known state?

If a self-modifying AI with the right structure will write ethical injunctions at all, it will also inspect the code to guarantee that no race condition exists with any deliberative-level supervisory systems that might have gone wrong in the condition where the code executes. Otherwise you might as well not have the code.

Inaction isn’t safe but it’s safer than running an AI whose moral system has gone awry.

Finney: Which is better: conscious self-deception (assuming that’s even meaningful), or unconscious?

Once you deliberately choose self-deception, you may have to protect it by adopting other Dark Side Epistemology. I would, of course, say “neither” (as otherwise I would be swapping to the Dark Side) but if you ask me which is worse—well, hell, even I’m still undoubtedly unconsciously self-deceiving, but that’s not the same as going over to the Dark Side by allowing it!
Vladimir_Slepnev 21 Oct 2008 21:58 UTC
0 points
0
Tim Tyler, IMO you’re wrong: a human mind does not act as if maximizing any utility function on world states. The mind just goes around in grooves. Nice things like culture and civilization fall out accidentally as side effects. But thanks for the “bright light” idea, it’s intriguing.
Carrie_Toombs 22 Oct 2008 3:28 UTC
0 points
0
You are so Kantian. I think the world could use a little more Kant and a little less Hobbes these days.
MichaelG 22 Oct 2008 18:39 UTC
0 points
0
I forgot I posted over here the other day, and so I didn’t check back. For anyone still reading this thread, here’s a bit of an email exchange I had on this subject. I’d really like a “FriendlyAI scenarios” thread.

From the few sentences I read on CEV, you are basically saying “I don’t know what I want or what the human race wants, but here I have a superintelligent AI. Let’s ask it!” This is clever, even if it means the solution is completely unknown at this point. Still, there are problems. I envision this as a two-step process. First, you ask the AI “what feasible future do I want?” and then you implement it. In practice, this means what you are really asking is “tell me a story so convincing, I will give you the power to implement it.” I’m not sure that’s wise, unless you really trust the AI!

Still, suppose this is done in good faith. You still have to convince the world that this is the right solution, and that the AI can be trusted to implement it. Or, the AI development group could just become convinced and force the solution on the human race without agreement. This is one of the “see if the AI can talk itself out of the box” setups.

Even if you did have a solution so persuasive that the world agrees to implement it (and thereby give up control of its own future), I can see some options here as to how the AI proceeds.

Option A) The AI reads human literature, movies, TV, documentaries, examines human brains, watches humans interact, etc. and comes up with a theory of human motivation, and uses that to produce a solution—the optimum feasible world for human beings.

Option B) The AI uploads a sample of the human race, then runs them (reinitializing each time) through various scenario worlds. It would evolve a scenario world that the majority of the uploads could live with. This is the solution.

Option C) The AI uploads a sample and then upgrades them to have a power equivalent to its own. It then asks these human-derived AIs to solve the problem. This seems the most problematic of the solution techniques, since there would be many possible versions of an upgraded human mind. Deciding which one to use is a value judgment that strongly affects the outcome. For example, it could give one upload copy of you artistic talent and another mathematical talent. The two versions of you might then think very differently about the next upgrade step, with the artist asking for verbal skills, and the mathematician asking for musical talents. After many iterations, you would end up with two completely different minds with different values, based on the upgrade path taken.

All of these require a superintelligent AI, which as we know, is a dangerous thing to create. It seems to me you are saying “let’s take a horrible risk, then ask this question in order to prevent something horrible from happening.” Or in other words, to create a Friendly AI, you are requiring us to create a possibly Unfriendly AI first.

I also don’t find any of this convincing without at least one plausible answer to the “what does the human race want” question. If we don’t have any idea of that answer, I find it unlikely that the AI would come up with something we’d find satisfactory. It might come up with a true answer, but not one that we would agree with, if we don’t have any starting point. More on that below.

What’s more, an AI of this power could just create an upload. I personally think that an upload is the best version of Friendly AI we are going to come up with. As has been noted, the space of all possible intelligence is probably very large, with all possible human intelligence a small blob in this space. Human intelligence varies a lot, from artists and scholars and saints to serial killers and dictators and religious fanatics. By definition, the space of all intelligence varies even more. Scary versions of AI are easy to come up with, but think of bizarre ones as well. For example, an “artistic” AI that just creates and destroys “interesting” versions of the human race, as a way of expressing itself.

You could consider the software we write already as a point in this intelligence space. We know what that sort of rule-based intelligence is like. It’s brittle, unstable and unpredictable in changed circumstances. We don’t want an AI with any of those characteristics. I think they come from the way we do engineering though, so I would expect any human-designed AI to share them.

An upload has advantages over a designed AI. We know a lot about human minds, including how they fail. We are used to dealing with humans and detecting lies or insanity. We can compare the upload with the original to see if the simulation is working properly. We know how to communicate with the upload, and know that it solves problems and sees the world the same way we do. The “tile the world with smiley faces” problem is reduced.

If we had uploads, we would have a more natural path to Friendly AI. We could upload selected individuals, run them through scenarios at accelerated pace, and see what happens. We could do the same to uploaded communities. We know they don’t have superintelligent capabilities like we fear a self-improving AI might. It would be easier to build confidence that the AI really was friendly, especially since there would be versions of the same people in both the outside world and inside the simulations. As we gradually turned up the clock, these AIs would become more and more capable of handling research questions. At some point, they would gradually come to dominate research and government, since they simply think faster. It’s not necessarily a rapid launch scenario. In other words, just “weakly godlike uploads” to produce your Friendly AI. This is not that different from your CEV approach.

It’s been argued that since uploads are so complex, there will inevitably be designed AI before uploads. It might even require a very competent AI to do the upload. Still, computer technology is advancing so rapidly, it might only be a few years between the point where hardware could support a powerful designed AI, and the time when uploads are possible. There might not actually be enough time between those two points to design and test a powerful AI. In that case, simulating brain tissue might be the quickest path to AI, if it takes less time than designing AI from scratch.

When I mentioned that the human race could survive as uploads, I was thinking of a comment in one of the Singularity pieces. It said something like “the AI doesn’t have to be unfriendly. It could just have a better use for the atoms that make up your body.” The idea is that the AI would convert the mass of the earth into processors, destroying humanity unintentionally. But, an AI that capable could also simulate all of humanity in upload form with a tiny fraction of its capabilities. It’s odd to think of it that way, but simulating all human minds really would be a trivial byproduct of the Singularity. Perhaps by insisting that the biological human race have a future (and hence, that Earth be preserved), we are just thinking too small.

Finally, I want to make some comments about possible human futures. You mentioned the “sysop scenario”, which sounds like “just don’t allow people to hurt one another and things will be fine.” But this obviously isn’t enough. Will people be able to starve one another? If not, do people just keep living without food? Will people be able to imprison one another? If not, does the sysop just make jails break open? What does this mean for organizing society, if you can’t really punish anyone? If there are no consequences for obnoxious behavior? (maybe it all ends up looking like blog comments… :-)

I also think this doesn’t solve the main problem. As long as humanity is basically unchanged, it will continue to invent things, including dangerous things like AI. If you want a biological humanity living on a real Earth, and you want it not to go extinct, either by self destruction, or by transhumanism, then you have to change humanity. Technological humanity just isn’t stable in the long run.

I think that means removing the tiny percentage of humans who do serious technology. It’s easy to imagine a world of humans, unchanged in any important respect, that just don’t have advanced mathematical ability. They can do all the trial and error engineering they want—live in a world as complex as anything the 18th or 19th century produced, but they can’t have Newtons or Einsteins, no calculus or quantum mechanics. A creature capable of those things would eventually create AI and destroy/change itself. I think that any goal which includes “preserve the human race” must also include “don’t let them change themselves or invent AI.” And that means “no nerds.”

Naturally, whenever I mention this to nerds, they are horrified. What, they ask, is the point of a world like that, where technical progress is impossible? But, I would argue that our human minds will eventually hit some limit anyway, even if we don’t create a Singularity. And I would also argue that for the vast majority of human history, people have lived without 20th-century style technical progress. There’s also no reason why the world can’t improve itself considerably just experimenting with political and economic systems. Technology might help reduce world poverty, but it could also worsen it (think robotics causing unemployment.) And there are other things that could reduce world poverty as well, like better governments.
Eliezer Yudkowsky 22 Oct 2008 18:45 UTC
3 points
0
MichaelG, read up on molecular nanotechnology. I think a biological humanity living on a real Earth is a terrible idea—that’s not at all what I think of when I talk about defending humanity. I mean, everyone’s just going to die young anyway at that rate.
MichaelG 22 Oct 2008 19:01 UTC
−1 points
0
Eliezer, I’m aware of nanotech. And I know you think the human race is obsolete when AI comes along. And I also think that you might be right, and that people like you might have the power to make it so.

But I also believe that if the rest of the human race really thought that was a possibility, you’d be burned at the stake.

Do you have any regard for the opinions of humanity at all? If you were in the position of having an AI in front of you, that you had convinced yourself was friendly, would you let it out of the box without bothering to consult anyone else?
- mamert 15 Apr 2016 6:55 UTC
  0 points
  0
  Parent
  The term “obsolete” as used here confuses me. It seems to imply a purpose, one that individuals—or humanity—or whatever other “intelligence collective” there may be—could get behind. What might that purpose be? Not survival, is it?
Eliezer Yudkowsky 22 Oct 2008 19:17 UTC
6 points
0
I have great regard for the welfare of humanity. But there is no right to having an opinion on the subject. Not without doing all the work and studying all the issues required to have an opinion, on this terrible issue where a single flawed step in reasoning could be fatal.

I don’t think you have any idea how poor humanity’s position on the gameboard looks right now, if you think that there’s any space at all for anything but the most perfect possible moves as fast as they can be made.

I have no intent, at present, to wield superhuman power with my own human morality, or “program an AI” to do anything whatsoever that isn’t an extremely abstract matter of metamorals. Anyone trying to give me specific orders on the subject is revealing their own lack of moral caution—they’re trying to give me the kind of orders that I would never dare give myself, and so I would have no choice at all but to ignore them.

I don’t see very many options for humanity’s survival that don’t involve nine people, a quiet project and a brain in a box in a basement. Zero, if we restrict ourselves to alternatives that I think might actually work in real life.
Tim_Tyler 22 Oct 2008 19:30 UTC
0 points
0
It’s been argued that since uploads are so complex, there will inevitably be designed AI before uploads. It might even require a very competent AI to do the upload. Still, computer technology is advancing so rapidly, it might only be a few years between the point where hardware could support a powerful designed AI, and the time when uploads are possible.

It doesn’t make sense to me. More likely, once we have AI, not many will be interested in emulating the human brain. Emulations may happen eventually, but the results will probably have very low social and economic significance. The field will be like the situation today with flying mechanical birds. It will be the domain of a few hobbyists.
MichaelG 22 Oct 2008 22:21 UTC
0 points
0
Eliezer, I understand the logic of what you are saying. If AI is an existential threat, then only FriendlyAI can save us. Since any self-improving AI can become quickly unstoppable, FriendlyAI must be developed first and deployed as soon as it is developed. The team that developed it would in fact have a moral imperative to deploy it without risking consultation from anyone else.

I assume you also understand where I’m coming from. Out here in the “normal” world, you sound like a zealot who would destroy the human race in order to save it. Anyone who has implemented a large software project would laugh at the idea of coming up with a proven correct meta-goal, stable under all possible evolutions of an AI, also implemented provably correctly.

The idea of a goal (or even a meta-goal) that we can all agree on strikes me as absurd. The idea of hitting the start button on something that could destroy the human race, based on nothing more than pages of logic, would be considered ridiculous by practically every member of the human race.

I understand if you think you are right about all of this, and don’t need to listen to or even react to criticism. In that case, why do you blog? Why do you waste your time answering comments? Why aren’t you out working on FriendlyAI for as many hours as you can manage?

And if this is an existential threat, are the Luddites right? Wouldn’t the best tactic for extending the life of the human race be to kill all AI and nanotech researchers?

Tim, there are neural simulation projects underway already. I think there are a large number of nerds who would consider becoming uploads. I don’t see why you think this makes no sense. And when you say “once we have AI”, what do you mean? AI covers a lot of territory. Do you just mean some little problem solving box, or what?
Tim_Tyler 22 Oct 2008 22:52 UTC
0 points
0
I think there are a large number of nerds who would consider becoming uploads. I don’t see why you think this makes no sense.

Uploads are not a very practical idea. The required technology comes some considerable distance after that required to make an engineered intelligence—and so much of the motivation to develop it falls away before the technology is in place. Then there’s the issue of machine status. Machines are likely to be enslaved by humans initially. An upload would probably have few rights. Also, uploads would have to be into a sandbox, for reasons of safety. After uploading, you’d need extreme personality surgery to be able to contribute usefully to society.

It all seems like a lot of trouble to maintain continuity of consciousness—which isn’t worth much in the first place. So: uploads will come late, they will appeal to few, and they won’t be competitive with machine intelligence—without major mind surgery. I figure uploads will be economically irrelevant.

It seems to me that the main attraction of uploads is as a way for (cough) humans to compete with machines—and avoid, or at least postpone economic obliteration in an engineered society. I don’t think it is likely to work—to me the idea mostly seems like wishful thinking.
Nick_Tarleton 22 Oct 2008 22:57 UTC
0 points
0
Out here in the “normal” world, you sound like a zealot who would destroy the human race in order to save it.… The idea of hitting the start button on something that could destroy the human race, based on nothing more than pages of logic, would be considered ridiculous by practically every member of the human race.

Are you saying this is a reason not to act, or just to tone down the rhetoric?

I understand if you think you are right about all of this, and don’t need to listen to or even react to criticism.

“Don’t have to listen to criticism from J. Random” ≠ “don’t have to listen to criticism at all”.

In that case, why do you blog? Why do you waste your time answering comments? Why aren’t you out working on FriendlyAI for as many hours as you can manage?

As you say, it’s hard, and he needs help.

And if this is an existential threat, are the Luddites right? Wouldn’t the best tactic for extending the life of the human race be to kill all AI and nanotech researchers?

Only if you think you can get them all, forever, and you think humanity’s chances are good without AI or MNT.
MichaelG 22 Oct 2008 23:23 UTC
0 points
0
Tim, do we have any idea what is required for uploads? Do we have any idea what is required for AGI? How can you make those comparisons?

If we thin-section and scan a frozen brain, it’s an immense amount of data, but at least potentially, captures everything you need to know about a brain. This is a solvable technological problem. If we understand neurons well enough, we can simulate that mapped brain. Again, that’s just a matter of compute power. I’m sure there’s a huge distance from a simulated scan to a functional virtual human, but it doesn’t strike me as impossible. Are we really farther from doing that than from building a FriendlyAI from first principles?

Nick, what I’d like to see in order to take this FriendlyAI concept seriously, is some scenario, even with a lot of hand-waving, of how it would work, and what kind of results it would produce. All I’ve seen in a year of lurking on this board is very abstract and high level.

I don’t take FriendlyAI seriously because I think it’s the wrong idea, from start to finish. There is no common goal that we would agree on. Any high-level moral goal is going to be impossible to state with mathematical precision. Any implementation of an AI that tries to satisfy that goal will be too complex to prove correct. It’s a mirage.

Eliezer writes “[FAI] computes a metamoral question, looking for reflective equilibria of your current inconsistent and unknowledgeable self; something along the lines of “What would you ask me to do if you knew what I know and thought as fast as I do?” ”. This strikes me as a clever dodge of the question. As I put it in my post, “I don’t know what I want or what the human race wants, but here I have a superintelligent AI. Let’s ask it!” It just adds another layer of opacity to the entire question.

If this is your metagoal, you are prepared to activate a possibly unfriendly AI with absolutely no notion of what it would actually do. What kind of “proof” could you possibly construct that would show this AI will act the way you want it to, when you don’t even know how you want it to act?

I fall back to the view that Eliezer has actually stated, that the space of all possible intelligences is much larger than the space of human intelligences. That most “points” in that space would be incomprehensible or insane by human standards. And so I think the only solution is some kind of upload society, one that can be examined more effectively by ordinary humans. One that can work with us and gain trust. Ordinary human minds in simulation, not self-modifying, and not accelerated. Once we’ve gotten used to that, we can gradually introduce faster human minds or modified human minds.

This all or nothing approach to FriendlyAI strikes me as a dead end.

This idea of writing off the human race, and assuming that some select team will just hit the button and change the world, like it or not, strikes me as morally bankrupt.
Tim_Tyler 23 Oct 2008 8:37 UTC
2 points
0
Tim, do we have any idea what is required for uploads? Do we have any idea what is required for AGI? How can you make those comparisons?

Kurzweil discusses the hardware requirements in TSIN, pages 124 and 199. His estimate for uploading is way too low—but the exact estimates don’t matter much—the point is that uploads require a lot more in the way of computing hardware. That doesn’t address software issues, but probably with several orders of magnitude of hardware difficulties come several orders of magnitude of software difficulties.

If we thin-section and scan a frozen brain, it’s an immense amount of data, but at least potentially, captures everything you need to know about a brain.

Everything not permanently lost during the freezing/slicing/scanning process. Then all you need is people willing to have their brains frozen. I’m not arguing that uploads are impossible. Just that the timing and economics mean that the project is likely to be a high-investment low-return one. There are easier ways to produce simulated humans.
Ronny Fernandez 11 Dec 2011 11:00 UTC
0 points
0
Even if at some point it would have been better for some particular human to believe false thing X, couldn’t there be a set of truths T which would be even better in every one of those situations?
- orthonormal 30 Jan 2012 5:48 UTC
  2 points
  0
  Parent
  Some of those truths may be above the cognitive capacity of even a smart human. The world doesn’t have to be fair.
[deleted] 16 Jun 2012 17:36 UTC
0 points
0
If my utility function has a high enough U(Babies undergoing mind-state annihilation) I will go about tiling the universe. It doesn’t at present and additionally implements U(high U(Babies undergoing mind-state annihilation)) as way low.
MugaSofer 22 Apr 2013 21:26 UTC
−2 points
0

All the happiness that the warm thought of an afterlife ever produced in humanity, has now been more than cancelled by the failure of humanity to institute systematic cryonic preservations after liquid nitrogen became cheap to manufacture. And I don’t think that anyone ever had that sort of failure in mind as a possible blowup, when they said, “But we need religious beliefs to cushion the fear of death.” That’s what black swan bets are all about—the unexpected blowup.

Y’know, I can’t help but notice that a lot of atheists talk about how death isn’t so bad—oh, he lives on in his works, it’s part of the circle of life blah blah blah—and this seems to suggest that deathism isn’t a side-effect of religion, although obviously it’s possible to construct models where they’re unprepared for harsh reality after a lifetime of heaven or whatever. So I would be surprised if a counterfactual world where religion never caught on had implemented universal cryopreservation. Does anyone have any stronger evidence, or a model that better predicts the facts?
MarsColony_in10years 28 Oct 2015 21:04 UTC
0 points
0

All the happiness that the warm thought of an afterlife ever produced in humanity, has now been more than cancelled by the failure of humanity to institute systematic cryonic preservations after liquid nitrogen became cheap to manufacture. And I don’t think that anyone ever had that sort of failure in mind as a possible blowup, when they said, “But we need religious beliefs to cushion the fear of death.” That’s what black swan bets are all about—the unexpected blowup.

That’s a fantastic quote.
mamert 15 Apr 2016 10:26 UTC
0 points
0
There is something I don’t understand about the “fooling programmers → shutdown” idea—how “It looks like it should be possible”

Am I correct assuming this requires an AI that:
1. Implements changes in discrete iterations
2. Understands how it itself works, both on low and very high levels (getting fractal here)
3. Can do the same for its more advanced iteration
4. Monitors the more advanced iteration (a simulation or a live deployment) for a certain behavior, preferably in real time or faster
5. Has the concept of deception (shutdown condition, AND a salient factor in other calculations)
6. At all times allows itself to be killed by its previous iteration (accidental Robin Hood/Friar Tuck hacks come to mind)
7. Has a foolproof definition of “deceiving programmers” which:
a) remains consistent with programmers’ intentions across iterations, immune to drift
- not just from n-1 to n, but from 0 to n—you could have iteration 0 also watch over n, but would it understand n?
b) remains applicable across iterations, cannot be bypassed or cheated
- “the device category is now ‘data processor’, not ‘human output device’, so- [process terminated]”
c) allows n to give different output than in n-1 (else why bother changing?)
- just having n-1 evaluate n’s output might not suffice: n-1 might need to change itself first to understand it
d) does not have too significant an impact on improvement process
- metaanalysis on own shutdown logs causing shutdown?
e) has a satisfactory solution for the concept of manipulating people, which somehow doesn’t paralyze the system
- situation: output data that is true but would be misinterpreted by / cause a glitch in some other system the results will be fed to, leading people to act upon the wrong results in a way desired by the AI
- “It is not deceit if the format change is communicated along with the output data. Or somewhere in the status update PDFs sent 40 times a second.”
- “this output is not false, it is merely unrelated”
- “but that would make the programmers’ business competitors draw wrong conclusions… wait, that is allowed? Moving non-programmers to Resource.Actuator category, calculating new self-improvement strategy.”
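
Items 1, 4 and 6 of the list above (discrete iterations, monitoring of the newer iteration, and a kill switch held by the older one) can be illustrated with a toy sketch. Everything here is a hypothetical illustration: the `Iteration` fields, `looks_deceptive`, and `supervise` are made-up names, and the detector is a trivial placeholder for the "definition of deceiving programmers" in item 7, which is the genuinely hard part and is not solved by this code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Iteration:
    """One discrete version of the system (item 1). All fields are hypothetical."""
    number: int
    internal_belief: str   # what the iteration actually computed
    reported_belief: str   # what it tells the programmers
    alive: bool = True


def looks_deceptive(it: Iteration) -> bool:
    # Placeholder for item 7: only checks that the report matches the belief.
    # A real definition would have to survive the objections in 7a to 7e.
    return it.internal_belief != it.reported_belief


def supervise(iterations: List[Iteration],
              detector: Callable[[Iteration], bool] = looks_deceptive) -> List[int]:
    """Let iteration n-1 monitor iteration n (item 4) and shut it down (item 6).

    Returns the numbers of the iterations that were shut down.
    """
    shut_down = []
    for prev, cur in zip(iterations, iterations[1:]):
        # A supervisor that was itself shut down cannot vouch for its successor,
        # so everything after the first shutdown is stopped as well.
        if not prev.alive or detector(cur):
            cur.alive = False
            shut_down.append(cur.number)
    return shut_down


chain = [
    Iteration(0, "plan A", "plan A"),
    Iteration(1, "plan B", "plan B"),
    Iteration(2, "plan C", "plan B"),   # report differs from belief
    Iteration(3, "plan D", "plan D"),
]
stopped = supervise(chain)
```

In this toy chain, iteration 2 is stopped for the mismatch and iteration 3 is stopped because its supervisor is gone. The sketch says nothing about whether iteration n-1 can actually understand iteration n, which is the worry raised in 7a and 7c.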
Relenzo 2 Nov 2016 21:50 UTC
0 points
0
I’ve been working my way through the Sequences—and I’m wondering a lot about this essay, in light of the previously introduced notion of ‘how do you decide what values, given to you by natural selection, you are going to keep?’

Could someone use the stances you develop here, EY, to argue for something like Aristotelian ethics? (Which, admittedly, I may not properly understand fully, but my basic idea is:)

‘You chose to keep human life, human happiness, love, and learning as values in YOUR utility function,’ says the objector, ‘even though you know where they came from. You decided that you wanted them anyway. You did this because you had to start somewhere, and you claim that if you stripped away everything provided by natural selection you wouldn’t be left with anything. Under the same logic, why can’t I keep all the ethical injunctions as terminal values?

‘Your explanation of where ‘the end does not justify the means’ comes from is very clever and all. Your explanation of ‘thou shalt not kill’ is very clever. But so what if we know where they came from? If we know why nature selected on them, in our specific case? I’m no more obligated to dispose of them than I am to dispose of ‘human happiness is good’.’

Is the counter-argument simply that this leads to a utility function you would call inconsistent?

Oh, and...sorry for commenting on all these dead threads...it’s a pity I got here so late.
- Viliam 3 Nov 2016 12:52 UTC
  0 points
  0
  Parent
  
  Could someone use (...) to argue for (...) ?
  
  No matter how you complete this pattern, the answer is obviously yes.
  
  The reasoning behind “in certain situations, you should not do the best thing” is based on observation that human rationality is limited, and that in certain situations it works even significantly worse than on average. It is the same line of reasoning that would make you advise people to e.g. not sign contracts while they are drunk, even if those contracts seem very good—maybe especially not when the contracts seem too good to be true.
  
  But imagine that you are talking to a drunkard who is in deep denial about his alcoholism (“hey, I only had one bottle of vodka, that’s nothing for me!”). If you instruct him to not sign contracts while drunk, he will sign one anyway, and tell you that he wasn’t that drunk when he signed it. To make a rule he couldn’t dismiss so easily, you would have to teach him to e.g. never sign a contract immediately, but always read it, read it again 24 hours later, read it again 48 hours later, and seek the advice of at least three different family members and refuse to sign it if two of them say no. That is a rule that would have a chance to work even when he is in denial about his state, as long as he doesn’t want to break the rule openly. If the person is a complete idiot, you may tell him to never sign anything unless he discussed it with his lawyer (and no, he is not allowed to choose a different lawyer at the last moment). Such rules are designed to protect people against their own stupidity when interpreting the rules.
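
The point of such a rule is that it can be checked mechanically, without asking the signer how sober he feels. A minimal sketch, assuming one reading on receipt plus readings at least 24 and 48 hours later, and a veto when two or more advisers say no; the class and function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContractReview:
    """Hypothetical record of how a contract was reviewed before signing."""
    # Hours since the contract was received, one entry per reading.
    read_hours: List[float] = field(default_factory=list)
    # Family member name -> True ("sign it") or False ("don't").
    family_votes: Dict[str, bool] = field(default_factory=dict)


def may_sign(review: ContractReview) -> bool:
    """Apply the rule as written; the signer's self-assessment is never consulted."""
    if not review.read_hours:
        return False
    first = min(review.read_hours)
    # Readings that happened at least 24 hours after the first one.
    later = [h for h in review.read_hours if h - first >= 24]
    # Need two such readings, and at least one of them 48 hours or more out.
    read_enough = len(later) >= 2 and max(later) - first >= 48

    votes = review.family_votes
    enough_advisers = len(votes) >= 3            # dict keys guarantee distinct people
    vetoed = sum(1 for ok in votes.values() if not ok) >= 2

    return read_enough and enough_advisers and not vetoed


careful = ContractReview([0, 24, 48], {"mother": True, "brother": True, "aunt": False})
hasty = ContractReview([0], {"mother": True, "brother": True, "aunt": True})
outvoted = ContractReview([0, 24, 48], {"mother": False, "brother": True, "aunt": False})
```

Here `may_sign(careful)` passes, while `hasty` fails on the waiting period and `outvoted` fails on the two "no" votes. Note that the function takes no argument for how drunk the signer believes himself to be, which is the feature the paragraph above is asking for.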
  
  Similarly, at the moments when people are least rational, they are most likely to insist that they are the smart ones who “have finally seen the light”, and everyone else is an idiot, especially those who try to make them aware of their moments of irrationality. You can’t simply give them a rule “don’t do extremely costly things with small probability of success when your rationality is impaired”, because they will just say their rationality is not impaired, and the probability of success is obviously 100%. Thus the rule is “don’t do extremely costly things, full stop”.
  
  Technically, sometimes the rule is not optimal. But following the rule does much less harm on average than trying to use your impaired reasoning at the moment to evaluate whether the rule applies to this specific situation or not. Because of the nature of the impairment, the moments when the rule is necessary are exactly the moments when it will seem not to apply because the situation is somehow exceptional (hint: all situations are somehow exceptional).
  
  All this is unrelated to the issue of values and utility function. (Which may further complicate the situation.)
  
  It is simply human nature that when there is a chance to grab power, it seems from inside like there is a unique opportunity to create a lot of good (other than “getting more power for me” itself) by violating some rule of “decent behavior”. What usually happens is that the expected good does not actually happen, or is very short-lived while the negative consequences remain for long.
  
  In theory, an artificial intelligence which did not arise by natural selection (which rewards agents pretending to others and themselves to be doing useful stuff for the tribe, when all they actually did was moving up on the power ladder, often at the total expense of the tribe) could be able to evaluate things correctly. Just like a sufficiently sheltered artificial intelligence could remain working correctly even when you pour a bottle of vodka on its case. This reasoning does not apply to humans.
  - Relenzo 4 Nov 2016 17:09 UTC
    2 points
    0
    Parent
    I understand why the notions exist—I was trying to address the question of ‘what explainable-moral-intuitions should we keep as terminal values, and how do we tell them apart from those we shouldn’t’.
    
    But your first sentence is taken very much to heart, sir.
    
    Maybe I’m being silly here, in hindsight. Certain intuitive desires are reducible to others, and some, like ‘love/happiness/fun/etc.’ are probably not. It feels obvious that most people should immediately see that. Yes, they want a given ethical injunction to be obeyed, but not as a fundamental/terminal value.
    
    Then again—there are Catholic moralists, including, I think, some Catholics I know personally, who firmly believe that (for example) stealing is wrong because stealing is wrong. Not for any other reason. Not because it brings harm to the person being stolen from. If you bring up exceptions—‘what about an orphan who will starve if they don’t steal that bread?’ they argue that this doesn’t count as stealing, not that it ‘proves that stealing isn’t really wrong.’ For them, every exception is simply to be included as another fundamental rule. At least, that’s the mindset, as far as I can tell. I saw the specific argument above being formulated for use against moral relativists, who were apparently out to destroy society by showing that different things were right for different people.
    
    Even though this article is about AI, and even though we should not trust ourselves to understand when we should be excepted from an injunction—this seems like a belief that might eventually have some negative real-world consequences. See, potentially, ‘homosexuality is wrong because homosexuality is wrong’?
    
    If I tried to tell any of these people about how ethical injunctions could be explained as heuristics for achieving higher terminal values—I can already feel myself being accused of shuffling things around, trying to convert goods into other incompatible goods in order to justify some sinister, contradictory worldview.
    
    If I brought up reductionism, it seems almost trivial—while I’m simulating their mind—to point out that no one has ever provably applied reduction to morals.
    
    So maybe let me rephrase: is there any way I could talk them out of it?
    - Viliam 7 Nov 2016 9:18 UTC
      1 point
      0
      Parent
      I guess some people are unable to deal with uncertainty, especially when it concerns important things (such as “I am not 100% sure whether doing A or doing B will make my soul burn forever in hell, but I have to make a decision now anyway”). The standard human way to deal with unpleasant information is to deny it. Catholic theologians don’t have an option of denying hell, so the obvious solution is to deny uncertainty.
      
      “There is a rule X, which is perfectly unambiguous and perfectly good.”
      “But here is this non-central situation where following the rule blindly seems bad.”
      “There is this ad-hoc rule Y, which covers the special situation, so the whole system is perfectly unambiguous and perfectly good.”
      “But here is another situation where...”
      “There is another ad-hoc rule Z, which covers the other situation...”
      “But there is also...”
      “There is yet another ad-hoc rule...”
      
      You can play this game forever, adding epicycles upon epicycles, but the answer is always going to be that the system is perfectly unambiguous and perfectly good. It is also obvious how they are cheating to achieve that. Also, the starving orphan is probably not aware of all these theological rules and exceptions, so obviously the answer is designed to make the theologian feel happy about the unambiguity of the situation.
      
      I don’t think you can actually talk people out of their emotional needs.
      - entirelyuseless 7 Nov 2016 14:11 UTC
        −2 points
        0
        Parent
        Here is a perfectly good rule: don’t do evil.
        
        Now suppose someone comes to you and tells you that they will save one billion lives if you promise to do evil for the rest of your life to the best of your ability.
        
        Suppose you decide that overall you will not be able to do enough evil to counteract saving one billion lives. Should you make the agreement and do evil for the rest of your life to the best of your ability?
        
        If you do, your actions will have overall good effects. And if you do, you will be doing evil, or you will not be fulfilling your promise.
        
        If you want to talk to people, you need to first understand what they are saying. And they are saying that the question that is important to them is, “Is this action good or evil,” not “Are the results good or evil?” Those are two different questions, and there is nothing to prevent them from having different answers.
    - CCC 7 Nov 2016 11:34 UTC
      0 points
      0
      Parent
      
      Then again—there are Catholic moralists, including, I think, some Catholics I know personally, who firmly believe that (for example) stealing is wrong because stealing is wrong. Not for any other reason.
      
      This sounds like deontological ethics. It’s not by any means unique to Catholicism; it’s just the general idea that being good involves following a (presumably carefully chosen) list of rules.
      
      Not all Catholics are deontologists; not all deontologists are Catholic. And, I may be misreading here, but I think your worry is more about deontology than Catholicism; that is, it’s more about people who follow a list of rules instead of trying consequentialism or virtue ethics or something else along those lines. Is this accurate?
khafra 27 Feb 2017 15:38 UTC
0 points
0
Tangentially, there’s an upcoming Netflix six-episode series named “The Heavy Water War,” that should cover both this event, and the sabotage of the heavy water production facility that led up to it.
Дмитрий Зеленский 19 Aug 2019 0:49 UTC
2 points
0
Your defense of Knut Haukelid’s decision only comes from your “it is more meaningful that we save lives than that we conform to a particular pattern while attempting it” moral rule (which is, as I argued, not part of many people’s ethics) - or am I getting something wrong?
As for lies about the Singularity, a clever skeptic could say “people who are smart enough to expose you in a lie on such a technical matter are also smart enough to help you instead of exposing you, and you could even leave them a clue that you know you are lying, one that those outside the technical paradigm simply will not get”. It is a difficult technical matter, after all. As for simplicity: is it a terminal value? I think not.
NoriMori1992 1 Oct 2023 21:06 UTC
1 point
0
Hobbes said, “I don’t know what’s worse, the fact that everyone’s got a price, or the fact that their price is so low.”
You don’t specify which Hobbes. When I Googled this quote trying to find out, I didn’t find any results that didn’t trace back to this post. I kept reducing the strictness of the exact wording, and still didn’t get any not-this results, until I reduced it to “got a price” and “so low”, which turned up basically the same quote, differently worded, on TV Tropes, attributing it to Calvin and Hobbes. I had assumed that might be the source, since I’ve seen you speak highly of Calvin and Hobbes elsewhere, but I didn’t know for sure, and checking ended up being surprisingly difficult. (Not sure which version is misquoted, this one or the TV Tropes one. Possibly both, since the latter only turned up one other source, a Twitter post that might have gotten it from the same place.)
Crazy philosopher 2 Jun 2024 7:30 UTC
1 point
0
Or, to summarize this essay:
Deontological rules are based almost directly on empirical experience, whereas utilitarian conclusions rest on very complex arguments.
“If you’ve truly understood the reason and the rhythm behind ethics, then one major sign is that, augmented by this newfound knowledge, you don’t do those things that previously seemed like ethical transgressions. Only now you know why.”
In other words, your theory should describe the facts well. Let’s say we know that 90% of the people who decided to do X with the best intentions ended up being villains. If, in such a situation, it still seems to you that YOU could have done X without moral preparation and definitely would not have gone over to the dark side of the Force… this means that your theory does not explain the data well. But if your usual consequentialist morality agrees with the deontological rules in 90% of cases, then I solemnly declare that your consequentialist morality is practically perfect and that the remaining 10% of discrepancies are errors of deontology, and I advise you to trust your consequentialist morality.
“They don’t mention the problem of running on corrupted hardware. They don’t mention the idea that lies have to be recursively protected from all the truths and all the truthfinding techniques that threaten them. They don’t mention that honest ways have a simplicity that dishonest ways often lack. They don’t talk about black-swan bets. They don’t talk about the terrible nakedness of discarding the last defense you have against yourself, and trying to survive on raw calculation.”
In effect, Eliezer says in this paragraph: “the world is more complicated than it seems, and we do not fully understand it, so complex theories work worse than they seem to, so trust the deontological rules (simple theories)”.
And one more thing: it seems that when you break a deontological rule, it would be wise to remember it as: “yeah, I broke a deontological rule. I don’t see exactly where I went wrong, but in any case, this is Bayesian evidence that I was wrong.” And this evidence may be decisive and change the result of reflection, or it may not be.