My utility function has non-zero terms for the preferences of other people. If I asked each one of the 3^^^3 people whether they would prefer to get a dust speck if it would save someone from a horrible fifty-year torture, they (my simulation of them) would say YES in 20*3^^^3-foot letters.
It wouldn’t follow that it is a good idea, or an efficient idea. But it would follow that it is the preferred idea, as calculated by my utility function, which has non-zero terms for the preferences of other people.
Fortunately, my simulation of other people doesn’t suddenly wish to help an arbitrary person by donating a dollar with a 99% transaction cost.
My point was that the “SPECKS!!” answer to the original problem, which is intuitively obvious to (I think) most people here, is not necessarily wrong. It can directly follow from expected utility maximization, if the utility function values the choices of the people involved, even if those choices are “economically” suboptimal.
But I am trying to maximize the total utility, just a different one.
Ok, let me put it this way. I will drop the terms for other people’s preferences from my utility function. It is now entirely self-centered. But it still values the good feeling I get if I’m allowed to participate in saving someone from fifty years of torture. The value of this feeling is much more than the minuscule negative utility of a dust speck. Now, assume some reasonable percent of the 3^^^3 people are like me in this respect. Maximizing the total utility for everybody results in: SPECKS!!
Now an objection can be stated that by the conditions of the problem, I cannot change the utilities of the 3^^^3 people. They are given and are equal to a minuscule negative value corresponding to the small speck of dust. Evil forces give me the sadistic choice and don’t allow me to share the good news with everyone. Ok. But I can still imagine what the people would have preferred if given a choice. So I add a term for their preference to my utility function. I’m behaving like a representative of the people in a government. Or like a Friendly AI trying to implement their CEV.
In other words, Walzer’s Spheres of Justice concept, which states that some trade-offs are morally impermissible, is not really implementable in a utility function.
My arguments have nothing to do with Walzer’s Spheres of Justice concept, AFAICT.
Now, assume some reasonable percent of the 3^^^3 people are like me in this respect. Maximizing the total utility for everybody results in: SPECKS!!
The point of picking a number the size of 3^^^3 is that it is so large that this statement is false.
Why would it ever be false, no matter how large the number?
Let S = the negated disutility of a speck, a small positive number. Let F = the utility of the good feeling of protecting someone from torture. Let P = the fraction of people who are like me (for whom F is positive), 0 < P ≤ 1. Then the total utility for N people, no matter what N, is N(P·F − S), which is > 0 as long as P·F > S.
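For a quick numeric illustration (the particular values of S, F, and P below are hypothetical placeholders, chosen only to show that the sign of the total does not depend on N):

    # Toy check of the total-utility expression N * (P*F - S).
    # All numbers are hypothetical, not claims about real utilities.
    S = 1e-9   # magnitude of the disutility of one dust speck
    F = 1.0    # utility of the good feeling of helping save someone from torture
    P = 0.5    # fraction of people who share this preference

    for N in (10, 10**6, 10**100):   # any stand-in for "3^^^3"
        total = N * (P * F - S)
        print(N, total > 0)          # True whenever P*F > S, regardless of N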
I don’t accept that utility is additive.
Well, we can agree that utility is complicated. I think it’s possible to keep it additive by shifting complexities to the details of its calculation.
Yes, I stated and answered this exact objection two comments ago.
Choosing TORTURE is making a decision to condemn someone to fifty years of torture, while knowing that 3^^^3 people would not want you to do so, would beg you not to, and would react with horror and revulsion if/when they knew you did it. And you must do it for the sake of some global principle or something. I’d say it puts one at least into the Well-Intentioned Extremist / Knight Templar category, if not outright villain.
If an AI had made a choice like that, against known wishes of practically everyone, I’d say it was rather unfriendly.
ADDED: Detailed version:
Given: a paradoxical (to everybody except some moral philosophers) answer “TORTURE” appears to follow from expected utility maximization.
Possibility 1: the theory is right, everybody is wrong.
But in the domain of moral philosophy, our preferences should be treated with more respect than elsewhere. We cherish some of our biases. They are what makes us human, and we wouldn’t want to lose them, even if sometimes they give an “inefficient” answer from the point of view of the simplest greedy utility function.
These biases are probably reflectively consistent: even if we knew more, we would still wish to have them. At least, I can hypothesize that they are so, until proven otherwise. Simply showing me the inefficiency doesn’t make me wish not to have the bias. I value efficiency, but I value my humanity more.
Possibility 2: the theory (expected utility maximization) is wrong.
But the theory is rather nice and elegant, I wouldn’t wish to throw it away. So, maybe there’s another way to fix the paradox? Maybe, something wrong with the problem definition? And lo and behold—yes, there is.
Possibility 3: the problem is wrong.
As the problem is stated, the preferences of the 3^^^3 people are not taken into account. It is assumed that the people don’t know and will never know about the situation, because their total utility change from the whole affair is either nothing or a single small negative value.
If people were aware of the situation, their utility changes would be different—a large negative value from knowing about the tortured person’s plight and being forcibly forbidden to help, or a positive value from knowing they helped. Well, there would also be a negative value from moral philosophers who would know and worry about inefficiency, but I think it would be a relatively small value, after all.
Unfortunately, in the context of the problem, the people are unaware. The choice for the whole of humanity is given to me alone. What should I do? Should I play dictator and make a choice that would be repudiated by everyone, if only they knew? This seems wrong, somehow. Oh! I can simulate them, ask what they would prefer, and give their preference a positive term within my own utility function. I would be the representative of the people in a government, or an AI trying to implement their CEV.
Result: SPECKS!! Hurray! :)
I’m not sure why you think I’m asking a different question. Do you mean to say that in Eliezer’s original problem all of the utilities are fixed, including mine? But then the question appears entirely without content:
“Here are two numbers, this one is bigger than that one, your task is to always choose the biggest number. Now which number do you choose?”
Besides, if this is indeed what Eliezer meant, then his choice of “torture” for one of the numbers is inconsistent. Torture always has utility implications for other people, not just the person being tortured. I hypothesize that this is what makes it different (non-additive, non-commensurable, etc.) for some moral philosophers.
Any utility function that does not give an explicit overwhelmingly positive value to truth, and does give an explicit positive value to “pleasure” would obviously include the implication that discovering or publicizing unpleasant truths can be morally wrong. I don’t see why it is relevant.
If all the utilities are specified by the problem text completely, then TORTURE maximizes the total utility by definition. There’s nothing to be committed about. But in this case, “torture” is just a label. It cannot refer to a real torture, because a real torture would produce different utility changes for people.
One group saw the measure described as saving 150 lives. The other group saw the measure described as saving 98% of 150 lives. The hypothesis motivating the experiment was that saving 150 lives sounds vaguely good—is that a lot? a little? - while saving 98% of something is clearly very good because 98% is so close to the upper bound of the percentage scale. Lo and behold, saving 150 lives had mean support of 10.4, while saving 98% of 150 lives had mean support of 13.6.
The pragmatics of normal language use prescribes that any explicitly supplied information will be relevant to the hearer. Assuming that “98%” is relevant, and absent any other useful information, it is rational to support a measure with such a high level of efficiency, and to support it more than one for which no efficiency figure is provided.
Q: How many Eliezer Yudkowskys does it take to change a light bulb?
A: His mind only needs to impose the ‘triangular’ concept on a light bulb, and then the light bulb changes by itself.
The argument that confused me at first was: “Wouldn’t the second researcher always be able to produce a >60% result given enough time and resources, no matter what the actual efficacy of the treatment is?”
But this is not true. If the true efficacy is < 60%, then the probability of observing a “>60%” result at least once in a sequence of N experiments does not tend to 1 as N goes to infinity.
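Here is a rough Monte Carlo sketch of that claim, under one concrete reading of it: the researcher keeps adding patients and declares a “>60%” result if the running cure rate ever exceeds 60%. The true efficacy and the trial horizon below are arbitrary choices for illustration.

    import random

    def ever_exceeds_60(true_p, max_trials, rng):
        # True if the running success proportion ever goes above 60%.
        successes = 0
        for n in range(1, max_trials + 1):
            successes += rng.random() < true_p
            if successes / n > 0.6:
                return True
        return False

    rng = random.Random(0)
    runs = 2000
    hits = sum(ever_exceeds_60(0.5, 5000, rng) for _ in range(runs))
    # With true efficacy 0.5 < 0.6, this fraction settles at a limit strictly
    # below 1 as the horizon grows, rather than tending to 1.
    print(hits / runs)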
What if its domain is restricted to math and self-modification? Then, if it fooms, it will be a safe math Oracle, possibly even provably safe. Then it would be a huge help on the road to FAI, both directly and as a case study.
What do we know of the second Hat-and-Cloak:
doesn’t know Hermione well
clever, but not too clever (it takes him a LOT of time)
=> not Quirrell or Snape
has some motive, probably foreshadowed (EY is a good writer)
was recognized by Hermione
=> not anyone unknown
has no morals (mindrape)
knows that Severus Snape is a Death Eater
says he knows the true reason for the coldness in Harry Potter’s eyes
and says he’s frightened of it
and says he knows HP is dangerous to Hermione
says: “Lucius Malfoy has taken notice of you, Hermione...”
=> He’s Yhpvhf Znysbl!
One thing contrary: He says he knows the true nature of Professor Quirrell’s mysterious illness. This doesn’t fit...
The setup can be translated into a purely logical framework. There would be constants A (Agent) and W (World), satisfying the axioms:
(Ax1) A=1 OR A=2
(Ax2) (A=1 AND W=1000) OR (A=2 AND W=1000000)
(Ax3) forall a1,a2,w1,w2 ((A=a1 ⇒ W=w1) AND (A=a2 ⇒ W=w2) AND (w1>w2)) ⇒ NOT (A=a2)
Then the Agent’s algorithm would be equivalent to a step-by-step deduction:
(1) Ax2 |- A=1 ⇒ W=1000
(2) Ax2 |- A=2 ⇒ W=1000000
(3) Ax3 + (1) + (2) |- NOT (A=1)
(4) Ax1 + (3) |- A=2
In this form it is clear that there are no logical contradictions at any time. The system never believes A=1, it only believes (A=1 ⇒ W=1000), which is true.
In the purely logical framework, the word “could” is modeled by what Hofstadter calls the “Fantasy Rule” of propositional calculus.
I think it’s possible to coordinate without the huge computational expense, if the programs would directly provide their proofs to each other. Then each of them would only need to check a proof, not find it.
The input to the programs would be a pair (opponent’s source, opponent’s proof), and the output would be the decision: cooperate or defect.
The algorithm for A would be: check B’s proof, which must prove that B would cooperate if the proof it receives as input checks out. If the proof checks out, cooperate; otherwise, defect.
Let ChecksOut(code, proof) = true iff proof is a proof that code is a function of two parameters PCode and PProof, which returns “C” if ChecksOut(PCode, PProof).
Def A(code, proof): if(ChecksOut(code, proof)) return “C”, otherwise return “D”.
The definition of ChecksOut is kinda circular, but it should be fixable with some kind of diagonalization. Like:
SeedChecksOut(code, proof, X) = true iff proof is a proof that code is a function of two parameters PCode and PProof, which returns “C” if eval(X(PCode, PProof)),
ChecksOut(code, proof) = SeedChecksOut(code, proof, #SeedChecksOut)
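For concreteness, a minimal Python rendering of the scheme above; checks_out stands in for a real proof checker, which this sketch does not implement:

    def checks_out(opponent_code, opponent_proof):
        # Stand-in for a real proof checker: should return True iff opponent_proof
        # proves that opponent_code cooperates with any (code, proof) pair whose
        # proof itself checks out. Not implemented here.
        raise NotImplementedError

    def agent_A(opponent_code, opponent_proof):
        # Cooperate exactly when the opponent's supplied proof of conditional
        # cooperation verifies; otherwise defect.
        return "C" if checks_out(opponent_code, opponent_proof) else "D"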
Hmm, yes, you’re right, it’s quining cooperation all over again—as defined, the programs would only cooperate with opponents that are sufficiently similar syntactically. And they need compatible proof systems in any case. So it’s not a solution to the same problem, only a possible optimization.
Are you searching for positive examples of positive bias right now, or sparing a fraction of your search on what positive bias should lead you to not see?
Isn’t what positive bias should lead you to not see a positive example of positive bias? Or am I explaining the joke?