In other words, the CEV initial dynamic shouldn’t be regarded as discovering what a group of people most desire collectively “by definition”—it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.
I am wary of using arguments along the lines of “CEV⟨subset⟩ is better for everyone than CEV⟨everyone⟩”. If calculating based on a subset happens to be the most practical instrumentally useful hack for implementing CEV⟨everyone⟩ then an even remotely competent AI can figure that out itself.
I would still implement the CEV option but I’d do it for real reasons.
Jack wrote:
That something you want to say in public?
wedrifid wrote:
Yes. I really don’t want the volition of psychopaths, suicidal fanatics and jerks in general to be extrapolated in such a way as it could destroy all that I hold dear. Let this be my solemnly sworn testimony made public where all can see. Allow me (wedrifid_2011) to commit to my declaration of my preferences as of the 21st of October by requesting that you quote me, leaving me unable to edit it away.
You are treading on treacherous moral ground! Your “jerk” may be my best mate (OK, he’s a bit intense… but you are no angel either!). Your “suicidal fanatic” may be my hero. As for psychopaths, see this.
Also, I can understand “I really don’t want the volition of ANYONE to be extrapolated in such a way as it could destroy all that I hold dear”—why pick on psychopaths, suicidal fanatics and jerks in particular?
You are treading on treacherous moral ground! Your “jerk” may be my best mate (OK, he’s a bit intense… but you are no angel either!). Your “suicidal fanatic” may be my hero.
If so then I don’t want your volition extrapolated either. Because that would destroy everything I hold dear as well (given the extent to which you would either care about their dystopic values yourself or care about them getting those same values achieved).
Also, I can understand “I really don’t want the volition of ANYONE to be extrapolated in such a way as it could destroy all that I hold dear”
I obviously would prefer an FAI to extrapolate only MY volition. Any other preference is a trivial reductio to absurdity. The reason to support the implementation of an FAI that extrapolates more generally is so that I can cooperate with other people whose preferences are not too much different to mine (and in some cases may even resolve to be identical). Cooperative alliances are best formed with people with compatible goals and not those whose success would directly sabotage your own.
why pick on psychopaths, suicidal fanatics and jerks in particular?
Do I need to write a post “Giving a few examples does not assert a full specification of a set”? I’m starting to feel the need to have such a post to link to pre-emptively.
Not anywhere closer to understanding how altruism and morality apply to extrapolated volition for a start.
Note that the conditions that apply to the quote but that are not included are rather significant. Approximately it is conditional on your volition being to help other agents do catastrophically bad things to the future light cone.
What I am confident you do not understand is that excluding wannabe accomplices to Armageddon from the set of agents given to a CEV implementation does not even rule out (or even make unlikely) the resultant outcome taking into consideration all the preferences of those who are not safe to include (and just ignoring the obnoxiously toxic ones).
What I am confident you do not understand is that excluding wannabe accomplices to Armageddon from the set of agents given to a CEV implementation does not even rule out (or even make unlikely) the resultant outcome taking into consideration all the preferences of those who are not safe to include (and just ignoring the obnoxiously toxic ones).
I barely understand this sentence. Do you mean: Excluding “jerks” from CEV does not guarantee that their destructive preferences will not be included?
If so, I totally do not agree with you, as my opinion is: Including “jerks” in CEV will not pose a danger, and saves the trouble of determining who is a “jerk” in the first place.
This is based on the observation that “jerks” are a minority, an opinion that “EV-jerks” are practically non-existent, and an understanding that where a direct conflict exists between the EV of a minority and the EV of a majority, it is the EV of the majority that will prevail in the CEV. If you disagree with any of these, please elaborate, but use a writing style that does not exceed the comprehension abilities of an M. Eng.
I hope you are right. But that is what it is, hope. I cannot know with any confidence that an Artificial Intelligence implementing CEV is Friendly. I cannot know if it will result in me and the people I care about continuing to live. It may result in something that, say, Robin Hanson considers desirable (and I would consider worse than simple extinction).
Declaring CEV to be optimal amounts to saying “I have faith that everyone is all right on the inside and we would all get along if we thought about it a bit more.” Bullshit. That’s a great belief to have if you want to signal your personal ability to enforce cooperation in your social environment, but not a belief that you want actual decision makers to have. Or, at least, not one you want them to simply assume without huge amounts of both theoretical and empirical research.
(Here I should again refer you to the additional safeguards Eliezer proposed/speculated on in case CEV results in Jerkiness. This is the benefit of being able to acknowledge that CEV isn’t good by definition. You can plan ahead just in case!)
If you disagree with any of these, please elaborate, but use a writing style that does not exceed the comprehension abilities of an M. Eng.
It is primarily a question of understanding (and being willing to understand) the content.
This is based on the observation that “jerks” are a minority, an opinion that “EV-jerks” are practically non-existent
You don’t know that. Particularly since EV is not currently sufficiently defined to make any absolute claims. EV doesn’t magically make people nice or especially cooperative unless you decide to hack in a “make nicer” component to the extrapolation routine.
and an understanding that where a direct conflict exists between the EV of a minority and the EV of a majority, it is the EV of the majority that will prevail in the CEV
You don’t know that either. The ‘coherence’ part of CEV is even less specified than the EV part. Majority rule is one way of resolving conflicts between competing agents. It isn’t the only one. But I don’t even know that an AI implementing CEV⟨humanity⟩ results in something I would consider Friendly. Again, there is a decent chance that it is not-completely-terrible but that isn’t something to count on without thorough research and isn’t an ideal to aspire to either. Just something that may need to be compromised down to.
The ‘coherence’ part of CEV is even less specified than the EV part.
One possibility is a system inclined to shut down rather than do anything that is not neutral or better from every perspective. Such a system is pretty likely useless, but likely to be safe, and not certainly useless. Variants allow some negatives, but I don’t know how one would draw the line—allowing everyone a veto and requiring negotiation with them would be pretty safe, but also nearly useless.
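The contrast between these conflict-resolution rules can be made concrete with a toy sketch (the numbers, function names and the preference profile below are all invented for illustration; nothing like this is specified in the CEV document): a majority rule picks whichever option most agents rank first, while a veto rule returns an option only if every agent scores it at some neutral baseline or better, and otherwise “shuts down”.

```python
# Toy model of two ways to aggregate conflicting extrapolated
# preferences. All utility numbers are illustrative only.
from collections import Counter

def majority_rule(utilities):
    """Pick the option most agents rank first. utilities: {agent: {option: score}}."""
    first_choices = Counter(max(opts, key=opts.get) for opts in utilities.values())
    return first_choices.most_common(1)[0][0]

def veto_rule(utilities, baseline=0):
    """Return an option every agent scores at baseline or better; else shut down."""
    options = next(iter(utilities.values())).keys()
    acceptable = [o for o in options
                  if all(u[o] >= baseline for u in utilities.values())]
    if not acceptable:
        return None  # shutdown: no outcome is neutral-or-better for everyone
    return max(acceptable, key=lambda o: sum(u[o] for u in utilities.values()))

prefs = {
    "a": {"X": 3, "Y": 1},
    "b": {"X": 2, "Y": 1},
    "c": {"X": -5, "Y": 1},   # an outlier who strongly disvalues X
}
print(majority_rule(prefs))  # X: two of three agents rank it first
print(veto_rule(prefs))      # Y: the only option nobody scores below 0
```

On this toy profile the rules disagree: the majority rule picks the option the outlier hates, while the veto rule falls back to the universally tolerable option, and returns None (shutdown) when no such option exists. That is the sense in which majority rule is just one possible resolution mechanism among several.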
EV doesn’t magically make people nice or especially cooperative
I’m not sure exactly what you’re implying so I’ll state something you may or may not agree with. It seems likely that it makes people more cooperative in some areas and has unknown implications in others, so whether it makes them ultimately more or less cooperative is unknown. But the little we can see is of cooperation increasing, and it would be unreasonable to be greatly surprised if that were found to be the overwhelming net effect.
But I don’t even know that an AI implementing CEV⟨humanity⟩ results in something I would consider Friendly.
As most possible minds don’t care about humans, I object to using “unfriendly” to mean “an AI that would result in a world that I don’t value.” I think it better to use “unfriendly” to mean those minds indifferent to humans and the few hateful ones. Those that have value according to many but not all, such as perhaps those that seriously threaten to torture people, but only when they know those threatened will buckle, are better thought of as being a subspecies of Friendly AI.
As most possible minds don’t care about humans, I object to using “unfriendly” to mean “an AI that would result in a world that I don’t value.” I think it better to use “unfriendly” to mean those minds indifferent to humans and the few hateful ones. Those that have value according to many but not all, such as perhaps those that seriously threaten to torture people, but only when they know those threatened will buckle, are better thought of as being a subspecies of Friendly AI.
I disagree. I will never refer to anything that wants to kill or torture me as friendly. Because that would be insane. AIs that are friendly to certain other people but not to me are instances of uFAIs in the same way that paperclippers are uFAIs (that are Friendly to paperclips). I incidentally also reject FAI⟨subset⟩ and FAI⟨humanity⟩. Although in the latter case I would still choose it as an alternative to nothing (which likely defaults to extinction).
Mind you the nomenclature isn’t really sufficient to the task either way. I prefer to make my meaning clear of ambiguities. So if talking about “Friendly” AI that will kill me I tend to use the quotes that I just used while if I am talking about something that is Friendly to a specific group I’ll parameterize.
I will never refer to anything that wants to kill or torture me as friendly
OK—this is included under what I would suggest calling “Friendly”, certainly if it only wanted to do so instrumentally, so we have a genuine disagreement. This is a good example for you to raise, as most even here might agree with how you put that.
Nonetheless, my example is not included under this, so let’s be sure not to talk past each other. It was intended to be a moderate case, one in which you might not call something friendly when many others here would* - one in which a being wouldn’t desire to torture you, and would be bluffing if only in the sense that it had scrupulously avoided possible futures in which anyone would be tortured, if not in other senses (i.e. it actually would torture you, if you chose the way you won’t).
As for not killing you, that sounds like an obviously badly phrased genie wish. Since a similar point to the one you expressed would be reasonable and would fully contrast with mine, I’m surprised you added that.
One can go either way (or other or both ways) on this labeling. I am apparently buying into the mind-projection fallacy and trying to use “Friendly” the way terms like “funny” or “wrong” are regularly used in English. If every human but me finds something funny, it’s often least confusing to say it’s “a funny thing that isn’t funny to me”, or “something everyone else considers wrong that I don’t consider wrong (according to the simplest way of dividing concept-space), and that is also advantageous for me”. You favor taking this new term, avoiding the MPF (unlike for other English terms), and having it be understood that listeners are never to infer meaning as if the speaker were committing it; I favor just using it like any other term.
So:
Mind you the nomenclature isn’t really sufficient to the task either way
My way, a being that wanted to do well by some humans and not others would be objectively both Friendly and Unfriendly, so that might be enough to make my usage inferior. But if my molecules are made out of usefulonium, and no one else’s are, I very much mind a being exploiting me for that, but wouldn’t mind other humans calling that being friendly when it uses the usefulonium to shield the Earth from a supernova, or whatever—and it’s not just not minding by comparison, either.
*I mean both when others refer to beings making analogous threats to them and to the one that would make them to you.
Do you mean: Excluding “jerks” from CEV does not guarantee that their destructive preferences will not be included?
If so, I totally do not agree with you
Through me, my dog is included. All the more so mothers’ sons!
an understanding that where a direct conflict exists between the EV of a minority and the EV of a majority, it is the EV of the majority that will prevail in the CEV.
I don’t think this is true; the safeguard that’s safe is to shut down if a conflict exists. That way, things are either simply better or no worse; judging between cases when each has some advantages over the other is tricky.
If so then I don’t want your volition extrapolated either. Because that would destroy everything I hold dear
How? As is, psychopaths have some influence, and I don’t consider the world worthless. Whatever their slice of a much larger pie, how would that be a difference in kind, something other than a lost opportunity?
There is a reasonably good chance that, when averaged out by the currently unspecified method used by the CEV process, any abominable volitions are offset by volitions that are at least vaguely acceptable. But that doesn’t mean including Jerks (where ‘Jerk’ is defined as agents whose extrapolated volitions are deprecated) in the process that determines the fate of the universe is The Right Thing To Do, any more than including paperclippers, superhappies and babyeaters in the process is obviously The Right Thing To Do.
CEV⟨humanity⟩ might turn out OK. Given the choice of setting loose a {Superintelligence Optimising CEV⟨humanity⟩} or {nothing at all, and we all go extinct} I’ll choose the former. There are also obvious political reasons why such a compromise might be necessary.
If anyone thinks that CEV⟨everyone⟩ is not a worse thing to set loose than CEV⟨everyone-minus-jerks⟩ then they are not being altruistic or moral, they are being confused about a matter of fact.
Disclaimer that is becoming almost mandatory in this kind of discussion: altruism, ethics and morality belong inside utility functions and volitions, not in game theory or abstract optimisation processes.
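The “ethics inside the utility function” point can be sketched with a toy example (entirely hypothetical numbers and names; this is not any proposed FAI design): an agent whose utility function already includes a term for others’ welfare needs no separate morality module bolted onto its decision procedure.

```python
# Toy sketch: altruism as a term in the utility function rather than
# a constraint on the optimiser. All numbers are invented.
def utility(own_payoff, others_payoff, altruism_weight=0.5):
    """The agent's utility already counts others' welfare."""
    return own_payoff + altruism_weight * others_payoff

# Two possible actions: (own payoff, everyone else's payoff)
actions = {"selfish": (10, -20), "cooperative": (6, 8)}

# Plain maximisation picks the cooperative action; no external
# "be moral" override is patched onto the decision procedure.
best = max(actions, key=lambda a: utility(*actions[a]))
print(best)  # cooperative: 6 + 0.5*8 = 10 beats 10 + 0.5*(-20) = 0
```

The design point is that the optimisation process stays value-neutral; what makes the outcome “ethical” is entirely a property of what the utility function counts.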
But that doesn’t mean including Jerks (where ‘Jerk’ is defined as agents whose extrapolated volitions are deprecated) in the process that determines the fate of the universe is The Right Thing To Do
Sure, inclusion is a thing that causes good and bad outcomes, and not necessarily net good outcomes.
There are also obvious political reasons why such a compromise might be necessary.
Sure, but it’s not logically necessary that it’s a compromise, though it might be. It might be that the good outweighs the bad, or not, I’m not sure from where I stand.
If anyone thinks that CEV⟨everyone⟩ is not a worse thing to set loose than CEV⟨everyone-minus-jerks⟩ then they are not being altruistic or moral, they are being confused about a matter of fact.
Because I value inclusiveness more than zero, that’s not necessarily true. It’s probably true, or better yet, if one includes the best of the obvious Jerks with the rest of humanity, it’s quite probably true. All else equal, I’d rather an individual be in than out, so if someone is all else equal worse than useless but only light ballast, having them is a net good.
I think your distinction is artificial; can you use it to show how an example question is a wrong question and another isn’t, and show how your distinction sorts between those two types well?
Your Adam and Eve reply made absolutely no sense and this question makes only slightly more. I cannot relate what you are saying to the disclaimer that you partially quote (except one way that implies you don’t understand the subject matter—which I prefer not to assume). I cannot answer a question about what I am saying when I cannot see how on earth it is relevant.
You missed my point 3 times out of 3. Wait, I’ll put down the flyswatter and pick up this hammer...:
Excluding certain persons from CEV creates issues that CEV was intended to resolve in the first place. The mechanic you suggest—excluding persons that YOU deem to be unfit—might look attractive to you, but it will not be universally acceptable.
Note that “our coherent extrapolated volition is our wish if we knew more, were smarter...” etc. The EVs of yourself and that suicidal fanatic should be pretty well aligned—you both probably value freedom, justice, friendship, security and like good food, sex and World of Warcraft(1)… you just don’t know why he believes that suicidal fanaticism is the right way under his circumstances, and he is, perhaps, not smart enough to see other options to strive for his values.
Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section? They deal with the instinctive discomfort of including everyone in the CEV.
(1) that was a backhand with the flyswatter, which I grabbed with my left hand just then.
Note that “our coherent extrapolated volition is our wish if we knew more, were smarter...” etc. The EVs of yourself and that suicidal fanatic should be pretty well aligned—you both probably value freedom
No. I will NOT assume that extrapolating the volition of people with vastly different preferences to me will magically make them compatible with mine. The universe is just not that convenient. Pretending it is while implementing an FAI is suicidally naive.
Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section? They deal with the instinctive discomfort of including everyone in the CEV.
I’m familiar with the document, as well as approximately everything else said on the subject here, even in passing. This includes Eliezer proposing ad-hoc workarounds to the “What if people are jerks?” problem.
Quite right, don’t assume. Think it through. Then you may be less inclined to pepper your posts with non-sequiturs like “magically”, “pretending” and “naive”.
I’m familiar with the document, as well as approximately everything else said on the subject here, even in passing.
Great! But, IMHO, you have a tendency to miss the point. So:
Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section? They deal with the instinctive discomfort of including everyone in the CEV.
What do you mean? As an analogy, .01% sure and 99.99% sure are both states of uncertainty. EVs are exactly the same or they aren’t. If someone’s unmuddled EV is different than mine—and it will be—I am better off with mine influencing the future alone rather than the future being influenced by both of us, unless my EV sufficiently values that person’s participation.
My current EV places some non-infinite value on each person’s participation. You can assume for the sake of argument each person’s EV would more greatly value this.
You can correctly assume that for each person, all else equal, I’d rather have them than not (though not necessarily at the cost of having the universe diverted from my wishes). But I don’t really see why the death of most of the single ring species that is everything alive today makes selecting humans alone for CEV the right thing to do, in a way that avoids the problem of excluding the disenfranchised whom the creators don’t care sufficiently about.
If enough humans value what other humans want, and more so when extrapolated, it’s an interlocking enough network to scoop up all humans; but the biologist who spends all day with chimpanzees (dolphins, octopuses, dogs, whatever) is going to be a bit disappointed by the first-order exclusion of his or her friends from consideration.
I mean, once they both take pains to understand each other’s situation and have a good, long think about it, they would find that they will agree on the big issues and be able to easily accommodate their differences. I even suspect that overall they would value the fact that certain differences exist.
EVs can, of course, be exactly the same, or differ to some degree. But—provided we restrict ourselves to humans—the basic human needs and wants are really quite consistent across an overwhelming majority. There is enough material (on the web and in print) to support this.
Wedrifid (IMO) is making a mistake of confusing some situation dependent subgoals (like say “obliterate Israel” or “my way or the highway”) with high level goals.
I have not thought about extending CEV beyond human species, apart from taking into account the wishes of your example biologists etc. I suspect it would not work, because extrapolating wishes of “simpler” creatures would be impossible. See http://xkcd.com/605/.
Wedrifid (IMO) is making a mistake of confusing some situation dependent subgoals (like say “obliterate Israel” or “my way or the highway”) with high level goals.
You are mistaken. That I entertain no such confusion should be overwhelmingly clear from reading nearby comments.
I have not thought about extending CEV beyond human species, apart from taking into account the wishes of your example biologists etc. I suspect it would not work, because extrapolating wishes of “simpler” creatures would be impossible.
That sounds awfully convenient. If there really is a threshold of how “non-simple” a lifeform has to be to have coherently extrapolatable volitions, do you have any particular evidence that humans clear that threshold and, say, dolphins don’t?
For my part, I suspect strongly that any technique that arrives reliably at anything that even remotely approximates CEV for a human can also be used reliably on many other species. I can’t imagine what that technique would be, though.
(Just for clarity: that’s not to say one has to take other species’ volition into account, any more than one has to take other individuals’ volition into account.)
The lack of threshold is exactly the issue. If you include dolphins and chimpanzees, explicitly, you’d be in a position to apply the same reasoning to include parrots and dogs, then rodents and octopi, etc, etc.
Eventually you’ll slide far enough down this slippery slope to reach caterpillars and parasitic wasps. Now, what would a wasp want to do, if it understood how its acts affect the other creatures worthy of inclusion in the CEV?
This is what I see as the difficulty in extrapolating the wishes of simpler creatures. Perhaps in fact there is a coherent solution, but having only thought about this a little, I suspect there might not be one.
lack of threshold...then rodents...parasitic wasps
We don’t have to care. If everyone or nearly all were convinced that something less than 20 pounds had no moral value, or a person less than 40 days old, or whatever, that would be that.
Also, as some infinite sums have finite limits, I do not think that small things necessarily make summing humans’ or the Earth’s morality impossible.
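The infinite-sums point can be checked numerically (a toy sketch with invented weights): if each successively “smaller” creature gets, say, half the moral weight of the previous one, the total stays finite no matter how many creatures are included, so the sum is not swamped by an unbounded tail of tiny contributors.

```python
# Toy illustration: geometrically decreasing moral weights sum to a
# finite limit, so including arbitrarily many small creatures need
# not break the aggregate. The weights are invented for illustration.
def total_weight(n_creatures, ratio=0.5):
    """Sum of weights 1, ratio, ratio^2, ... over n_creatures terms."""
    return sum(ratio**k for k in range(n_creatures))

for n in (10, 100, 1000):
    print(n, total_weight(n))   # partial sums approach 1 / (1 - 0.5) = 2.0
```

This is just the geometric series 1 + 1/2 + 1/4 + … = 2: the partial sums grow but never exceed the finite limit.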
Ah, OK. Sure, if your concern is that, if we extrapolated the volition of such creatures, we would find that they don’t cohere, I’m with you. I have similar concerns about humans, actually.
I’d thought you were saying that we’d be unable to extrapolate it in the first place, which is a different problem.
Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section?
Just, uh… just making sure: you do know that wedrifid has more than fourteen thousand karma for a reason, right? It’s actually not solely because he’s an oldtimer; he can be counted on to have thought about this stuff pretty thoroughly.
Edit: I’m not saying “defer to him because he has high status”, I’m saying “this is strong evidence that he is not an idiot.”
I admit to being a little embarrassed as I wrote that paragraph, because this sort of thing can come across as “fuck you”. Not my intent at all, just that the reference is relevant, well written, supports my point—and is too long to quote.
Having said that, your comment is pretty stupid. Yes, he has heaps more karma here—so what? I have more karma here than R. Dawkins and B. Obama combined!
The “so what” is, he’s already read it. Also, he’s, you know, smart. A bit abrasive (or more than a bit), but still. He’s not going to go “You know, you’re right! I never thought about it that way, what a fool I’ve been!”
A bit of an ethical egoist (or more than a bit), but still.
I suppose “ethical egoism” fits. But only in some completely subverted “inclusive ethical egoist” sense in which my own “self interest” already takes into account all my altruistic moral and ethical values. I.e. I’m basically not an ethical egoist at all. I just put my ethics inside the utility function where they belong.
I’m not sure this can mean one thing that is also important.
Huh? Yes it can. It means “results in something closer to CEV than the alternative does”, which is pretty damn important given that it is exactly what the context was talking about.
that it is exactly what the context was talking about.
I agree that context alone pointed to that interpretation, but as that makes your statement a tautology, I thought it more likely than not you were referencing a more general meaning than the one under discussion. This was particularly so because of the connotations of “wary”, i.e. “this sort of argument tends to seem more persuasive than it should, but the outside view doesn’t rule them out entirely,” rather than “arguments of this form are always wrong because they are logically inconsistent”.
Because Phlebas’s argument is not, in fact, tautologically false, and is merely blatantly false, I chose to refrain from a (false) accusation of inconsistency.
Here is the post that you linked to, in which you ostensibly prove that an excerpt of my essay was “blatantly false”:
Phlebas:
In other words, the CEV initial dynamic shouldn’t be regarded as discovering what a group of people most desire collectively “by definition”—it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.
wedrifid:
I am wary of using arguments along the lines of “CEV⟨subset⟩ is better for everyone than CEV⟨everyone⟩”. If calculating based on a subset happens to be the most practical instrumentally useful hack for implementing CEV⟨everyone⟩ then an even remotely competent AI can figure that out itself.
Note that I have made no particular claim about how likely it is that the selective CEV will be closer to the ideal CEV of humanity than the universal CEV. I merely claimed that it is not what they most desire collectively “by definition”, i.e. it is not logically necessary that it approximates the ideal human-wide CEV (such as a superintelligence might develop) better than the selective CEV.
Here is a comment claiming that CEV most accurately identifies a group’s average desires “by definition” (assuming he doesn’t edit it). So it is not a strawman position that I am criticising in that excerpt.
You argue that even given a suboptimal initial dynamic, the superintelligent AI “can” figure out a better dynamic and implement that instead. Well of course it “can” – nowhere have I denied that the universal CEV might (with strong likelihood in fact) ultimately produce at least as close an approximation to the ideal CEV of humanity as a selective CEV would.
Nonetheless, high probability =/= logical necessity. Therefore you may wish to revisit your accusation of blatant fallacy. If you are going to use insults, please back them up with a detailed, watertight argument.
How probable exactly is an interesting question, but I shan’t discuss that in this comment since I don’t wish to muddy the waters regarding the nature of the original statement that you were criticising.
The point being that actually, it is worthwhile to point out simply that it is not a logical necessity—because people actually believe that. Once that is accepted, it clears the way for discussion of the actual probability that the AI does such a good job.
Therefore there is not one thing wrong with the excerpt that you quoted (and if you have a problem with another part of the essay, you should at least point out where the fallacy is).
To address the question of the likelihood of the AI patching things up itself:
How much trust do we put in human programmers? In one instance, they would have to create a dynamic that can apply transformations to Nobel laureates; in the other, they must create a dynamic that can apply transformations to a massive number of mutually antagonistic, primitive, low-IQ and superstitious minds.
Furthermore, although speculation about the details of the implementation becomes necessary, using a small group of minds the programmers could learn about these minds in vast detail, specifically identifying any particular problems and conducting tests and trials, whereas with 7 billion or more minds this is impossible.
The initial dynamic is supposed to be capable of generating an improved dynamic. On the other hand, there are certain things the AI can’t help with. The AI does have vast knowledge of its own, but the programmers have specified the way in which the AI is to “increase knowledge” and so forth of the humans in the first place. This is the distinction wedrifid seems to have missed. If this specification is lousy in the first place, then the output that the AI extracts from extrapolating the volition of humanity might be some way off the mark, in comparison to the output if “increasing knowledge” etc. had been done in an ideal fashion.
The AI may then go on to implement a new CEV dynamic – but this might be a lousy equilibrium generated by an original poor implementation of transforming the volition of humanity, and this poor reflection of human volition is down to the abilities of the human programmers.
On the other hand, it might take a suboptimal initial dynamic (with suboptimal specifications of “increase knowledge”, “grow up closer together” etc.) and manage to locate the ideal dynamic. What I dispute is that this is “blatantly” obvious. That is (motivated) overconfidence regarding a scenario that is purely theoretical, and very vague at this point.
And I certainly dispute that it is necessary “by definition”, which is all I actually claimed in my essay!!
In other words, a superintelligence is not immune to GIGO. Getting an output of some kind from the CEV does not guarantee that the superintelligence has circumvented this problem.
Edit: He has disappeared. How is this for a rational quote:
“This is always the tactic of the denialist: lie and run. He never stays for a fight; he never admits error, even the most glaring; his goal is to pack the maximum insult into the minimum number of words.”
Yes, but now they see you coming.
You are treading on treacherous moral ground! Your “jerk” may be my best mate (OK, he’s a bit intense… but you are no angel either!). Your “suicidal fanatic” may be my hero. As for psychopaths, see this.
Also, I can understand “I really don’t want the volition of ANYONE to be extrapolated in such a way as it could destroy all that I hold dear”—why pick on psychopaths, suicidal fanatics and jerks in particular?
If so then I don’t want your volition extrapolated either. Because that would destroy everything I hold dear as well (given the extent to which you would either care about their dystopic values yourself or care about them getting those same values achieved).
I obviously would prefer an FAI to extrapolate only MY volition. Any other preference is a trivial reductio to absurdity. The reason to support the implementation of an FAI that extrapolates more generally is so that I can cooperate with other people whose preferences are not too much different to mine (and in some cases may even resolve to be identical). Cooperative alliances are best formed with people with compatible goals and not those whose success would directly sabotage your own.
Do I need to write a post “Giving a few examples does not assert a full specification of a set”? I’m starting to feel the need to have such a post to link to pre-emptively.
You are a jerk!
. . . .
See where this approach gets us?
Not anywhere closer to understanding how altruism and morality apply to extrapolated volition for a start.
Note that the conditions attached to the quote, but not included here, are rather significant. Approximately, it is conditional on your volition being to help other agents do catastrophically bad things to the future light cone.
What I am confident you do not understand is that excluding wannabe accomplices to Armageddon from the set of agents given to a CEV implementation does not rule out (or even make unlikely) an outcome that takes into consideration all the preferences of those who are not safe to include, while simply ignoring the obnoxiously toxic ones.
I barely understand this sentence. Do you mean: Excluding “jerks” from CEV does not guarantee that their destructive preferences will not be included?
If so, I totally do not agree with you, as my opinion is: Including “jerks” in CEV will not pose a danger, and saves the trouble of determining who is a “jerk” in the first place.
This is based on the observation that “jerks” are a minority, an opinion that “EV-jerks” are practically non-existent, and an understanding that where a direct conflict exists between the EV of a minority and the EV of a majority, it is the EV of the majority that will prevail in the CEV. If you disagree with any of these, please elaborate, but use a writing style that does not exceed the comprehension abilities of an M. Eng.
I hope you are right. But that is what it is: hope. I cannot know with any confidence that an Artificial Intelligence implementing CEV is Friendly. I cannot know if it will result in me and the people I care about continuing to live. It may result in something that, say, Robin Hanson considers desirable (and that I would consider worse than simple extinction).
Declaring CEV to be optimal amounts to saying “I have faith that everyone is all right on the inside and we would all get along if we thought about it a bit more.” Bullshit. That’s a great belief to have if you want to signal your personal ability to enforce cooperation in your social environment, but not a belief that you want actual decision makers to have. Or, at least, not one you want them to simply assume without huge amounts of both theoretical and empirical research.
(Here I should again refer you to the additional safeguards Eliezer proposed/speculated on for the case in which CEV results in Jerkiness. This is the benefit of being able to acknowledge that CEV isn’t good by definition: you can plan ahead just in case!)
It is primarily a question of understanding (and being willing to understand) the content.
You don’t know that. Particularly since EV is not currently sufficiently defined to make any absolute claims. EV doesn’t magically make people nice or especially cooperative unless you decide to hack in a “make nicer” component to the extrapolation routine.
You don’t know that either. The ‘coherence’ part of CEV is even less specified than the EV part. Majority rule is one way of resolving conflicts between competing agents. It isn’t the only one. But I don’t even know that an AI resolving conflicts by majority rule results in something I would consider Friendly. Again, there is a decent chance that it is not-completely-terrible, but that isn’t something to count on without thorough research, and it isn’t an ideal to aspire to either. Just something that may need to be compromised down to.
One possibility is the one inclined to shut down rather than do anything not neutral or better from every perspective. This system is pretty likely useless, but likely to be safe too, and not certainly useless. Variants allow some negatives, but I don’t know how one would draw a line—allowing everyone a veto and requiring negotiation with them would be pretty safe, but also nearly useless.
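The difference between these conflict-resolution rules is easy to see in a toy model. The sketch below is purely illustrative — the agents, options, and utility numbers are all invented, and no actual CEV proposal specifies anything like this:

```python
# Toy illustration only: two ways a dynamic might resolve conflicts between
# extrapolated preferences. Agents, options and utilities are all invented.

def majority_choice(utilities, options):
    """Each agent votes for its top option; the option with most votes wins."""
    votes = {o: 0 for o in options}
    for u in utilities:
        votes[max(options, key=lambda o: u[o])] += 1
    return max(options, key=lambda o: votes[o])

def veto_choice(utilities, options, status_quo):
    """Every agent holds a veto: act only on an option that leaves nobody
    worse off than the status quo; otherwise do nothing (shut down)."""
    for o in options:
        if o != status_quo and all(u[o] >= u[status_quo] for u in utilities):
            return o
    return status_quo

options = ["shutdown", "flourishing", "dystopia"]
normal = {"shutdown": 0, "flourishing": 10, "dystopia": -10}
jerk = {"shutdown": 0, "flourishing": -10, "dystopia": 10}  # inverted values
population = [normal] * 4 + [jerk]

print(majority_choice(population, options))            # -> flourishing
print(veto_choice(population, options, "shutdown"))    # -> shutdown
print(veto_choice([normal] * 4, options, "shutdown"))  # -> flourishing
```

The jerk’s mere presence flips the veto rule to permanent shutdown while leaving the majority rule untouched — roughly why the veto variant above is “pretty safe, but nearly useless”, and why it matters which rule the unspecified ‘coherence’ step turns out to be.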
I’m not sure exactly what you’re implying, so I’ll state something you may or may not agree with. It seems likely that it makes people more cooperative in some areas and has unknown implications in other areas, so whether it ultimately makes them more or less cooperative is unknown. But the little we can see is of cooperation increasing, and it would be unreasonable to be greatly surprised if that were found to be the overwhelming net effect.
As most possible minds don’t care about humans, I object to using “unfriendly” to mean “an AI that would result in a world that I don’t value.” I think it better to use “unfriendly” to mean those minds indifferent to humans and the few hateful ones. Those that have value according to many but not all, such as perhaps those that seriously threaten to torture people, but only when they know those threatened will buckle, are better thought of as being a subspecies of Friendly AI.
I disagree. I will never refer to anything that wants to kill or torture me as friendly. Because that would be insane. AIs that are friendly to certain other people but not to me are instances of uFAIs in the same way that paperclippers are uFAIs (that are Friendly to paperclips). I incidentally also reject FAI and FAI. Although in the latter case I would still choose it as an alternative to nothing (which likely defaults to extinction).
Mind you the nomenclature isn’t really sufficient to the task either way. I prefer to make my meaning clear of ambiguities. So if talking about “Friendly” AI that will kill me I tend to use the quotes that I just used while if I am talking about something that is Friendly to a specific group I’ll parameterize.
OK—this is included under what I would suggest to call “Friendly”, certainly if it only wanted to do so instrumentally, so we have a genuine disagreement. This is a good example for you to raise, as most even here might agree with how you put that.
Nonetheless, my example is not included under this, so let’s be sure not to talk past each other. It was intended to be a moderate case, one in which you might not call something friendly when many others here would* - one in which a being wouldn’t desire to torture you, and would be bluffing if only in the sense that it had scrupulously avoided possible futures in which anyone would be tortured, if not in other senses (i.e. it actually would torture you, if you chose the way you won’t).
As for not killing you, that sounds like an obviously badly phrased genie wish. Since a point similar to the one you expressed would be reasonable and would fully contrast with mine, I’m surprised you added that.
One can go either way (or other or both ways) on this labeling. I am apparently buying into the mind-projection fallacy and trying to use “Friendly” the way terms like “funny” or “wrong” are regularly used in English. If every human but me finds something funny, it’s often least confusing to say it’s “a funny thing that isn’t funny to me”, or “something everyone else considers wrong that I don’t consider wrong (according to the simplest way of dividing concept-space) and that is also advantageous for me”. You favor taking this new term, avoiding the MPF (unlike for other English terms), and having it be understood that listeners are never to infer meaning as if the speaker were committing it; I favor just using it like any other term.
So:
My way, a being that wanted to do well by some humans and not others would be objectively both Friendly and Unfriendly, so that might be enough to make my usage inferior. But if my molecules are made out of usefulonium, and no one else’s are, I very much mind a being exploiting me for that, but wouldn’t mind other humans calling that being friendly when it uses the usefulonium to shield the Earth from a supernova, or whatever—and it’s not just not minding by comparison, either.
*I mean both when others refer to beings making analogous threats to them and to the one that would make them to you.
Through me, my dog is included. All the more so mothers’ sons!
I don’t think this is true, the safeguard that’s safe is to shut down if a conflict exists. That way, either things are simply better or no worse; judging between cases when each case has some advantages over the other is tricky.
How? As is, psychopaths have some influence, and I don’t consider the world worthless. Whatever their slice of a much larger pie, how would that be a difference in kind, something other than a lost opportunity?
There is a reasonably good chance that, when averaged out by the currently unspecified method used by the CEV process, any abominable volitions are offset by volitions that are at least vaguely acceptable. But that doesn’t mean including Jerks (where ‘Jerk’ is defined as an agent whose extrapolated volition is deprecated) in the process that determines the fate of the universe is The Right Thing To Do, any more than including paperclippers, superhappies and babyeaters in the process is obviously The Right Thing To Do.
CEV might turn out OK. Given the choice of setting loose a {Superintelligence Optimising CEV} or {Nothing At All and we all go extinct} I’ll choose the former. There are also obvious political reasons why such a compromise might be necessary.
If anyone thinks that CEV⟨including the Jerks⟩ is not a worse thing to set loose than CEV⟨excluding the Jerks⟩, then they are not being altruistic or moral; they are being confused about a matter of fact.
Disclaimer that is becoming almost mandatory in this kind of discussion: altruism, ethics and morality belong inside utility functions and volitions not in game theory or abstract optimisation processes.
Sure, inclusion is a thing that causes good and bad outcomes, and not necessarily net good outcomes.
Sure, but it’s not logically necessary that it’s a compromise, though it might be. It might be that the good outweighs the bad, or not, I’m not sure from where I stand.
Because I value inclusiveness more than zero, that’s not necessarily true. It’s probably true, or better yet, if one includes the best of the obvious Jerks with the rest of humanity, it’s quite probably true. All else equal, I’d rather an individual be in than out, so if someone is all else equal worse than useless but only light ballast, having them is a net good.
It’s Adam and Eve, not Adam and Vilfredo Pareto!
Huh? Chewbacca?
I think your distinction is artificial. Can you use it to show how one example question is a wrong question and another isn’t, and show that your distinction sorts between those two types well?
Your Adam and Eve reply made absolutely no sense, and this question makes only slightly more. I cannot relate what you are saying to the disclaimer that you partially quote (except in one way that implies you don’t understand the subject matter—which I prefer not to assume). I cannot answer a question about what I am saying when I cannot see how on earth it is relevant.
You missed my point 3 times out of 3. Wait, I’ll put down the flyswatter and pick up this hammer...:
Excluding certain persons from CEV creates issues that CEV was intended to resolve in the first place. The mechanic you suggest—excluding persons that YOU deem to be unfit—might look attractive to you, but it will not be universally acceptable.
Note that “our coherent extrapolated volition is our wish if we knew more, were smarter...” etc. The EVs of yourself and that suicidal fanatic should be pretty well aligned—you both probably value freedom, justice, friendship, security and like good food, sex and World of Warcraft(1)… you just don’t know why he believes that suicidal fanaticism is the right way under his circumstances, and he is, perhaps, not smart enough to see other options to strive for his values.
Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section? They deal with the instinctive discomfort of including everyone in the CEV.
(1) that was a backhand with the flyswatter, which I grabbed with my left hand just then.
No. I will NOT assume that extrapolating the volition of people with vastly different preferences to me will magically make them compatible with mine. The universe is just not that convenient. Pretending it is while implementing an FAI is suicidally naive.
I’m familiar with the document, as well as approximately everything else said on the subject here, even in passing. This includes Eliezer proposing ad-hoc workarounds to the “What if people are jerks?” problem.
Quite right, don’t assume. Think it through. Then you may be less inclined to pepper your posts with non-sequiturs like “magically”, “pretending” and “naive”.
Great! But, IMHO, you have a tendency to miss the point. So:
What do you mean? As an analogy, .01% sure and 99.99% sure are both states of uncertainty. EVs are exactly the same or they aren’t. If someone’s unmuddled EV is different than mine—and it will be—I am better off with mine influencing the future alone rather than the future being influenced by both of us, unless my EV sufficiently values that person’s participation.
My current EV places some non-infinite value on each person’s participation. You can assume for the sake of argument each person’s EV would more greatly value this.
You can correctly assume that for each person, all else equal, I’d rather have them than not (though not necessarily at the cost of having the universe diverted from my wishes). But I don’t really see why the death of most of the single ring species that is everything alive today makes selecting humans alone for CEV the right thing to do, in a way that avoids the problem of excluding the disenfranchised whom the creators don’t care sufficiently about.
If enough humans value what other humans want (and more so when extrapolated), it’s an interlocking enough network to scoop up all humans. But the biologist who spends all day with chimpanzees (dolphins, octopuses, dogs, whatever) is going to be a bit disappointed by the first-order exclusion of his or her friends from consideration.
I mean, once they both take pains to understand each other’s situation and have a good, long think about it, they would find that they will agree on the big issues and be able to easily accommodate their differences. I even suspect that overall they would value the fact that certain differences exist.
EVs can, of course, be exactly the same, or differ to some degree. But—provided we restrict ourselves to humans—the basic human needs and wants are really quite consistent across an overwhelming majority. There is enough material (on the web and in print) to support this.
Wedrifid (IMO) is making the mistake of confusing some situation-dependent subgoals (like, say, “obliterate Israel” or “my way or the highway”) with high-level goals.
I have not thought about extending CEV beyond human species, apart from taking into account the wishes of your example biologists etc. I suspect it would not work, because extrapolating wishes of “simpler” creatures would be impossible. See http://xkcd.com/605/.
You are mistaken. That I entertain no such confusion should be overwhelmingly clear from reading nearby comments.
That sounds awfully convenient. If there really is a threshold of how “non-simple” a lifeform has to be to have coherently extrapolatable volitions, do you have any particular evidence that humans clear that threshold and, say, dolphins don’t?
For my part, I suspect strongly that any technique that arrives reliably at anything that even remotely approximates CEV for a human can also be used reliably on many other species. I can’t imagine what that technique would be, though.
(Just for clarity: that’s not to say one has to take other species’ volition into account, any more than one has to take other individuals’ volition into account.)
The lack of threshold is exactly the issue. If you include dolphins and chimpanzees, explicitly, you’d be in a position to apply the same reasoning to include parrots and dogs, then rodents and octopi, etc, etc.
Eventually you’ll slide far enough down this slippery slope to reach caterpillars and parasitic wasps. Now, what would a wasp want to do, if it understood how its acts affect the other creatures worthy of inclusion in the CEV?
This is what I see as the difficulty in extrapolating the wishes of simpler creatures. Perhaps in fact there is a coherent solution, but having only thought about this a little, I suspect there might not be one.
We don’t have to care. If everyone or nearly all were convinced that something less than 20 pounds had no moral value, or a person less than 40 days old, or whatever, that would be that.
Also, as some infinite sums have finite limits, I do not think that small things necessarily make summing humans’ or the Earth’s morality impossible.
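To spell out the standard fact being leaned on here (the weighting below is an arbitrary illustration, not a proposed moral weighting): if the n-th smallest creature were assigned moral weight proportional to 2^{-n}, the total would stay finite even over infinitely many creatures, since

```latex
\sum_{n=1}^{\infty} \frac{1}{2^{n}} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1
```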
Ah, OK. Sure, if your concern is that, if we extrapolated the volition of such creatures, we would find that they don’t cohere, I’m with you. I have similar concerns about humans, actually.
I’d thought you were saying that we’d be unable to extrapolate it in the first place, which is a different problem.
Just, uh… just making sure: you do know that wedrifid has more than fourteen thousand karma for a reason, right? It’s actually not solely because he’s an oldtimer; he can be counted on to have thought about this stuff pretty thoroughly.
Edit: I’m not saying “defer to him because he has high status”, I’m saying “this is strong evidence that he is not an idiot.”
I admit to being a little embarrassed as I wrote that paragraph, because this sort of thing can come across as “fuck you”. Not my intent at all, just that the reference is relevant, well written, supports my point—and is too long to quote.
Having said that, your comment is pretty stupid. Yes, he has heaps more karma here—so what? I have more karma here than R. Dawkins and B. Obama combined!
(I prefer “Godspeed!”)
The “so what” is, he’s already read it. Also, he’s, you know, smart. A bit abrasive (or more than a bit), but still. He’s not going to go “You know, you’re right! I never thought about it that way, what a fool I’ve been!”
Edit: Discussed here.
I suppose “ethical egoism” fits. But only in some completely subverted “inclusive ethical egoist” sense in which my own “self interest” already takes into account all my altruistic moral and ethical values. ie. I’m basically not an ethical egoist at all. I just put my ethics inside the utility function where they belong.
Duly noted! (I apologize for misconstruing you, also.)
I’m not sure this can mean one thing that is also important.
Huh? Yes it can. It means “results in something closer to CEV than the alternative does”, which is pretty damn important given that it is exactly what the context was talking about.
I agree that context alone pointed to that interpretation, but as that makes your statement a tautology, I thought it more likely than not you were referencing a more general meaning than the one under discussion. This was particularly so because of the connotations of “wary”, i.e. “this sort of argument tends to seem more persuasive than it should, but the outside view doesn’t rule them out entirely,” rather than “arguments of this form are always wrong because they are logically inconsistent”.
Because Phlebas’s argument is not, in fact, tautologically false, and is merely blatantly false, I chose to refrain from a (false) accusation of inconsistency.