The few times I raised this question in the past, my comments were met with either indifference or hostility. I will try to raise it one more time in this open thread. If you think the question deserves a downvote, could you please, in addition to downvoting me, leave a brief comment explaining your rationale for doing so? I promise to upvote all comments providing such explanations.
So, here’s the question: What is the reason for defining the class of beings whose volitions are to be coherently extrapolated as the class of present human beings? Why present and not also future (or past!)? Why human and not, say, mammals, males, or friends of Eliezer Yudkowsky?
Note that the question is not: Why should we value only present people? This way of framing the problem already assumes that “we” (i.e., present human beings) are the subjects whose preferences are to be accorded relevance in the process of coherent extrapolation, and that the interests of any other being (present or future, human or nonhuman) should matter only to the extent that “we” value them. What I am asking for, rather, is a justification of the assumption that only “our” preferences matter.
Luke lists “Why extrapolate the values of humans alone? What counts as a human? Do values converge if extrapolated?” as an open question in So You Want to Save the World.
Would the choice to extrapolate the values of humans alone be an unjustified act of speciesism, or is it justified because humans are special in some way — perhaps because humans are the only beings who can reason about their own preferences? And what counts as a human? The problem is more complicated than one might imagine (Bostrom 2006; Bostrom & Sandberg 2011). Moreover, do we need to scan the values of all humans, or only some? These problems are less important if values converge upon extrapolation for a wide variety of agents, but it is far from clear that this is the case (Sobel 1999, Doring & Steinhoff 2009).
Of course, the premise that “humans are the only beings who can reason about their own preferences” could only justify the conclusion that some human beings are special, since there are members of the human species who lack that ability. Similar objections could be raised against any other proposed candidate property. This has long been recognized by moral philosophers.
Of course, the premise that “humans are the only beings who can reason about their own preferences” could only justify the conclusion that some human beings are special, since there are members of the human species who lack that ability.
In our society we don’t really respect the volition of those human beings. We give them legal guardians who are supposed to decide in their interests instead of letting them make their own decisions. We don’t let them vote in our elections.
That is not because we don’t regard their preferences as valuable in themselves, but simply because these beings lack the means to do the kinds of things that would allow them to satisfy those preferences. In any case, CEV does not exclude such humans from the class of creatures whose volitions are to be coherently extrapolated.
I see no reason to restrict our preference extrapolation to presently-existing humans. CEV should extrapolate from all preferences, which includes the preferences of all sentient beings, present and future. Any attempt to place boundaries on this requires justification.
Edit: You might say, “Why not also include rocks in our consideration?” Simple: rocks don’t have preferences. Sentient beings (including many non-human animals) have preferences.
If ants and beetles are sentient, then CEV should take their preferences into account. It sounds like you’re trying to use this as a reductio ad absurdum of my claim, but I don’t believe that works. If ants and beetles are sentient then they deserve consideration, no matter how unintuitive that may seem.
If ants and beetles are sentient, then CEV should take their preferences into account.
No it shouldn’t.
Elaboration: Your ‘should’ claim indicates both that you have a preference for CEV (if not all then at least up to the inclusion of ants and beetles if they are sentient) and that you assert it as a tribal norm. Many others don’t implicitly instantiate CEV in that way and instead instantiate it to the CEV of some favored group, the most common favored group being ‘all humans’. To those people your unqualified assertion would be interpreted as false.
I’m not sure that there is community consensus that “human beings currently living” is the right reference class. Eliezer suggests that he thinks the right reference class is all of humanity ever in this post.
If one assumes some kind of moral progress constraint and unpredictable future values, it seems our future descendants would hate CEV(living humans). Certainly, modern Westerners would probably hate CEV(Europeans-alive-in-1300). But I’m a moral anti-realist, so I don’t believe there are constraints that cause moral progress, and I don’t expect CEV(all-humans-ever) to output a morality.
Gwern collects some evidence against the proposition. The fact that people disagree and think morality is timeless in some sense is not particularly strong evidence when compared to results of competent historical analysis.
Of course, which historical analysis is considered credible is fairly controversial.
Part of the point of CEV is to make the extrapolation process good enough that future beings X won’t hate the extrapolation of arbitrary past group Y. The extrapolation should be effective and broad enough that extrapolating from humans in different parts of history would not appreciably change the outcome. My guess would be that the extrapolation process itself would provide most of the content, the starting reference class being a minor variable.
Resolving that issue is part of the overall goal of the SI, and a huge project. I’m also a moral anti-realist, by the way. CEV should be starter-insensitive with respect to humans from different time periods. Explaining why I think this is achievable in principle would take a whole post.
I’d be very interested in a theory that harmonized CEV with moral anti-realism.
And you seem to believe in a very strong form of extrapolation. I’m personally skeptical that CEV(modern-humanity) would output anything, while you assert CEV(modern-humanity) = CEV(ancient Greece). Surely you don’t think CEV(Clippy) = CEV(humanity).
minor terminology note: I’ve always used CEV and (moral) extrapolation interchangeably. If there’s a reason I shouldn’t do that, I’d appreciate an explanatory pointer.
Well, moral extrapolation is a broader category than CEV. CEV suggests, for instance, that we should also take into account the social dynamics that would influence the development of morality (“grown up farther together”), while you could conceivably also have a moral extrapolation approach which considered that irrelevant.
(One could also argue that it is the addition of social dynamics which helps justify the notion of CEV(modern-humanity) = CEV(ancient Greece), given that it was technological and social dynamics which got us from the values-of-ancient-Greece to values-of-today. Of course, that presupposes a deterministic view of history, which seems to me highly implausible. It also opens the door for all kinds of nasty social dynamics.)
No one else seems to be giving what is IMO the correct answer: I want the values of a created FAI to match my own, extrapolated, i.e., moral selfishness.
I would actually prefer that the extrapolation seed be drawn only from SI supporters (or ideally just me, but that’s unlikely to fly), because I’m uneasy about what happens if some of my values turn out to be memetic, and they get swamped/outvoted by a coherent extrapolated deathist or hedonist memeplex. Or if you include, for example, uplifted sharks in the process.
I too would prefer super AI to look to my values when deciding what to implement.
But, given the existence of moral disagreement, I don’t see why that deserves to be labeled Friendly. And the whole point of CEV or similar process is to figure out what is awesome for humanity. Implementing something other than what is awesome for all of humanity is not Friendly.
If deathism really is what is awesome for all humanity, I expect a FAI to implement deathism. But there’s no particular reason to believe that deathism is what is awesome for humanity.
Tim, your comment highlights the potential conflict between CEV and FAI that I also mentioned previously. FAI is by definition not hostile to human beings, whereas CEV might permit, or even require, the extinction of all humanity. This may happen, for instance, if the process of coherent extrapolation shows that humans value certain superior beings more than they value themselves, and if the coexistence of humans and these beings is impossible.
When I pointed out this problem, both Kaj Sotala and Michael Anissimov replied that CEV can never condone hostile actions towards humanity because FAI is “defined as ‘human-benefiting, non-human harming’”. However, this reply just proves my point, namely that there is a potential internal inconsistency between CEV and FAI.
Don’t look at me to resolve that conflict. I think moral extrapolation is unlikely to output anything coherent if the reference class is sufficiently large to avoid the objections I raised above. And I can’t think of any other plausible candidate to produce Friendly instructions for an AI.
Slight sidetrack: By the time AI seems plausible, I think it’s likely that the human race will have done enough self-modification (computer augmentation, biological engineering) that the question of what’s human is going to be more difficult than it is now.
I was thinking “member of the species Homo sapiens”, but now that you mention it, I’d assign a small probability to genetically modified humans which can’t interbreed with other humans. I don’t have anything specific in mind, it’s just that if genetic modification becomes at all common, a lot of possibilities open up, and some of the good ones might be incompatible with mutual fertility... whatever that means under the circumstances.
I would also like to see this discussion. It isn’t terribly clear to me why the extinction of the human race and its replacement with some non-human AI is an inherently bad outcome. Why keep around and devote resources to human beings, who at best can be seen as sort of a prototype of true intelligence, since that’s not really what they’re designed for?
While imagining our extinction at the hands of our robot overlords seems unpleasant, if you imagine a gradual cyborg evolution to a post-human world, that seems scary, but not morally objectionable. Besides the Ship of Theseus, what’s the difference?
A long time ago, a different person who also happens to be named “Eliezer Yudkowsky” said that, in the event of a clash between human beings and superintelligent AIs, he would side with the latter. The Yudkowsky we all know rejects this position, though it is not clear to me why.
Not clear why? Because he likes people and doesn’t want everyone he knows (including himself), everyone he doesn’t know and any potential descendants of either to die? Doesn’t that sound like a default position? Most people don’t want themselves to go extinct.
“Superintelligent AIs” is not one thing, it’s a class of quadrillions of different possible things. The old Eliezer was probably thinking of one thing when he referred to superintelligences. When you realize that SAIs are a category of beings with more potential diversity than all species that have ever lived, it’s hard to side with them all as a group. You’d have to have poor aesthetics to value them all equally.
Thanks for the clarification. My understanding is that (the current) Eliezer doesn’t merely claim that we shouldn’t value all superintelligent AIs equally; he makes the much stronger claim that, in a conflict between humans and AIs, we should side with the former regardless of what kind of AI is actually involved in this conflict. This stronger claim seems much harder to defend precisely in light of the fact that the space of possible AIs is so vast. Surely there must be some AIs in this heterogeneous group whose survival is preferable to that of creatures like us?
I don’t think he makes that claim: all of his arguments on the topic that I’ve seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard. E.g. here:
Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.
“Well,” says the one, “maybe according to your provincial human values, you wouldn’t like it. But I can easily imagine a galactic civilization full of agents who are nothing like you, yet find great value and interest in their own goals. And that’s fine by me. I’m not so bigoted as you are. Let the Future go its own way, without trying to bind it forever to the laughably primitive prejudices of a pack of four-limbed Squishy Things—”
My friend, I have no problem with the thought of a galactic civilization vastly unlike our own… full of strange beings who look nothing like me even in their own imaginations… pursuing pleasures and experiences I can’t begin to empathize with… trading in a marketplace of unimaginable goods… allying to pursue incomprehensible objectives… people whose life-stories I could never understand.
That’s what the Future looks like if things go right.
If the chain of inheritance from human (meta)morals is broken, the Future does not look like this. It does not end up magically, delightfully incomprehensible.
With very high probability, it ends up looking dull. Pointless. Something whose loss you wouldn’t mourn.
That’s helpful. I take it, then, that “friendly” AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive. If this is so, I think it’s misleading to use the locution ‘friendly AI’ to designate such artificial agents, and am inclined to believe that many folks who are sympathetic to the goal of creating friendly AI wouldn’t be if they knew what was actually meant by that expression.
That’s not the proper definition… Friendly AI, according to current guesses/theory, would be an extrapolation of human values. The extrapolation part is everything. I encourage you to check out that linked document; the system it defines (though just a rough sketch) is what is usually meant by “Friendly AI” around here. No one is arguing that “human values” = “what we absolutely must pursue”. I’m not sure that creating Friendly AI, a machine that helps us, should be considered as passing a moral judgment on mankind or the world. At least, it seems like a really informal way of looking at it, and probably unhelpful as it’s imbued with so much moral valence.
[Eliezer] makes the much stronger claim that, in a conflict between humans and AIs, we should side with the former regardless of what kind of AI is actually involved in this conflict.
Kaj replied:
I don’t think he makes that claim: all of his arguments on the topic that I’ve seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard.
I then said:
I take it, then, that “friendly” AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive.
But now you reply:
Friendly AI is defined as “human-benefiting, non-human harming”.
It would clearly be wishful thinking to assume that the countless forms of AIs that “could be genuinely better than us in every regard” would all act in friendly ways towards humans, given that acting in other ways could potentially realize other goals that these superior beings might have.
That doesn’t sound quite right either, given Eliezer’s unusually strong anti-death preferences. (Nor do I think most other SI folks would endorse it; I wouldn’t.)
ETA: Friendly AI was also explicitly defined as “human-benefiting” in e.g. Creating Friendly AI:
The term “Friendly AI” refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.
Even though Eliezer has declared CFAI as outdated, I don’t think that particular bit is.
As I understand Eliezer’s current position, it is that the right thing to optimize the universe for is the set of things humans collectively value (aka “CEV(humanity)”).
On this account the space of all possible optimizing systems (aka “AIs” or “AGIs”) can be divided into two sets: those which optimize for CEV(humanity) (aka “Friendly AIs”), and those which optimize for something else (aka “Unfriendly AIs”).
And Friendly AIs are the right thing to “side with”, as you put it here, because CEV(humanity) is on this account the right thing to optimize for.
On this account, “why side with Friendly AI over Unfriendly?” is roughly equivalent to asking “why do the right thing?”
The survival of creatures like us is entirely beside the point. Maybe CEV(humanity) includes the survival of creatures like us and maybe it doesn’t.
Now, you might ask, why is CEV(humanity) the right thing to optimize the universe for, as opposed to something else? To which I think Eliezer’s reply is that this is simply what it means to be right; things are right insofar as they correspond to what humans collectively value.
Some people (myself among them) find this an unconvincing argument. That said, I don’t think anyone has made a convincing argument that some specific other thing is better to optimize for, either.
To which I think Eliezer’s reply is that this is simply what it means to be right; things are right insofar as they correspond to what humans collectively value.
No. The argument is more like that there’s no source of complex value in the world besides humans, and writing complex values line by line would take thousands of years, so we are forced to use some combination and/or extrapolation of human values, whether we want to or not.
If you have citations for EY articulating the idea that writing superior nonhuman values would take too long to do, rather than that it’s fundamentally incoherent, I’d be interested. This would completely change my understanding of the whole Metaethics Sequence.
Whole brain emulation would basically be “copying” human values in a machine, and would demonstrate that “writing” human values is possible. You could then edit a couple morally relevant bits, and you’d be demonstrating that you could “create” a human-like but slightly edited morality. Evaluating whether it is “superior” by some metric would be a whole additional exercise, though.
I don’t think the metaethics sequence implies that writing down values is impossible, just that human values are very complex and messy.
Sure, if we drop the idea of “superior,” I agree completely that it’s possible (in principle) to write a set of values, and that the metaethics sequence does not imply otherwise.
And, also, it implies—well, it asserts—that human values are very complex and messy, as you say.
IIRC, it also asserts that human values are right. Which is why I think that on EY’s view, evaluating whether the “edited morality” you describe here is superior to human values is not just an additional exercise, but an unnecessary (and perhaps incoherent) one. On his view, I think we can know a priori that it isn’t.
Actually, now that I think about it more… when you say “there’s no source of complex value in the world besides humans”, do you mean to suggest that aliens with equally complex incompatible values simply can’t exist, or that if they did exist EY’s conclusions would change in some way to account for them?
I believe that EY definitively rejected the idea of there being an objective morality back in 2003 or thereabouts. Unless I am forgetting something from the metaethics sequence.
The whole point of CEV is to create a “superior” morality, though I think that is too value-loaded a word to use; the better word is “extrapolated”. The whole idea of Friendly AI is to create a moral agent that continues to progress. So I’m not sure why you’re claiming that EY is claiming that the notion of moral self-evaluation in AI is unnecessary. Isn’t comparing possible, “better” moralities to the current morality essential to the definition of “moral progress” and therefore indispensable to building a Friendly AI?
To respond to your last statement, no to both. Of course aliens with equally complex incompatible values can exist, and I’m sure they do in some faraway place. Those aliens don’t live here, though, so I’m not sure why we’d want to build a Friendly AI for their values rather than our own. The idea of building a Friendly AI is to ensure some kind of “metamoral continuity” through the intelligence explosion.
To some extent, I think we may be talking past each other when I talk about values and you reply about moralities.
To clarify: would you say that this process you refer to of creating a different “morality” (whether it’s different by virtue of being superior or extrapolated or something else is beside my point right now) keeps values fixed, or not?
I think it depends on what is meant by “values”. I would say that the values change while the fundamental motivations are fixed, though Vladimir’s response makes me unsure about this. Another way of saying it is that supergoals are fixed but the “Friendliness content” changes. (Though I haven’t seen the phrase “Friendliness content” around much lately, perhaps it’s being discarded in favor of more formal terms.)
Maybe another useful distinction would be between Friendliness structure and content (see the CFAI entry on the wiki).
I have to admit, the proliferation of terms in this discussion is making me less and less clear that I understand what was being said when you corrected me initially, despite several attempts to clarify it. So I’m going to suggest that we roll back and try this again, keeping our working vocabulary as well-defined as we can.
As I understand EY’s account:
He endorses building an optimization process (that is, a process that acts to maximize the amount of some specified target) that uses as its target the set of human terminal values (that is, the things that we want for their own sake, rather than wanting because we believe they’ll get us something else).
He also endorses building this process in such a way that it will improve itself as required so as to be able to exert superhuman optimizing power towards its target. The term “Friendly AI” refers to processes of this sort—that is, self-improving superhuman optimization processes that use as their target the set of human terminal values.
He also endorses a particular process (building a seed AI that analyzes humans) as a way of identifying the set of human terminal values. The term “CEV” (or, sometimes, “CEV(humanity)”) refers to the output of such an analysis.
He endorses all of this not only as pragmatic for our purposes, but also as the morally right thing to do. Even if there’s an equally complex species out there whose terminal values differ from ours, on EY’s account the morally right thing to do is optimize the universe for our terminal values rather than for theirs or for some compromise between the two. Members of that species might believe that humans are wrong to do so, but if so they’ll be mistaken.
I understand that you believe I’m mistaken about some or all of the above. I’m really not clear at this point on what you think is mistaken, or what you think is true instead.
Can you edit the above to reflect where you think it’s mistaken?
The only part I disagree with strongly is the language of the last point. Referring to CEV as “THE morally right thing to do” makes it seem as if it were set in stone as the guaranteed best path to creating FAI, which it isn’t. EY argues that building Friendly AI instead of just letting the chips fall where they may is the morally right thing to do, and I’d agree with that, but not that CEV specifically is the right thing to do.
One general goal point for FAI is to target outcomes “at least as good” as those which would be caused by benevolent human mind upload(s). So, the kind of “moral development” that a community of uploads would undergo should be encapsulated within a FAI. In fact, any beneficial area of the moral state space that would be accessible starting from humans or any combination of humans and tools should be accessible by a good FAI design. CEV is one such proposal towards such a design.
As I understand it, yes, the thinking is to optimize for our terminal values rather than those of this hypothetical alien species or some compromise between the two. However, if values among different intelligent species converge given greater intelligence, knowledge, and self-reflection, then we would expect our FAI to have goals that converge with those of the alien FAI. If values do not converge, then we would expect our FAI to have different values than alien FAIs.
A “terminal value” might include carefully thinking through philosophical questions such as this and designing the best goal content possible given these considerations. So, if there are hypothetical alien values that seem “correct” (or simply sufficiently desirable from the subjective perspective) to extrapolated humanity, these values would be integrated into the CEV-output.
I agree that EY does not assert that his proposed process for defining FAI’s optimization target (that is, seed AI calculating CEV) is necessarily the best path to FAI, nor that that proposed process is particularly right. Correction accepted.
And yes, I agree that on EY’s account, given an alien species whose values converge with ours, a system that optimizes for our terminal values also optimizes for theirs.
Isn’t comparing possible, “better” moralities to the current morality essential to the definition of “moral progress” and therefore indispensable to building a Friendly AI?
FAI’s goals should be fixed, unchanging (by initial design). I see three possible things related to a FAI that could be described as involving a “changing morality”. First, it’s possible that the definition of FAI’s unchanging goals could take the form where it makes sense to talk about some process of change in provisional goals, but this process of change would be a part of the definition of the unchanging result. For something like CEV, we might say that CEV is the first stage that takes care of collecting initial data from humans, tries to “extrapolate” goals from this data, decides on whether it can formulate FAI’s goals, and if successful runs a FAI with these (fixed) goals.
Second, the world managed by FAI might contain agents with changing morality, if the FAI decides that agents with changing morality are the right thing to create or maintain, according to FAI’s fixed morality.
And third, FAI itself might take significant time in understanding the logical implications of the fixed definition of its morality, either in general or as applied to particular (hypothetical) situations. Even mathematics with elementary axioms that human mathematicians do is quite complicated. Useful parts of the mathematics of human value might take billions of years to figure out.
Yeah, that’s an interesting question. I’ll offer a conjecture.
From my understanding, one of the fundamental assumptions of FAI is that there is somehow a stable moral attractor for every AI that is in the local neighborhood of its original goals, or perhaps only that this attractor is possible. No matter how intelligent the machine gets, no matter how many times it improves itself, it will consciously attempt to stay in the local neighborhood of this point (à la the Gandhi murder pill analogy).
If an AI is designed with a moral attractor that is essentially random, and thus probably totally antithetical to human values (such as paperclip manufacture), then it’s hard to be on the side of the machines. Giving control of the world over to machine super-intelligences sounds like an okay idea if you imagine them growing, doing science, populating the universe, etc., but if they just tear apart the world to make paperclips in an exceptionally clever manner, then perhaps it isn’t such a good idea. This is to say, if the machines use their intelligence to derive their morality, then siding with the machines is all well and good, but if their morality is programmed from the start, and the machines are merely exceptionally skilled morality executors, then there’s no good reason to be on the side of the machines just because they execute their random morality much more effectively.
I am fairly hesitant to agree with the idea of the moral attractor, along with the goals of FAI in general. I understand the idea only through analogy, which is to say not at all, and I have little idea what would dictate the peaks and valleys of a moral landscape, or even the coordinates really. It also isn’t clear to me that a machine of such high intelligence would be incapable of forming new value systems, and perhaps discarding its preference for paper clips if there was no more paper to clip together.
While I’m exploring a very wide hypothesis space here about a person I know essentially nothing about, this sort of reasoning is at least consistent with what appears to be the thinking that undergirds work on FAI.
It also raises a very interesting question, which is perhaps more fundamental, and that is whether moral preferences are a function of intelligence or not. If so, the beings far more intelligent than us would presumably be more moral, and have a reasonable claim for our moral support. If not, then they’re simply more clever and more powerful, and neither is a particularly good reason to welcome our robot overlords.
An idea I just had, which I’m sure others have considered, but I will merely note here, is that a recursively self-modifying AI would be subject to Darwinian evolution, with lines of code analogous to individual genes. Indeed, if there is a stable attractor for such an AI, it seems likely to be about as moral as evolution, which is not particularly encouraging.
It sounds like extra work, and I’m not sure there would be a payoff. Presumably a past person whose volition was coherently extrapolated would lose their racism and other backwards attitudes, and thus be on par with a contemporary person’s coherently extrapolated volition. With future persons, the argument could be made that their CEV can’t be much different from a current person’s for similar reasons.
Presumably a past person whose volition was coherently extrapolated would lose their racism and other backwards attitudes, and thus be on par with a contemporary person’s coherently extrapolated volition.
Even if we grant this assumption, this sort of argument clearly cannot be generalized to justify the exclusion of nonhuman animals—who have preferences that humans routinely disregard—from the class of beings whose volitions are to be coherently extrapolated. Why not run CEV on all present sentient beings?
No preferences “matter” except in relation to each other. The subset of humanity that I value isn’t decided by logic, but by my values and how they interact with humans.
You say that you only value a subset of humanity. But this is irrelevant for CEV, according to which we should extrapolate the preferences of all (present?) humans, not just those of drethelin.
The few times I raised this question in the past, my comments were met with either indifference or hostility. I will try to raise it one more time in this open thread. If you think the question deserves a downvote, could you please, in addition to downvoting me, leave a brief comment explaining your rationale for doing so? I promise to upvote all comments providing such explanations.
So, here’s the question: What is the reason for defining the class of beings whose volitions are to be coherently extrapolated as the class of present human beings? Why present and not also future (or past!)? Why human and not, say, mammals, males, or friends of Eliezer Yudkowsky?
Note that the question is not: Why should we value only present people? This way of framing the problem already assumes that “we” (i.e., present human beings) are the subjects whose preferences are to be accorded relevance in the process of coherent extrapolation, and that the interests of any other being (present or future, human or nonhuman) should matter only to the extent that “we” value them. What I am asking for, rather, is a justification of the assumption that only “our” preferences matter.
Luke lists “Why extrapolate the values of humans alone? What counts as a human? Do values converge if extrapolated?” as an open question in So You Want to Save the World.
Thanks!
Of course, the premise that “humans are the only beings who can reason about their own preferences” could only justify the conclusion that some human beings are special, since there are members of the human species who lack that ability. Similar objections could be raised against any other proposed candidate property. This has long been recognized by moral philosophers.
In our society we don’t really respect the volition of those human beings. We give them legal guardians who are supposed to decide in their interests instead of letting them make their own decisions. We don’t let them vote in our elections.
That is not because we don’t regard their preferences as valuable in themselves, but simply because these beings lack the means to do the kinds of things that would allow them to satisfy those preferences. In any case, CEV does not exclude such humans from the class of creatures whose volitions are to be coherently extrapolated.
I see no reason to restrict our preference extrapolation to presently-existing humans. CEV should extrapolate from all preferences, which includes the preferences of all sentient beings, present and future. Any attempt to place boundaries on this requires justification.
Edit: You might say, “Why not also include rocks in our consideration?” Simple: rocks don’t have preferences. Sentient beings (including many non-human animals) have preferences.
What if the majority of sentient beings are ants and beetles?
If ants and beetles are sentient, then CEV should take their preferences into account. It sounds like you’re trying to use this as a reductio ad absurdum of my claim, but I don’t believe that works. If ants and beetles are sentient then they deserve consideration, no matter how unintuitive that may seem.
No it shouldn’t.
Elaboration: Your ‘should’ claim indicates both that you have a preference for CEV over all sentient beings (if not all, then at least up to the inclusion of ants and beetles if they are sentient) and that you assert it as a tribal norm. Many others don’t implicitly instantiate CEV in that way and instead instantiate it as CEV over some other favored group, the most common being ‘all humans’. To those people your unqualified assertion would be interpreted as false.
I addressed this point in my original comment.
I’m not sure that there is community consensus that “human beings currently living” is the right reference class. Eliezer suggests that he thinks the right reference class is all of humanity ever in this post.
If one assumes some kind of moral progress constraint and unpredictable future values, it seems that our future descendants would hate CEV(living humans). Certainly, modern Westerners would probably hate CEV(Europeans-alive-in-1300). But I’m a moral anti-realist, so I don’t believe there are constraints that cause moral progress, and I don’t expect CEV(all-humans-ever) to output a morality.
Some people would disagree.
Gwern collects some evidence against the proposition. The fact that people disagree and think morality is timeless in some sense is not particularly strong evidence when compared to results of competent historical analysis.
Of course, which historical analysis is considered credible is fairly controversial.
Part of the point of CEV is to make the extrapolation process good enough that future beings X won’t hate the extrapolation of arbitrary past group Y. The extrapolation should be effective and broad enough that extrapolating from humans in different parts of history would not appreciably change the outcome. My guess would be that the extrapolation process itself would provide most of the content, the starting reference class being a minor variable.
It would be convenient if such a process could be proven to exist and rigorously described.
Resolving that issue would do a lot to address the OP’s concerns. Separately, it would be a strong reason for me to reject moral anti-realism.
What evidence do we have that such convenient extrapolation is actually possible?
Resolving that issue is part of the overall goal of the SI, and a huge project. I’m also a moral anti-realist, by the way. CEV should be starter-insensitive w/ respect to humans from different time periods. My reasons for why I think that this is achievable in principle would be a whole post.
I’d be very interested in a theory that harmonized CEV with moral anti-realism.
And you seem to believe in a very strong form of extrapolation. I’m personally skeptical that CEV(modern-humanity) would output anything, while you assert CEV(modern-humanity) = CEV(ancient Greece). Surely you don’t think CEV(Clippy) = CEV(humanity).
Minor terminology note: I’ve always used CEV and (moral) extrapolation interchangeably. If there’s a reason I shouldn’t do that, I’d appreciate an explanatory pointer.
Well, moral extrapolation is a broader category than CEV. CEV suggests, for instance, that we should also take into account the social dynamics that would influence the development of morality (“grown up farther together”), while you could conceivably also have a moral extrapolation approach which considered that irrelevant.
(One could also argue that it is the addition of social dynamics which helps justify the notion of CEV(modern-humanity) = CEV(ancient Greece), given that it was technological and social dynamics which got us from the values-of-ancient-Greece to values-of-today. Of course, that presupposes a deterministic view of history, which seems to me highly implausible. It also opens the door for all kinds of nasty social dynamics.)
.
You can delete retracted comments if you reload the page.
But not if someone’s replied to the comment.
No one else seems to be giving what is IMO the correct answer: I want the values of a created FAI to match my own, extrapolated, i.e., moral selfishness.
I would actually prefer that the extrapolation seed be drawn only from SI supporters (or ideally just me, but that’s unlikely to fly), because I’m uneasy about what happens if some of my values turn out to be memetic, and they get swamped or outvoted by a coherently extrapolated deathist or hedonist memeplex. Or if you include, for example, uplifted sharks in the process.
I too would prefer super AI to look to my values when deciding what to implement.
But, given the existence of moral disagreement, I don’t see why that deserves to be labeled Friendly. And the whole point of CEV or similar process is to figure out what is awesome for humanity. Implementing something other than what is awesome for all of humanity is not Friendly.
If deathism really is what is awesome for all humanity, I expect a FAI to implement deathism. But there’s no particular reason to believe that deathism is what is awesome for humanity.
Tim, your comment highlights the potential conflict between CEV and FAI that I also mentioned previously. FAI is by definition not hostile to human beings, whereas CEV might permit, or even require, the extinction of all humanity. This may happen, for instance, if the process of coherent extrapolation shows that humans value certain superior beings more than they value themselves, and if the coexistence of humans and these beings is impossible.
When I pointed out this problem, both Kaj Sotala and Michael Anissimov replied that CEV can never condone hostile actions towards humanity because FAI is “defined as ‘human-benefiting, non-human harming’”. However, this reply just proves my point, namely that there is a potential internal inconsistency between CEV and FAI.
Don’t look at me to resolve that conflict. I think moral extrapolation is unlikely to output anything coherent if the reference class is sufficiently large to avoid the objections I raised above. And I can’t think of any other plausible candidate to produce Friendly instructions for an AI.
Slight sidetrack: By the time AI seems plausible, I think it’s likely that the human race will have done enough self-modification (computer augmentation, biological engineering) that the question of what’s human is going to be more difficult than it is now.
By ‘human’, do you mean ‘member of the species Homo sapiens’ or something else?
I was thinking “member of the species Homo sapiens”, but now that you mention it, I’d assign a small probability to genetically modified humans who can’t interbreed with other humans. I don’t have anything specific in mind; it’s just that if genetic modification becomes at all common, a lot of possibilities open up, and some of the good ones might be incompatible with mutual fertility... whatever that means under the circumstances.
I would also like to see this discussion. It isn’t terribly clear to me why the extinction of the human race and its replacement with some non-human AI is an inherently bad outcome. Why keep around and devote resources to human beings, who at best can be seen as sort of a prototype of true intelligence, since that’s not really what they’re designed for?
While imagining our extinction at the hands of our robot overlords seems unpleasant, if you imagine a gradual cyborg evolution to a post-human world, that seems scary, but not morally objectionable. Besides the Ship of Theseus, what’s the difference?
A long time ago, a different person who also happens to be named “Eliezer Yudkowsky” said that, in the event of a clash between human beings and superintelligent AIs, he would side with the latter. The Yudkowsky we all know rejects this position, though it is not clear to me why.
Not clear why? Because he likes people and doesn’t want everyone he knows (including himself), everyone he doesn’t know and any potential descendants of either to die? Doesn’t that sound like a default position? Most people don’t want themselves to go extinct.
“Superintelligent AIs” is not one thing, it’s a class of quadrillions of different possible things. The old Eliezer was probably thinking of one thing when he referred to superintelligences. When you realize that SAIs are a category of beings with more potential diversity than all species that have ever lived, it’s hard to side with them all as a group. You’d have to have poor aesthetics to value them all equally.
Thanks for the clarification. My understanding is that (the current) Eliezer doesn’t merely claim that we shouldn’t value all superintelligent AIs equally; he makes the much stronger claim that, in a conflict between humans and AIs, we should side with the former regardless of what kind of AI is actually involved in this conflict. This stronger claim seems much harder to defend precisely in light of the fact that the space of possible AIs is so vast. Surely there must be some AIs in this heterogenous group whose survival is preferable to that of creatures like us?
I don’t think he makes that claim: all of his arguments on the topic that I’ve seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard. E.g. here:
That’s helpful. I take it, then, that “friendly” AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive. If this is so, I think it’s misleading to use the locution ‘friendly AI’ to designate such artificial agents, and am inclined to believe that many folks who are sympathetic to the goal of creating friendly AI wouldn’t be if they knew what was actually meant by that expression.
Not “that doesn’t sound quite right”, but “that’s completely wrong”. Friendly AI is defined as “human-benefiting, non-human harming”.
I would say that the defining characteristic of Friendly AI, as the term is used on LW, is that it optimizes for human values.
On this view, if it turns out that human values prefer that humans be harmed, then Friendly AI harms humans, and we ought to prefer that it do so.
That’s not the proper definition… Friendly AI, according to current guesses/theory, would be an extrapolation of human values. The extrapolation part is everything. I encourage you to check out that linked document; the system it defines (though just a rough sketch) is what is usually meant by “Friendly AI” around here. No one is arguing that “human values” = “what we absolutely must pursue”. I’m not sure that creating Friendly AI, a machine that helps us, should be considered as passing a moral judgment on mankind or the world. At least, it seems like a really informal way of looking at it, and probably unhelpful as it’s imbued with so much moral valence.
Let’s backtrack a bit.
I said:
Kaj replied:
I then said:
But now you reply:
It would clearly be wishful thinking to assume that the countless forms of AIs that “could be genuinely better than us in every regard” would all act in friendly ways towards humans, given that acting in other ways could potentially realize other goals that these superior beings might have.
That doesn’t sound quite right either, given Eliezer’s unusually strong anti-death preferences. (Nor do I think most other SI folks would endorse it; I wouldn’t.)
ETA: Friendly AI was also explicitly defined as “human-benefiting” in e.g. Creating Friendly AI:
Even though Eliezer has declared CFAI as outdated, I don’t think that particular bit is.
As I understand Eliezer’s current position, it is that the right thing to optimize the universe for is the set of things humans collectively value (aka “CEV(humanity)”).
On this account the space of all possible optimizing systems (aka “AIs” or “AGIs”) can be divided into two sets: those which optimize for CEV(humanity) (aka “Friendly AIs”), and those which optimize for something else (aka “Unfriendly AIs”).
And Friendly AIs are the right thing to “side with”, as you put it here, because CEV(humanity) is on this account the right thing to optimize for.
On this account, “why side with Friendly AI over Unfriendly?” is roughly equivalent to asking “why do the right thing?”
The survival of creatures like us is entirely beside the point. Maybe CEV(humanity) includes the survival of creatures like us and maybe it doesn’t.
Now, you might ask, why is CEV(humanity) the right thing to optimize the universe for, as opposed to something else? To which I think Eliezer’s reply is that this is simply what it means to be right; things are right insofar as they correspond to what humans collectively value.
Some people (myself among them) find this an unconvincing argument. That said, I don’t think anyone has made a convincing argument that some specific other thing is better to optimize for, either.
No. The argument is more like that there’s no source of complex value in the world besides humans, and writing complex values line by line would take thousands of years, so we are forced to use some combination and/or extrapolation of human values, whether we want to or not.
Hm.
If you have citations for EY articulating the idea that writing superior nonhuman values would take too long to do, rather than that it’s fundamentally incoherent, I’d be interested. This would completely change my understanding of the whole Metaethics Sequence.
Whole brain emulation would basically be “copying” human values in a machine, and would demonstrate that “writing” human values is possible. You could then edit a couple morally relevant bits, and you’d be demonstrating that you could “create” a human-like but slightly edited morality. Evaluating whether it is “superior” by some metric would be a whole additional exercise, though.
I don’t think the metaethics sequence implies that writing down values is impossible, just that human values are very complex and messy.
Sure, if we drop the idea of “superior,” I agree completely that it’s possible (in principle) to write a set of values, and that the metaethics sequence does not imply otherwise.
And, also, it implies—well, it asserts—that human values are very complex and messy, as you say.
IIRC, it also asserts that human values are right. Which is why I think that on EY’s view, evaluating whether the “edited morality” you describe here is superior to human values is not just an additional exercise, but an unnecessary (and perhaps incoherent) one. On his view, I think we can know a priori that it isn’t.
Actually, now that I think about it more… when you say “there’s no source of complex value in the world besides humans”, do you mean to suggest that aliens with equally complex incompatible values simply can’t exist, or that if they did exist EY’s conclusions would change in some way to account for them?
I believe that EY definitively rejected the idea of there being an objective morality back in 2003 or thereabouts. Unless I am forgetting something from the metaethics sequence.
The whole point of CEV is to create a “superior” morality, though I think that is too value-loaded a word to use; the better word is “extrapolated”. The whole idea of Friendly AI is to create a moral agent that continues to progress. So I’m not sure why you’re claiming that EY is claiming that the notion of moral self-evaluation in AI is unnecessary. Isn’t comparing possible, “better” moralities to the current morality essential to the definition of “moral progress” and therefore indispensable to building a Friendly AI?
To respond to your last statement, no to both. Of course aliens with equally complex incompatible values can exist, and I’m sure they do in some faraway place. Those aliens don’t live here, though, so I’m not sure why we’d want to build a Friendly AI for their values rather than our own. The idea of building a Friendly AI is to ensure some kind of “metamoral continuity” through the intelligence explosion.
To some extent, I think we may be talking past each other when I talk about values and you reply about moralities.
To clarify: would you say that this process you refer to of creating a different “morality” (whether it’s different by virtue of being superior or extrapolated or something else is beside my point right now) keeps values fixed, or not?
I think it depends on what is meant by “values”. I would say that the values change while the fundamental motivations are fixed, though Vladimir’s response makes me unsure about this. Another way of saying it is that supergoals are fixed but the “Friendliness content” changes. (Though I haven’t seen the phrase “Friendliness content” around much lately, perhaps it’s being discarded in favor of more formal terms.)
Maybe another useful distinction would be between Friendliness structure and content (see the CFAI entry on the wiki).
I have to admit, the proliferation of terms in this discussion is making me less and less clear that I understand what was being said when you corrected me initially, despite several attempts to clarify it. So I’m going to suggest that we roll back and try this again, keeping our working vocabulary as well-defined as we can.
As I understand EY’s account:
He endorses building an optimization process (that is, a process that acts to maximize the amount of some specified target) that uses as its target the set of human terminal values (that is, the things that we want for their own sake, rather than wanting because we believe they’ll get us something else).
He also endorses building this process in such a way that it will improve itself as required so as to be able to exert superhuman optimizing power towards its target. The term “Friendly AI” refers to processes of this sort—that is, self-improving superhuman optimization processes that use as their target the set of human terminal values.
He also endorses a particular process (building a seed AI that analyzes humans) as a way of identifying the set of human terminal values. The term “CEV” (or, sometimes, “CEV(humanity)”) refers to the output of such an analysis.
He endorses all of this not only as pragmatic for our purposes, but also as the morally right thing to do. Even if there’s an equally complex species out there whose terminal values differ from ours, on EY’s account the morally right thing to do is optimize the universe for our terminal values rather than for theirs or for some compromise between the two. Members of that species might believe that humans are wrong to do so, but if so they’ll be mistaken.
I understand that you believe I’m mistaken about some or all of the above.
I’m really not clear at this point on what you think is mistaken, or what you think is true instead.
Can you edit the above to reflect where you think it’s mistaken?
The only part I disagree with strongly is the language of the last point. Referring to CEV as “THE morally right thing to do” makes it seem as if it were set in stone as the guaranteed best path to creating FAI, which it isn’t. EY argues that building Friendly AI instead of just letting the chips fall where they may is the morally right thing to do, and I’d agree with that, but not that CEV specifically is the right thing to do.
One general goal point for FAI is to target outcomes “at least as good” as those which would be caused by benevolent human mind upload(s). So, the kind of “moral development” that a community of uploads would undergo should be encapsulated within a FAI. In fact, any beneficial area of the moral state space that would be accessible starting from humans or any combination of humans and tools should be accessible by a good FAI design. CEV is one such proposal towards such a design.
As I understand it, yes, the thinking is to optimize for our terminal values instead of this hypothetical alien species or some compromise of the two. However, if values among different intelligent species converge given greater intelligence, knowledge, and self-reflection, then we would expect our FAI to have goals that converge with the alien FAI. If values do not converge, then we would suppose our FAI to have different values than alien FAIs.
A “terminal value” might include carefully thinking through philosophical questions such as this and designing the best goal content possible given these considerations. So, if there are hypothetical alien values that seem “correct” (or simply sufficiently desirable from the subjective perspective) to extrapolated humanity, these values would be integrated into the CEV-output.
I agree that EY does not assert that his proposed process for defining FAI’s optimization target (that is, seed AI calculating CEV) is necessarily the best path to FAI, nor that that proposed process is particularly right. Correction accepted.
And yes, I agree that on EY’s account, given an alien species whose values converge with ours, a system that optimizes for our terminal values also optimizes for theirs.
Thanks.
FAI’s goals should be fixed, unchanging (by initial design). I see three possible things related to a FAI that could be described as involving a “changing morality”. First, it’s possible that the definition of FAI’s unchanging goals could take the form where it makes sense to talk about some process of change in provisional goals, but this process of change would be a part of the definition of the unchanging result. For something like CEV, we might say that CEV is the first stage that takes care of collecting initial data from humans, tries to “extrapolate” goals from this data, decides on whether it can formulate FAI’s goals, and if successful runs a FAI with these (fixed) goals.
Second, the world managed by FAI might contain agents with changing morality, if the FAI decides that agents with changing morality are the right thing to create or maintain, according to FAI’s fixed morality.
And third, FAI itself might take significant time in understanding the logical implications of the fixed definition of its morality, either in general or as applied to particular (hypothetical) situations. Even mathematics with elementary axioms that human mathematicians do is quite complicated. Useful parts of the mathematics of human value might take billions of years to figure out.
Yeah, that’s an interesting question. I’ll offer a conjecture.
From my understanding, one of the fundamental assumptions of FAI is that there is somehow a stable moral attractor for every AI that is in the local neighborhood of its original goals, or perhaps only that this attractor is possible. No matter how intelligent the machine gets, no matter how many times it improves itself, it will consciously attempt to stay in the local neighborhood of this point (à la the Gandhi murder pill analogy).
If an AI is designed with a moral attractor that is essentially random, and thus probably totally antithetical to human values (such as paperclip manufacture), then it’s hard to be on the side of the machines. Giving control of the world over to machine super-intelligences sounds like an okay idea if you imagine them growing, doing science, populating the universe, etc., but if they just tear apart the world to make paperclips in an exceptionally clever manner, then perhaps it isn’t such a good idea. This is to say, if the machines use their intelligence to derive their morality, then siding with the machines is all well and good, but if their morality is programmed from the start, and the machines are merely exceptionally skilled morality executors, then there’s no good reason to be on the side of the machines just because they execute their random morality much more effectively.
I am fairly hesitant to agree with the idea of the moral attractor, along with the goals of FAI in general. I understand the idea only through analogy, which is to say not at all, and I have little idea what would dictate the peaks and valleys of a moral landscape, or even the coordinates really. It also isn’t clear to me that a machine of such high intelligence would be incapable of forming new value systems, and perhaps discarding its preference for paper clips if there was no more paper to clip together.
While I’m exploring a very wide hypothesis space here about a person I know essentially nothing about, this sort of reasoning is at least consistent with what appears to be the thinking that undergirds work on FAI.
It also raises a very interesting question, which is perhaps more fundamental, and that is whether moral preferences are a function of intelligence or not. If so, the beings far more intelligent than us would presumably be more moral, and have a reasonable claim for our moral support. If not, then they’re simply more clever and more powerful, and neither is a particularly good reason to welcome our robot overlords.
An idea I just had, which I’m sure others have considered but which I will merely note here, is that a recursively self-modifying AI would be subject to Darwinian evolution, with lines of code analogous to individual genes. Indeed, if there is a stable attractor for such an AI, it seems likely to be about as moral as evolution, which is not particularly encouraging.
It sounds like extra work, and I’m not sure there would be a payoff. Presumably a past person whose volition was coherently extrapolated would lose their racism and other backwards attitudes, and thus be on par with a contemporary person’s coherently extrapolated volition. With future persons, the argument could be made that their CEV can’t be much different from a current person’s for similar reasons.
That’s a lot to presume. Gwern lists some reasons from history to think this statement is unlikely to be true.
Even if we grant this assumption, this sort of argument clearly cannot be generalized to justify the exclusion of nonhuman animals—who have preferences that humans routinely disregard—from the class of beings whose volitions are to be coherently extrapolated. Why not run CEV on all present sentient beings?
No preferences “matter” except in relation to each other. The subset of humanity that I value isn’t decided by logic, but by my values and how they interact with humans.
You say that you only value a subset of humanity. But this is irrelevant for CEV, according to which we should extrapolate the preferences of all (present?) humans, not just those of drethelin.