This post seems to come out of nowhere… I haven’t seen any comments by Roko while reading up to this point, the Google link you provide turns up nothing relevant, and the bloglink doesn’t exist. (I gather from casual searching that there was some kind of political blowup and Roko deleted all his contributions.)
So I’m not sure what you’re responding to, and maybe the context matters. But something bewilders me about this whole line of reasoning as applied to what seems to be SIAI’s chosen strategy for avoiding non-Friendliness.
(This kind of picks up from my earlier comment. If I’m confused, the confusion may start there.)
You argue that universality and objectivity and so forth are just goals, ones that we as humans happen to sort high. Sure, agreed.
You argue that it’s wrong to decide what to do on the basis of those goals, because they are merely instrumental; you argue that other goals (perhaps “life, consciousness, and activity; health and strength...” etc.) are right, or at least more right. Agreed with reservations.
You argue that individual minds will disagree on all of those goals, including the right ones. That seems guaranteed in the space of all possible minds, likely in the space of all evolved minds, and plausible in the space of all human minds.
And, you conclude, just because some mind disagrees with a goal doesn’t mean that goal isn’t right. And if the goal is right, we should pursue it, even if some mind disagrees. Even if a majority of minds disagree. Even (you don’t say this but it seems to follow) if it makes a majority of minds unhappy.
So… OK. Given that, I’m completely confused about why you support CEV.
Part of the point of CEV seems to be that if there is some goal that some subset of a maximally informed and socialized but not otherwise influenced human race would want to see not-achieved, then a process implementing CEV will make sure that the AGI it creates will not pursue that goal. So, no paperclippers. Which is great, and good, and wonderful.
(That said, I see no way to prove that something really is a CEV-implementing AI, even after you’ve turned it on, so I’m not really sure what this strategy buys us in practice. But perhaps you get to that later, and in any case it’s beside my point here.)
And presumably the idea is that humanity’s CEV is different from, say, the SIAI’s CEV, or LW’s CEV, or my personal CEV. Otherwise why complicate matters by involving an additional several billion minds?
But… well, consider the set G of goals in my CEV that aren’t in humanity’s CEV. It’s clear that the goals in G aren’t shared by all human minds… but why is that a good reason to prevent an AGI from implementing them? What if some subset of G is right?
I’m not trying to make any special claim about my own mind, here. The same argument goes for everyone. To state it more generally, consider this proposition (P): for every right goal some human has, that goal is shared by all humans.
If P is true, then there’s no reason to calculate humanity’s CEV… any human’s CEV will do just as well. If P is false, then implementing humanity’s CEV fails to do the right thing.
What am I missing here?
You need to distinguish between goals you have which the rest of humanity doesn’t like and goals you have which the rest of humanity doesn’t care about. Since you are part of humanity, the only way that one of your goals could be excluded from the CEV is if someone else (or humanity in general) has a goal that’s incompatible and more highly weighted. If one of your goals is to have a candy bar, no one else really cares whether you have one or not, so the CEV will bring you one; but if one of your goals is to kill someone, then that goal would be excluded because it’s incompatible with other people’s goal of not dying.
The most common way for goals to be incompatible is to require the same resources. In that case, the CEV would do some balancing—if a human has the goal “maximize paperclips”, the CEV will allocate a limited amount of resources to making paperclips, but not so many that it can’t also make nice houses for all the humans who want them and fulfill various other goals.
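The exclusion rule described above can be made concrete with a toy sketch. Everything here is a hypothetical illustration for the sake of the argument (the goal names, the weights, and the `surviving_goals` function are all my own invention, not part of any actual CEV specification): each goal carries a weight, and when two goals conflict, the lower-weighted one is dropped.

```python
def surviving_goals(goals, incompatible):
    """Keep each goal unless it conflicts with a more highly weighted one.

    goals: dict mapping goal name -> weight
    incompatible: set of frozensets, each naming a conflicting pair of goals
    """
    kept = set(goals)
    for pair in incompatible:
        a, b = tuple(pair)
        if a in kept and b in kept:
            # Drop whichever side of the conflict carries less weight.
            kept.discard(a if goals[a] < goals[b] else b)
    return kept

goals = {"have a candy bar": 1, "kill someone": 5, "not be killed": 1000}
incompatible = {frozenset({"kill someone", "not be killed"})}
print(surviving_goals(goals, incompatible))
# The unopposed candy-bar goal survives; the killing goal is excluded
# because it conflicts with a much more heavily weighted goal.
```

This matches the candy bar / killing example: a goal no one opposes passes through untouched, while an opposed goal survives only if it outweighs its opposition.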
Balancing resources among otherwise-compatible goals makes sense, sort of. It becomes tricky if resources are relevantly finite, but I can see where this would work.
Balancing resources among incompatible goals (e.g., A wants to kill B, B wants to live forever) is, of course, a bigger problem. Excluding incompatible goals seems a fine response. (Especially if we’re talking about actual volition.)
I had not yet come across the weighting aspect of CEV; I’d thought the idea was the CEV-implementing algorithm eliminates all goals that are incompatible with one another, not that it chooses one of them based on goal-weights and eliminates the others.
I haven’t a clue how that weighting happens. A naive answer is some function of the number of people whose CEV includes that goal… that is, some form of majority rule. Presumably there are better answers out there. Anyway, yes, I can see how that could work, sort of.
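The naive majority-rule answer above is easy to sketch (again, purely illustrative, with made-up goal names; nothing in the CEV document commits to this): a goal's weight is just the number of individual CEVs that include it.

```python
from collections import Counter

def goal_weights(individual_cevs):
    """Count how many individual CEVs include each goal.

    individual_cevs: iterable of sets of goals, one set per person.
    """
    return Counter(goal for cev in individual_cevs for goal in cev)

cevs = [{"peace", "paperclips"}, {"peace"}, {"peace", "art"}]
print(goal_weights(cevs))  # "peace" ends up outweighing the others
```

This is exactly the majority-rule failure mode in question: weight tracks popularity, not rightness, which is what makes the G1 worry below bite.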
All of which is cool, and thank you, but it leaves me with the same question, relative to Eliezer’s post, that I had in the first place. Restated: if a goal G1 is right (1) but is incompatible with a higher-weighted goal that isn’t right, do we want to eliminate G1? Or does the weighting algorithm somehow prevent this?
==
(1) I’m using “right” here the same way Eliezer does, even though I think it’s a problematic usage, because the concept seems really important to this sequence… it comes up again and again. My own inclination is to throw the term away, personally.
Maybe CEV is intended to get some Right stuff done.
It would be kind of impossible, given that we are right due to a gift of nature rather than a tendency of nature, to design an algorithm that would actually be able to sort all goals into the Right, Not right, and Borderline categories.
I suppose Eliezer is assuming that the moral gift we have will be a bigger part of CEV than it would be of some other division of current moralities.
Thus rendering CEV a local optimum within a given set of gifted minds.
Yup—but archive.org still has it.