“So that we can regard our present values, as an approximation to the ideal morality that we would have if we heard all the arguments, to whatever extent such an extrapolation is coherent.”
This seems to be in the right ballpark, but the answer is dissatisfying because I am by no means persuaded that the extrapolation would be coherent at all (even if you consider only one person). Why would it be? It’s god-shatter, not Peano Arithmetic.
There could be nasty butterfly effects: the order in which you were exposed to the arguments, the mood you were in upon hearing them, and so forth could influence which of the arguments you came to trust.
On the other hand, viewing our values as an approximation to the ideal morality that we would have if we heard all the arguments isn’t looking good either: correctly predicting a Bayesian port of a massive network of sentient god-shatter looks to me like it would require a ton of moral judgments to do at all. The subsystems in our brains sometimes resolve things by fighting (i.e., the feeling of being in a moral dilemma). Looking at the result of the fight in your real physical brain isn’t helpful for making that judgment if the outcome would have depended on whether or not you had just had a cup of coffee.
So, what do we do if there is more than one basin of attraction that a moral reasoner considering all the arguments can land in? What if there are no basins?
I share Marcello’s concerns. Eliezer, have you thought about what to do if the above turns out to be the case?
Also, this post isn’t tagged with “metaethics” for some reason. I finally found it with Matt Simpson’s help.
It seems to me that if you build a Friendly AI, you ought to build it to act where coherence exists and not act where it doesn’t.
What makes you think that any coherence exists in the first place? Marcello’s argument seems convincing to me. In the space of possible computations, what fraction gives the same final answer regardless of the order of inputs presented? Why do you think that the “huge blob of computation” that is your morality falls into this small category? There seems to be plenty of empirical evidence that human morality is in fact sensitive to the order in which moral arguments are presented.
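As a toy illustration of this order-sensitivity (my own construction, not anything from the thread): a reasoner that filters new arguments through what it has already accepted can land in different places depending on presentation order, whereas a commutative, associative aggregator cannot.

```python
# Toy sketch (illustrative assumptions only): compare a path-dependent
# reasoner with an order-invariant one on the same set of "arguments".

def path_dependent_reasoner(arguments):
    """Accept an argument only if it coheres with what's already accepted.
    'Coherence' here is a crude stand-in: sign agreement with the running
    total of accepted arguments."""
    accepted_sum = 0.0
    for a in arguments:
        # Trust an argument if it points the same way as current beliefs,
        # or if no beliefs have formed yet.
        if accepted_sum == 0.0 or (a > 0) == (accepted_sum > 0):
            accepted_sum += a
    return accepted_sum

def order_invariant_reasoner(arguments):
    """A commutative, associative aggregation: order cannot matter."""
    return sum(arguments)

args = [2.0, -3.0, 1.5, -0.5]
print(path_dependent_reasoner(args))        # 3.5  (one basin)
print(path_dependent_reasoner(args[::-1]))  # -3.5 (a different basin)
print(order_invariant_reasoner(args) == order_invariant_reasoner(args[::-1]))  # True
```

The path-dependent reasoner ends up with opposite conclusions from the same arguments heard in reverse order; only the special, commutative aggregator is immune, which is the sense in which order-independence looks like a small corner of the space of computations.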
Or think about it this way. Suppose an (unFriendly) SI wants to craft an argument that would convince you to adopt a certain morality and then stop paying attention to any conflicting moral arguments. Could it do so? Could it do so again with a different object-level morality on someone else? (This assumes there’s an advantage to being first, as far as giving moral arguments to humans is concerned. Adjust the scenario accordingly if there’s an advantage in being last instead.)
You say the FAI won’t act where coherence doesn’t exist, but if you don’t expect coherence now, shouldn’t you be doing something other than building such an FAI, or at least have a contingency plan for when it halts without giving any output?
Most people wouldn’t want to be turned into paperclips?
Of course not, since they haven’t yet heard the argument that would make them want to. All the moral arguments we’ve heard so far have been invented by humans, and we just aren’t that inventive. Even so, we have the Voluntary Human Extinction Movement.
Wei, suppose I want to help someone. How ought I to do so?
Is the idea here that humans end up anywhere depending on what arguments they hear in what order, without the overall map of all possible argument orders displaying any sort of concentration in one or more clusters where lots of endpoints would light up, or any sort of coherency that could be extracted out of it?
I don’t know. (I mean I don’t know how to do it in general. There are some specific situations where I do know how to help, but lots more where I don’t.)
Yes. Or another possibility is that the overall map of all possible argument orders does display some sort of concentration, but that concentration is morally irrelevant. Human minds were never “designed” to hear all possible moral arguments, so where the concentration occurs is accidental, and perhaps horrifying from our current perspective. (Suppose the concentration turns out to be voluntary extinction or something worse; would you bite the bullet and let the FAI run with it?)
A variety of people profess to consider this desirable if it leads to powerful intelligent life filling the universe with higher probability or greater speed. I would bet that there are stable equilibria that can be reached with arguments.
Carl says that a variety of people profess to consider it desirable that present-day humans get disassembled “if it leads to powerful intelligent life filling the universe with higher probability or greater speed.”
Well, yeah, I’m not surprised. Any system of valuing things in which every life, present and future, has the same utility as every other life will lead to that conclusion, because turning the existing living beings and their habitat into computronium, von Neumann probes, etc., to hasten the start of the colonization of the light cone by a few seconds will have positive expected marginal utility according to that system of valuing things.
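The implicit arithmetic here can be made explicit with a back-of-the-envelope sketch; every number below is my own illustrative assumption, not a figure from the thread.

```python
# Back-of-the-envelope only: all quantities are assumed for illustration.
# Under a total view where each present or future life counts equally,
# compare the value of everyone currently alive against the value of
# starting light-cone colonization a few seconds sooner.

current_lives = 8e9               # assumed: people alive today
future_lives_per_second = 1e30    # assumed: lives the colonized light cone
                                  # could support per second of its runtime
seconds_gained = 3.0              # assumed: head start from disassembling
                                  # Earth now rather than preserving it

value_of_present_people = current_lives
value_of_head_start = future_lives_per_second * seconds_gained

# With these made-up numbers the head start dominates by ~20 orders of
# magnitude, which is the whole force of the argument above.
print(value_of_head_start > value_of_present_people)  # True
```

The conclusion is insensitive to the exact assumptions: as long as the light cone’s carrying capacity per second is astronomically larger than the present population, a strictly impartial total view ranks the few seconds above everyone alive.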
That could still be a great thing for us provided that current human minds were uploaded into the resulting computronium explosion.
...which won’t happen if the computronium is the most important thing and uploading existing minds would slow it down. The AI might upload some humans to get their cooperation during the early stages of takeoff, but it wouldn’t necessarily keep those uploads running once it no longer depended on humans, if the same resources could be used more efficiently for itself.
To get my cooperation, at least, it would have to credibly precommit that it wouldn’t just turn my simulation off after it no longer needs me. (Of course, the meaning of the word “credibly” shifts somewhat when we’re talking about a superintelligence trying to “prove” something to a human.)
Is “not act” a meaningful option for a Singleton?
This comment got linked a decade later, so I thought it worth stating my own thoughts on the question:
We can consider a reference class of CEV-seeking procedures; one (massively underspecified, but that’s not the point) example is “emulate 1000 copies of Paul Christiano living together comfortably and immortally and discussing what the AI should do with the physical universe; once there’s a large supermajority in favor of an enactable plan (which can include further such delegated decisions), the AI does that”.
I agree that this is going to be chaotic, in the sense that even slightly different elements of this reference class might end up steering the AI to different basins of attraction.
I assert, however, that I’d consider it a pretty good outcome overall if the future of the world were determined by a genuinely random draw from this reference class, honestly instantiated. (Again with the massive underspecification, I know.)
CEV may be underdetermined and many-valued, but that doesn’t mean paperclipping is as good an answer as any.
Re: no basins, it would be a bad situation indeed if the vast majority of the reference class never ended up outputting an action plan, instead deferring and delegating forever. I don’t have cached thoughts about that.
This is a really insightful question, and it hasn’t been answered convincingly in this thread. Does anybody know if it has been discussed more completely elsewhere?
One option would be to say that the FAI only acts where there is coherence. Another would be to specify a procedure for acting when there are multiple basins of attraction (perhaps by weighting the basins according to the proportion of starting points and orderings of arguments that lead to each basin, when that’s possible, or some other ‘impartial’ procedure).
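That “weight basins by the proportion of orderings that lead to them” idea can be sketched concretely. This is a hypothetical toy model of my own; the path-dependent “extrapolation” below is a crude stand-in, not a claim about real moral reasoning.

```python
import random
from collections import Counter

def extrapolate(arguments):
    """Toy path-dependent extrapolation: accept an argument only if it
    points the same way as what has already been accepted."""
    accepted = 0.0
    for a in arguments:
        if accepted == 0.0 or (a > 0) == (accepted > 0):
            accepted += a
    return "pro" if accepted > 0 else "con"  # the two basins in this toy

def basin_weights(arguments, n_samples=10_000, seed=0):
    """Estimate each basin's weight as the fraction of uniformly random
    argument orderings whose extrapolation lands in it."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        order = list(arguments)
        rng.shuffle(order)
        counts[extrapolate(order)] += 1
    return {basin: n / n_samples for basin, n in counts.items()}

print(basin_weights([2.0, -3.0, 1.5, -0.5]))
# e.g. roughly equal weight on "pro" and "con": two basins, neither privileged
```

Even in this trivial model the procedure’s output is a distribution over basins rather than a single answer, which is exactly the situation where one must decide whether a mixture, a weighted compromise, or refusal to act is the right response.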
But still, what if it turns out that most of the difficult extrapolations that we would really care about bounce around without ever settling down, or otherwise behave undesirably? No human being has ever done anything like the sorts of calculations that would be involved in a deep extrapolation, so our intuitions, based on the extrapolations that we have imagined and that seem to cohere (which all have paths shorter than [e.g.] 1000), might be unrepresentative of the sorts of extrapolations that an FAI would actually have to perform.