I agree with everything in this response. In particular, I don’t mean to “deny the ability to do some amount of logical inference on top of your preferences.”
My point is that it doesn’t answer the key metaethical question of why you ought to act according to any of those ideas.
I mean, because you are applying logical inferences on top of your existing oughts?
As long as you grant that you ought to care about some things, and that you ought to care about things in any kind of coherent way, then you ought to care about the different things that are implied by the things you already ought to care about.
But I feel like I am restating things here, so I might have misunderstood you.
If you ask lots of people whether their moral preferences ought to be self-consistent, they’ll mostly say yes. If you ask lots of people whether their moral preferences are more valid after they think about them longer, after a good night’s sleep, they’ll also mostly say yes.
But also, if you ask lots of people whether it’s moral for their family to be tortured, they’ll mostly say no. And they probably won’t say that no-torture is less important than self-consistency.
Here are three (IMO reasonable) people arguing that moral deliberation / self-consistency does not straightforwardly and universally trump other ways to reach normative conclusions: Scott Alexander:
But I’m not sure I want to play the philosophy game. Maybe MacAskill can come up with some clever proof that the commitments I list above imply I have to have my eyes pecked out by angry seagulls or something. If that’s true, I will just not do that, and switch to some other set of axioms. If I can’t find any system of axioms that doesn’t do something terrible when extended to infinity, I will just refuse to extend things to infinity.
plus Stuart Armstrong here, and Joe Carlsmith discusses this a bunch (kinda arguing both sides) here & here & here.
Anyway, if we’re gonna treat CEV (and related things like Long Reflection) as meta-ethical ground truth (and not just as pragmatic projects to design a widely-acceptable ASI motivation system, per my other comment), then we have to grant moral deliberation and self-consistency a special status, NOT just “well yeah self-consistency is one of the things that people feel is good and right, along with all the other things that people feel are good and right”. And I think Arjun is asking: where would this special status come from?
It’s evidently not grounded in people’s moral intuitions, because people’s moral intuitions in favor of self-consistency are not systematically stronger or different-in-kind from people’s moral intuitions in favor of justice or whatever else. Alternatively, if we want to ground it in, like, “well they’d appreciate the value of self-consistency if they thought about it more”, then that’s circular question-begging, because it’s already granting a special status to deliberation.
I think you are probably misinterpreting me here, though the domain is tricky, so that’s understandable.
I advocate that you only take the steps towards consistency that are endorsed. There are really quite a lot of those! This does not require giving (apparent) logical consistency some kind of supremacy. Indeed, I would strongly argue against the kind of philosophy that MacAskill tends to do, and don’t think it really has much to do with the thing that I expect to happen during CEV.
The way I usually phrase it is that you list all the interventions that you could make to your beliefs and brain, and you start doing the ones that seem the most robust under really any viewpoint (e.g. something like “make sure to get enough sleep”). Then you work your way down the list, very conservatively taking actions or propagating beliefs that seem less reversible or robust.[1]
I think the default outcome of this maximally conservative approach is that you still end up somewhere extremely different from where you started, and it doesn’t really require giving self-consistency some kind of dominating overriding status where someone gives you a clever argument with horrifying conclusions and then you have to accept it. Indeed, not accepting those arguments seems extremely wise to me.
Yes, this does require that my moral beliefs be subject to consistency to some degree, but of course they would have no meaning at all if they were not subject to at least some minimal level of consistency.
A preference needs to be grounded in reality somehow, and the things over which you have preferences need to “be real” in some meaningful sense. And the subject of this conversation is the kind of preference that makes sense for humans to endorse and make plans around. A bundle of local-minimization urges does not write internet comments, or think about what it would like a future AI system to do with it, or care about “metaethics” at all.
[1] This would reasonably also include things like “make a copy of yourself that you give veto power to that you check in with after you’ve gone down a path of self-reflection and self-modification”.
That all sounds fine, if we’re engaged in a pragmatic project for deciding what to do, and want to propose an answer that you and I can get behind, and that lots of people around the world can also get behind.
I think Arjun is (rightly) complaining about something different, namely that Eliezer and you and others frequently slip into treating this answer as being fundamentally privileged / “Right”, as opposed to merely a pragmatic option that you and I and lots of people can get behind.
E.g. here’s Nate referring to “the future’s potential value”, as if there’s a metric for that which is canonical and characteristic of humanity-as-a-whole. I think that’s moral-realist (or “crypto”-moral-realist) thinking, sneaking in.
Hmm, I don’t really get this. Or like, I am about as sympathetic to this argument as someone saying “E.g. here’s Nate referring to ‘the future’ as a thing that exists, as if there is consensus on there being a single reality and arrow of time. I think that’s scientific materialist thinking sneaking in, denying the possibility of solipsism or simulationism”. To which my reaction is “yes, metaphysics is actually quite confusing, but come on man, you know what I mean, in as much as words mean anything, this is a fine use of them”.
Similarly here, my reaction is: “Come on man, you know what Nate means. In as much as ‘preferences’ mean anything, there is an up-direction for humanity as a whole, and a down-direction for humanity as a whole, even without any kind of substantial convergence, given how far away we are from any Pareto frontier”.