By “metaethics,” do you mean something like “a theory of how humans should think about their values”?
I feel like I’ve seen that kind of usage on LW a bunch, but it’s atypical. In philosophy, “metaethics” has a thinner, less ambitious interpretation of answering something like, “What even are values, are they stance-independent, yes/no?”
And yeah, there is often a bit more nuance than that as you dive deeper into what philosophers in the various camps are exactly saying, but my point is that it’s not that common, and certainly not necessary, that “having confident metaethical views,” on the academic philosophy reading of “metaethics,” means something like “having strong and detailed opinions on how AI should go about figuring out human values.”
(And maybe you’d count this against academia, which would be somewhat fair, to be honest, because parts of “metaethics” in philosophy are even further removed from practicality: they concern the analysis of the language behind moral claims. If we compare this to claims about the Biblical God and miracles, it would be like focusing way too much on whether the people who wrote the Bible thought they were describing real things or just metaphors, without directly trying to answer burning questions like “Does God exist?” or “Did Jesus live and perform miracles?”)
Anyway, I’m asking about this because I found the following paragraph hard to understand:
Behind a veil of ignorance, wouldn’t you want everyone to be less confident in their own ideas? Or think “This isn’t likely to be a subjective question like morality/values might be, and what are the chances that I’m right and they’re all wrong? If I’m truly right why can’t I convince most others of this? Is there a reason or evidence that I’m much more rational or philosophically competent than they are?”
My best guess of what you might mean (low confidence) is the following:
You’re conceding that morality/values might be (to some degree) subjective, but you’re cautioning people against having strong views about “metaethics,” which you take to be the question of not just what morality/values even are, but also, a bit more ambitiously, how to best reason about them and how to (e.g.) have AI help us think about what we’d want for ourselves and others.
Is that roughly correct?
Because if one goes with the “thin” interpretation of metaethics, then “having one’s own metaethics” could be as simple as believing some flavor of “morality/values are subjective,” and it feels like you, in the part I quoted, don’t sound like you’re too strongly opposed to just that stance in itself, necessarily.
By “metaethics” I mean “the nature of values/morality”, which I think is how it’s used in academic philosophy. Of course the nature of values/morality has a strong influence on “how humans should think about their values” so these are pretty closely connected, but definitionally I do try to use it the same way as in philosophy, to minimize confusion. This post can give you a better idea of how I typically use it. (But as you’ll see below, this is actually not crucial for understanding my post.)
So in the paragraph that you quoted (and the rest of the post), I was actually talking about philosophical fields/ideas in general, not just metaethics. While my title has “metaethics” in it, the text of the post talks generically about any “philosophical questions” that are relevant for AI x-safety. If we substitute metaethics (in my or the academic sense) into my post, then you can derive that I mean something like this:
Different metaethics (ideas/theories about the nature of values/morality) have different implications for what AI designs or alignment approaches are safe, and if you design an AI assuming that one metaethical theory is true, it could be disastrous if a different metaethical theory actually turns out to be true.
For example, if moral realism is true, then aligning the AI to human values would be pointless. What you really need to do is design the AI to be able to determine and follow objective moral truths. But this approach would be disastrous if moral realism is actually false. Similarly, if moral noncognitivism is true, that means that humans can’t be wrong about their values, and implies “how humans should think about their values” is of no importance. Designing AI under this assumption would be disastrous if humans can in fact be wrong about their values and really need AIs to help them think about their values and avoid moral errors.
I think in practice a lot of alignment researchers may not even have explicit metaethical theories in mind, but are implicitly making certain metaethical assumptions in their AI design or alignment approach. For example, they may largely ignore the question of how humans should think about their values or how AIs should help humans think about their values, thus essentially baking in an assumption of noncognitivism.
If we substitute “how humans/AIs should reason about values” (which I’m not sure has a name in academic philosophy but I think does fall under metaphilosophy, which covers all philosophical reasoning) into the post, then your conclusion here falls out, so yes, it’s also a valid interpretation of what I’m trying to convey.
I hope that makes everything a bit clearer?
Thanks! That makes sense. I should have said earlier that I already suspected I understood your point, and that you expressed yourself well. It’s just that (1) I’m always hesitant to put words in people’s mouths, so I didn’t want to say I was confident I could paraphrase your position, and (2) whenever you make posts about metaethics, I wonder, “Oh no, does this apply to me? Am I one of the people doing the thing he says one shouldn’t do?” So I was interested in prompting you to be more concrete about how detailed someone’s confident opinion in that area would have to be before you think they reveal themselves as overconfident.
Yeah, your definition (“the nature of values/morality”) makes sense. I think academic use is basically that, with some added baggage that mostly adds confusion. If I were to sum up what I think the use is in academic philosophy, I would say “the nature of values/morality, at a very abstract level and looked at through the lens of analyzing language.” For some reason, academic philosophy is oddly focused on the nature of moral language rather than morality/values directly. (I find it a confusing/unhelpful tradition of, “Language comes first, then comes the territory.”) As a result, classical metaethical positions at best say pretty abstract things about what values are. They might say things like “Values are irreducible (nonnaturalism)” or “Values can be reduced to nonmoral terminology like desires/goals, conscious states, etc. (naturalism),” but without actually telling us the specifics of that connection/reduction. If we were to ask, “Well, how can we know what the right values are?”, most metaethicists would not consider themselves obviously responsible for answering it! Sure, they might have a personal take, but they may write about that take in a way that doesn’t connect it to why they endorse a high-level metaethical theory like nonnaturalist moral realism.
Basically, there are (at least) two ways to do metaethics: metaethics via analysis of moral language, and metaethics via observation of how people do normative ethics in applied contexts like EA/rationality/longtermism. Academic philosophy does the former while LW does the latter. And so, if academic philosophers read a comment like the one Jan Kulveit left here about metaethics, my guess is that they would think he’s confusing metaethics with something else entirely (like maybe, “applied ethics but done in a circumspect way, with awareness of the contested and possibly under-defined nature of what we’re even trying to do”).
I like the details of specific ways people may (implicitly or explicitly) make this mistake regarding meta-ethics in a way that matters.
It almost seems like the post was “Don’t roll your own” and this added “meta-ethics”.
I have also noticed that when you read the word “metaethics” on LessWrong, it can mean anything that is in some way related to morality.
Maybe I should take it upon myself to write a short essay on metaethics, how it differs from normative ethics, and why it may be of importance to AI alignment.