By “metaethics,” do you mean something like “a theory of how humans should think about their values”?
I feel like I’ve seen that kind of usage on LW a bunch, but it’s atypical. In philosophy, “metaethics” has a thinner, less ambitious meaning, answering something like: “What even are values? Are they stance-independent, yes or no?”
By “metaethics” I mean “the nature of values/morality”, which I think is how it’s used in academic philosophy. Of course, the nature of values/morality has a strong influence on “how humans should think about their values”, so the two are closely connected, but definitionally I do try to use the term the same way as in philosophy, to minimize confusion. This post can give you a better idea of how I typically use it. (But as you’ll see below, this is actually not crucial for understanding my post.)
Anyway, I’m asking about this because I found the following paragraph hard to understand:
So in the paragraph that you quoted (and the rest of the post), I was actually talking about philosophical fields/ideas in general, not just metaethics. While my title has “metaethics” in it, the text of the post talks generically about any “philosophical questions” that are relevant for AI x-safety. If we substitute metaethics (in my or the academic sense) into my post, then you can derive that I mean something like this:
Different metaethics (ideas/theories about the nature of values/morality) have different implications for what AI designs or alignment approaches are safe, and if you design an AI assuming that one metaethical theory is true, it could be disastrous if a different metaethical theory actually turns out to be true.
For example, if moral realism is true, then aligning the AI to human values would be pointless; what you really need to do is design the AI to determine and follow objective moral truths. But that approach would be disastrous if moral realism is actually false. Similarly, if moral noncognitivism is true, then humans can’t be wrong about their values, which implies that “how humans should think about their values” is of no importance. Designing an AI under that assumption would be disastrous if humans actually can be wrong about their values and really need AIs to help them think about their values and avoid moral errors.

I think in practice a lot of alignment researchers may not even have explicit metaethical theories in mind, but they are implicitly making certain metaethical assumptions in their AI designs or alignment approaches. For example, they may largely ignore the question of how humans should think about their values, or how AIs should help humans think about their values, thus essentially baking in an assumption of noncognitivism.
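To make the “implicit assumption” point a bit more concrete, here is a purely illustrative Python sketch. The names (`MetaethicalAssumption`, `choose_value_target`) and the branch descriptions are made up for this comment and don’t correspond to any real alignment proposal; the point is only that each branch is safe just in case its metaethical assumption is true, and an approach that never asks the question has still, in effect, picked a branch.

```python
from enum import Enum, auto

class MetaethicalAssumption(Enum):
    """Hypothetical labels for assumptions a designer might (implicitly) make."""
    REALISM = auto()         # there are objective moral truths to be discovered
    NONCOGNITIVISM = auto()  # values are attitudes; humans can't be "wrong" about them
    FALLIBILISM = auto()     # humans can be wrong and may need help reflecting on values

def choose_value_target(assumption: MetaethicalAssumption) -> str:
    """Return a toy description of what the AI gets pointed at under each assumption."""
    if assumption is MetaethicalAssumption.REALISM:
        # Disastrous if realism is false: there may be no moral facts to find.
        return "derive and follow objective moral truths"
    if assumption is MetaethicalAssumption.NONCOGNITIVISM:
        # Disastrous if humans *can* be wrong about their values: errors get locked in.
        return "satisfy whatever values humans currently express"
    # FALLIBILISM: requires further work on what "helping humans reflect well" even means.
    return "help humans deliberate about their values, then defer to the result"

if __name__ == "__main__":
    for assumption in MetaethicalAssumption:
        print(f"{assumption.name}: {choose_value_target(assumption)}")
```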
You’re conceding that morality/values might be (to some degree) subjective, but you’re cautioning people against having strong views about “metaethics,” which you take to cover not just the question of what morality/values even are, but also, a bit more ambitiously, how to best reason about them and how to (e.g.) have AI help us think about what we’d want for ourselves and others.
If we substitute “how humans/AIs should reason about values” (which I’m not sure has a name in academic philosophy but I think does fall under metaphilosophy, which covers all philosophical reasoning) into the post, then your conclusion here falls out, so yes, it’s also a valid interpretation of what I’m trying to convey.

I hope that makes everything a bit clearer?
Thanks! That makes sense, and I should have said earlier that I already suspected I understood your point and that you had expressed yourself well. It’s just that (1) I’m always hesitant to put words in people’s mouths, so I didn’t want to claim I could confidently paraphrase your position, and (2) whenever you make posts about metaethics, I wonder “oh no, does this apply to me, am I one of the people doing the thing he says one shouldn’t do?”, so I was interested in prompting you to be more concrete about how detailed someone’s confident opinion in this area would have to be before you’d consider them overconfident.
By “metaethics” I mean “the nature of values/morality”, which I think is how it’s used in academic philosophy.
Yeah, makes sense. I think the academic use is basically that, with some added baggage that mostly adds confusion. If I were to sum up the use in academic philosophy, I would say “the nature of values/morality, at a very abstract level and looked at through the lens of analyzing language.” For some reason, academic philosophy is oddly focused on the nature of moral language rather than on morality/values directly. (I find it a confusing/unhelpful tradition: “Language comes first, then comes the territory.”)

As a result, classical metaethical positions at best say pretty abstract things about what values are. They might say things like “Values are irreducible (nonnaturalism)” or “Values can be reduced to nonmoral terms like desires/goals, conscious states, etc. (naturalism),” but without actually telling us the specifics of that connection/reduction. If we were to ask, “Well, how can we know what the right values are?”, most metaethicists wouldn’t consider themselves obviously responsible for answering that question! Sure, they might have a personal take, but they may write about it in a way that doesn’t connect their answer to why they endorse a high-level metaethical theory like nonnaturalist moral realism.
Basically, there are (at least) two ways to do metaethics: metaethics via analysis of moral language, and metaethics via observation of how people do normative ethics in applied contexts like EA/rationality/longtermism. Academic philosophy does the former while LW does the latter. So if academic philosophers read a comment like the one Jan Kulveit left here about metaethics, my guess is that they would think he’s mistaking metaethics for something else entirely (maybe “applied ethics, but done in a circumspect way, with awareness of the contested and possibly under-defined nature of what we’re even trying to do”).