I don’t follow the reasoning. How do you get from “most people’s moral behaviour is explainable in terms of them ‘playing’ a status game” to “solving (some versions of) the alignment problem probably won’t be enough to ensure a future that’s free from astronomical waste or astronomical suffering”?
More details:
Regarding the quote from The Status Game: I have not read the book, so I'm not sure what the intended message is, but this sounds like unwarranted pessimism about people's moral standing (something like the claim that "the vast majority of people are morally ugly in this way"). There is a big difference between being able to explain most people's behaviour in terms of "playing" a status game (in some sense of the word "playing"), and claiming that most people's conscious motivation for acting morally is to win at the status game. The latter claim could plausibly warrant the pessimism; the former could not, and I don't see an argument for the latter. Why is the former claim not evidence of ugliness? For the same reason that the claim "a mother's love for her child is genetically 'hard-wired'" is not evidence that a mother's love for her child is ugly (or fake, not genuine, etc.). Explaining the underlying causal mechanisms of a given moral behaviour is not, in general, enough to warrant a given moral judgment. (If instead the book is arguing for some sort of meta-ethical anti-realism, well, then the discussion needs to be much longer...)
Regarding your fear about morality: is the worry that if we simply aggregated everyone's values, we would get locked into some sort of "ugly" status game? Again, we need more details on the proposed implementation before we can judge whether it's ugly (i.e., something to be scared of).
But also, why are we assuming some sort of aggregation of first-order human value preferences (whatever the method of aggregation)? Assuming we're talking about AGI (and not CAIS), I have always thought it strange to suppose we need to make sure it shares our own idealized preferences, as opposed to merely the preferences we would hope for in, say, a benevolent god. I see no a priori reason to believe that the preferences/goals of a benevolent shepherd are likely to be shared with, or strongly aligned with, those of the shepherd's flock (no matter how you aggregate the preferences/goals of the flock). (I suppose it depends on the nature of the species in question, but whether it's sheep or humans, I maintain my skepticism.) In any case, I agree with you that a lot more meta-ethics needs to be discussed in the alignment research community.