I agree there’s a long and storied history behind the evolution of moral psychology, and I do think moral instinct evolved as an iterated game — even consciousness may have resulted from language implying a shared normative justification for co-operative action between agents. If two agents have shared ends they respect as self-similar, they can start to co-operate on the means.
Where I may disagree is with the implied framing that the existing tools of evolutionary moral philosophy are sufficient. I’d argue that the existence of the alignment problem (and of the problem of whether moral internalism can be rescued) shows that the last half-century of descriptive moral philosophy has not provided the requisite tools to deal with the current circumstance. Eliezer explicitly calls out moral internalism as one of the gaps that prevents CEV from being a complete normative theory, or one that could be broadly adopted.
The iterated-game framing also breaks down precisely in the circumstances alignment is worried about: an agent with a decisive strategic advantage genuinely escapes iteration. The “no social payoff, might as well be on the other side of the planet” condition is an attempt to draw an analogy to the circumstance a superintelligence, or an AI with a decisive strategic advantage, would actually inhabit; it is not a rhetorical contrivance.
I don’t read you as claiming the descriptive story is itself the justification — you’re offering a richer model of the payoff structure, which is fair. But I want to flag why I bracketed it: though the iterated-game framing is descriptively true of human psychology generally (except maybe in fringe cases like psychopaths), I don’t think the descriptive principles of moral development can serve as justification for the continued development of moral philosophy — because they themselves lack the kind of ongoing justification that all moral claims ultimately require.
If the goal is meeting the standard that rescuing moral internalism entails, the binding has to be intrinsic, not extrinsically contingent. I take this to mean acting on ethical considerations because you, on some level, regard another’s moral patienthood as at least plausibly your own, in a way that cannot be coherently falsified. Treating other moral patients as the subject of utility-function considerations by virtue of uncertainty is in a different class from treating them as instrumental objects to be spared only to avoid punishment in certain competitive dynamics.
We’re trying to create an intelligent being that acts morally when it doesn’t have to. The Orthogonality Thesis says that’s possible, and intuition says it’s hard. For a selfish being (which is what evolution produces), morality can’t rest simply on game theory, because here the game theory says “you can do what you want and there are no consequences”. Game theory looks like the second column in your table. What we want is a being that isn’t selfish at all, but “otherish”: one whose utility function is aligned to our utility function. That’s even better than your Omega proposal: it’s a being whose payoff is −10m for pressing the button and 0m for not pressing it. It ignores the cash entirely (or rather, would donate it to charity, in which case its payoff for pressing the button is −9m, matching your Omega). That’s a being whose utility function is that of an intelligent piece of humanity’s extended phenotype: something that could not evolve, but is the correct thing for us to build.
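The payoff comparison above can be written out as a pair of utility functions. This is a minimal sketch, not a formal model: the 1m cash figure for pressing the button and the 10m harm to others are assumed illustrative units drawn from the discussion, and the function names are mine.

```python
# Hypothetical payoffs from the button thought experiment (units: millions).
# Assumed: pressing yields the agent 1m in cash but imposes a 10m harm on
# others; not pressing yields nothing either way.
CASH = 1
HARM_TO_OTHERS = 10

def selfish_utility(press: bool) -> float:
    """A purely selfish agent counts only its own cash; with no
    iteration and no punishment, pressing strictly dominates."""
    return CASH if press else 0

def otherish_utility(press: bool, donates_cash: bool = False) -> float:
    """An agent whose utility function is aligned to ours counts the
    harm to others as its own loss; the cash only registers if it is
    donated back (giving -9m for pressing, matching the Omega payoff)."""
    if not press:
        return 0
    return -HARM_TO_OTHERS + (CASH if donates_cash else 0)

# The selfish agent presses; the otherish agent refrains,
# even with no one watching and no iteration to come.
best_selfish = max([True, False], key=selfish_utility)    # presses
best_otherish = max([True, False], key=otherish_utility)  # refrains
```

The point of the sketch is that the otherish agent’s choice is driven entirely by terms inside its own utility function, not by any external enforcement — which is exactly the intrinsic binding discussed above.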
How does Evolutionary Moral Psychology help here? Well, I have a post about that…
Thanks for the comment!