Morality as Parfitian-filtered Decision Theory?

Non-political follow-up to: Ungrateful Hitchhikers (offsite)

Related to: Prices or Bindings?, The True Prisoner’s Dilemma

Summary: Situations like the Parfit’s Hitchhiker problem select for a certain kind of mind: specifically, one that recognizes that an action can be optimal, in a self-interested sense, even if it can no longer cause any future benefit. A mind that can identify such actions might put them in a different category which enables it to perform them, in defiance of the (futureward) consequentialist concerns that normally need to motivate it. Our evolutionary history has put us through such “Parfitian filters”, and the corresponding actions, viewed from the inside, feel like “something we should do”, even if we don’t do it, and even if we recognize the lack of a future benefit. Therein lies the origin of our moral intuitions, as well as the basis for creating the category “morality” in the first place.

Introduction: What kind of mind survives Parfit’s Dilemma?

Parfit’s Dilemma – my version – goes like this: You are lost in the desert and near death. A superbeing known as Omega finds you and considers whether to take you back to civilization and stabilize you. It is a perfect predictor of what you will do, and only plans to rescue you if it predicts that you will, upon recovering, give it $0.01 from your bank account. If it doesn’t predict you’ll pay, you’re left in the desert to die. [1]

So what kind of mind wakes up from this? One that would give Omega the money. Most importantly, the mind is not convinced to withhold payment on the basis that the benefit was received only in the past. Even if it recognizes that no future benefit will result from this decision—and only future costs will result—it decides to make the payment anyway.

If a mind is likely to encounter such dilemmas, it would be an advantage to have a decision theory capable of making this kind of “un-consequentialist” decision. And if a decision theory passes through time by being lossily stored by a self-replicating gene (and some decompressing apparatus), then only those that shift to encoding this kind of mentality will be capable of propagating themselves through Parfit’s Hitchhiker-like scenarios (call these scenarios “Parfitian filters”).
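As a minimal sketch of what "filter" means here, the toy model below treats each mind as nothing more than a disposition that Omega can read. The names `Mind`, `omega_rescues`, and `parfitian_filter` are hypothetical, introduced only for illustration; the one substantive assumption, taken from the dilemma as stated, is that Omega's prediction always matches the mind's actual disposition.

```python
from dataclasses import dataclass

PRICE = 0.01  # Omega's asking price, from the dilemma as stated


@dataclass(frozen=True)
class Mind:
    name: str
    pays_after_rescue: bool  # the disposition Omega reads


def omega_rescues(mind: Mind) -> bool:
    """Omega is assumed to be a perfect predictor, so its prediction
    is simply the mind's actual disposition."""
    return mind.pays_after_rescue


def parfitian_filter(population):
    """Return the minds that leave the desert alive."""
    return [m for m in population if omega_rescues(m)]


population = [
    Mind("pays despite no future benefit", pays_after_rescue=True),
    Mind("refuses because the benefit is in the past", pays_after_rescue=False),
]
survivors = parfitian_filter(population)
for m in survivors:
    print(f"{m.name}: survives, then pays ${PRICE:.2f}")
```

Only the first mind is left to face the payment decision at all, which is the sense in which the scenario selects for a kind of mind rather than rewarding a particular act.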

Sustainable self-replication as a Parfitian filter

Though evolutionary psychology has its share of pitfalls, one question should have an uncontroversial solution: “Why do parents care for their children, usually at great cost to themselves?” The answer is that their desires are largely set by evolutionary processes, in which a “blueprint” is slightly modified over time, and the more effective self-replicating blueprint-pieces dominate the construction of living things. Parents that did not have sufficient “built-in desire” to care for their children would be weeded out; what’s left is (genes that construct) minds that do have such a desire.

This process can be viewed as a Parfitian filter: regardless of how much parents might favor their own survival and satisfaction, they could not get to that point unless they were “attached” to a decision theory that outputs actions sufficiently more favorable toward one’s children than one’s self. Addendum (per pjeby’s comment): The parallel to Parfit’s Hitchhiker is this: Natural selection is the Omega, and the mind propagated through generations by natural selection is the hitchhiker. The mind only gets to the “decide to pay”/“decide to care for children” stage if it had the right decision theory before the “rescue”/“copy to next generation”.

Explanatory value of utility functions

Let us turn back to Parfit’s Dilemma, an idealized example of a Parfitian filter, and consider the task of explaining why someone decided to pay Omega. For simplicity, we’ll limit ourselves to two theories:

Theory 1a: The survivor’s utility function places positive weight on benefits both to the survivor and to Omega; in this case, the utility of “Omega receiving the $0.01” (as viewed by the survivor’s function) exceeds the utility of keeping it.

Theory 1b: The survivor’s utility function only places weight on benefits to him/herself; however, the survivor is limited to using decision theories capable of surviving this Parfitian filter.

The theories are observationally equivalent, but 1a is worse because it makes strictly more assumptions: in particular, the questionable one that the survivor somehow values Omega in some terminal, rather than instrumental, sense. [2] The same analysis can be carried over to the earlier question about natural selection, albeit disturbingly. Consider these two analogous theories attempting to explain the behavior of parents:

Theory 2a: Parents have a utility function that places positive weight on both themselves and their children.

Theory 2b: Parents have a utility function that places positive weight on only themselves (!!!); however, they are limited to implementing decision theories capable of surviving natural selection.

The point here is not to promote some cynical, insulting view of parents; rather, I will show how this “acausal self-interest” so closely aligns with the behavior we laud as moral.
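To make the observational equivalence of Theories 1a and 1b concrete, here is a minimal sketch. The numbers `LIFE_VALUE` and `OMEGA_WEIGHT` are hypothetical, chosen only so that each theory's inequality holds; nothing in the argument depends on their exact values. Under 1a the agent compares the causal results of each action after the rescue; under 1b the agent is purely selfish but evaluates dispositions that Omega can predict.

```python
PRICE = 0.01
LIFE_VALUE = 1_000_000.0  # hypothetical value the survivor places on staying alive
OMEGA_WEIGHT = 2.0        # hypothetical: Theory 1a's weight on Omega's gain (> 1)


def choice_under_1a() -> str:
    """Theory 1a: utility counts Omega's gain; compare actions after the rescue."""
    utility = {
        "pay": -PRICE + OMEGA_WEIGHT * PRICE,  # lose the cent, value Omega's gain
        "refuse": 0.0,
    }
    return max(utility, key=utility.get)


def choice_under_1b() -> str:
    """Theory 1b: selfish utility, evaluated over dispositions Omega can predict."""
    utility = {
        "pay": LIFE_VALUE - PRICE,  # predicted to pay, so rescued
        "refuse": 0.0,              # predicted to refuse, so left in the desert
    }
    return max(utility, key=utility.get)


print(choice_under_1a(), choice_under_1b())
```

Both functions return the same action, so watching the survivor pay cannot distinguish the theories; they differ only in what each has to assume about the utility function.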

SAMELs vs. CaMELs, Morality vs. Selfishness

So what makes an issue belong in the “morality” category in the first place? For example, the decision of which ice cream flavor to choose is not regarded as a moral dilemma. (Call this Dilemma A.) How do you turn it into a moral dilemma? One way is to make the decision have implications for the well-being of others: “Should you eat your favorite ice cream flavor, instead of your next-favorite, if doing so shortens the life of another person?” (Call this Dilemma B.)

Decision-theoretically, what is the difference between A and B? Following Gary Drescher’s treatment in Chapter 7 of Good and Real, I see another salient difference: You can reach the optimal decision in A by looking only at causal means-end links (CaMELs), while Dilemma B requires that you consider the subjunctive acausal means-end links (SAMELs). Less jargonishly, in Dilemma B, an ideal agent will recognize that their decision to pick their favorite ice cream at the expense of another person suggests that others in the same position will do (and have done) likewise, for the same reason. In contrast, an agent in Dilemma A (as stated) will do no worse as a result of ignoring all such entailments.

More formally, a SAMEL is a relationship between your choice and the satisfaction of a goal, in which your choice does not (futurewardly) cause the goal’s achievement or failure, while in a CaMEL, it does. Drescher argues that actions that implicitly recognize SAMELs tend to be called “ethical”, while those that only recognize CaMELs tend to be called “selfish”. I will show how these distinctions (between causal and acausal, ethical and unethical) shed light on moral dilemmas, and on how we respond to them, by looking at some familiar arguments.

Joshua Greene, Revisited: When rationalizing wins

A while back, LW readers discussed Greene’s dissertation on morality. In it, he reviews experiments in which people are given moral dilemmas and asked to justify their position. The twist: normally people justify their position by reference to some consequence, but that consequence is carefully removed from being a possibility in the dilemma’s set-up. The result? The subjects continued to argue for their position, invoking such stopsigns as, “I don’t know, I can’t explain it, [sic] I just know it’s wrong” (p. 151, citing Haidt).

Greene regards this as misguided reasoning, and interprets it to mean that people are irrationally making choices, excessively relying on poor intuitions. He infers that we need to fundamentally change how we think and talk about moral issues so as to eliminate these questionable barriers in our reasoning.

In light of Parfitian filters and SAMELs, I think a different inference is available to us. First, recall that there are cases where the best choices don’t cause a future benefit. In those cases, an agent will not be able to logically point to such a benefit as justification, even despite the choice’s optimality. Furthermore, if an agent’s decision theory was formed through evolution, their propensity to act on SAMELs (selected for due to its optimality) arose long before they were capable of careful self-reflective analysis of their choices. This, too, can account for why most people a) opt for something that doesn’t cause a future benefit, b) stick to that choice with or without such a benefit, and c) place it in a special category (“morality”) when justifying their action.

This does not mean we should give up on rationally grounding our decision theory, “because rationalizers win too!” Nor does it mean that everyone who retreats to a “moral principles” defense is really acting optimally. Rather, it means it is far too strict to require that our decisions all cause a future benefit; we need to count acausal “consequences” (SAMELs) on par with causal ones (CaMELs) – and moral intuitions are a mechanism that can make us do this.

As Drescher notes, the optimality of such acausal benefits can be felt, intuitively, when making a decision, even if they are insufficient to override other desires, and even if we don’t recognize them in those exact terms (pp. 318-9):

Both the one-box intuition in Newcomb’s Problem (an intuition you can feel … even if you ultimately decide to take both boxes), and inclinations toward altruistic … behavior (inclinations you likewise can feel even if you end up behaving otherwise), involve what I have argued are acausal means-end relations. Although we do not … explicitly regard the links as means-end relations, as a practical matter we do tend to treat them exactly as only means-end relations should be treated: our recognition of the relation between the action and the goal influences us to take the action (even if contrary influences sometimes prevail).

I speculate that it is not coincidental that in practice, we treat these means-end relations as what they really are. Rather, I suspect that the practical recognition of means-end relations is fundamental to our cognitive machinery: it treats means-end relations (causal and acausal) as such because doing so is correct – that is, because natural selection favored machinery that correctly recognizes and acts on means-end relations without insisting that they be causal….

If we do not explicitly construe those moral intuitions as recognitions of subjunctive means-end links, we tend instead to perceive the intuitions as recognitions of some otherwise-ungrounded inherent deservedness by others of being treated well (or, in the case of retribution, of being treated badly).

To this we can add the Parfit’s Hitchhiker problem: how do you feel, internally, about not paying Omega? One could just as easily criticize your desire to pay Omega as “rationalization”, as you cannot identify a future benefit caused by your action. But the problem, if any, lies in failing to recognize acausal benefits, not in your desire to pay.

The Prisoner’s Dilemma, Revisited: Self-sacrificial caring is (sometimes) self-optimizing

In this light, consider the Prisoner’s Dilemma. Basically, you and your partner-in-crime are deciding whether to rat each other out; the sum of the benefit to you both is highest if you both stay silent, but each of you can do better at the other’s expense by confessing. (Call this scenario, the one normally used to teach the dilemma, the “Literal Prisoner’s Dilemma Situation”, or LPDS.)

Eliezer Yudkowsky previously claimed in The True Prisoner’s Dilemma that mentioning the LPDS introduces a major confusion (and I agreed): real people in that situation do not, intuitively, see the payoff matrix as it’s presented. To most of us, our satisfaction with the outcome is not solely a function of how much jail time we avoid: we also care about the other person, and don’t want to be a backstabber. So, the argument goes, we need a really contrived situation to get a payoff matrix like that.

I suggest an alternate interpretation of this disconnect: the payoff matrix is correct, but the humans facing the dilemma have been Parfitian-filtered to the point where their decision theory contains dispositions that assist them in winning on these problems, even given that payoff matrix. To see why, consider another set of theories to choose from, like the two above:

Theory 3a: Humans in a literal Prisoner’s Dilemma (LPDS) have a positive weight in their utility function both for themselves, and their accomplices, and so would be hurt to see the other one suffer jail time.

Theory 3b: Humans in a literal Prisoner’s Dilemma (LPDS) have a positive weight in their utility function only for themselves, but are limited to using a decision theory that survived past social/biological Parfitian filters.

As with the point about parents, the lesson is not that you don’t care about your friends; rather, it’s that your actions based on caring are the same as those of a self-interested being with a good decision theory. What you recognize as “just wrong” could be the feeling of a different “reasoning module” acting.
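The claim that the same selfish payoff matrix can yield either choice, depending on which means-end links the agent counts, can be sketched as follows. The jail terms in `JAIL_YEARS` are illustrative values, not taken from any source, and `subjunctive_choice` rests on a strong, loudly stated assumption: that your partner runs the same decision procedure you do, so the two moves match. It is a toy reading of a SAMEL, not Drescher's own formalism.

```python
# Years in jail for (my_move, partner_move), listed as (mine, partner's).
# Illustrative values only.
JAIL_YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "confess"): (10, 0),
    ("confess", "silent"): (0, 10),
    ("confess", "confess"): (5, 5),
}
MOVES = ("silent", "confess")


def causal_choice(partner_move: str) -> str:
    """Hold the partner's move fixed and minimize my own jail time (CaMELs only)."""
    return min(MOVES, key=lambda me: JAIL_YEARS[(me, partner_move)][0])


def subjunctive_choice() -> str:
    """Assume the partner's move mirrors mine, then minimize my own jail time."""
    return min(MOVES, key=lambda me: JAIL_YEARS[(me, me)][0])


for partner in MOVES:
    print(f"partner {partner}: causal reasoner picks {causal_choice(partner)}")
print(f"subjunctive reasoner picks {subjunctive_choice()}")
```

Both reasoners care only about their own jail time. The causal one confesses whatever the partner does, while the one that counts the subjunctive link stays silent, which is the behavior Theory 3b attributes to a self-interested agent with a filtered decision theory.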


By viewing moral intuitions as a mechanism that allows propagation through Parfitian filters, we can better understand:

1) what moral intuitions are (the set of intuitions that were selected for because they saw optimality in the absence of a causal link);

2) why they arose (because agents with them pass through the Parfitian filters that weed out others, evolution being one of them); and

3) why we view this as a relevant category boundary in the first place (because they are all similar in that they elevate the perceived benefit of an action that lacks a self-serving, causal benefit).


[1] My variant differs in that there is no communication between you and Omega other than knowledge of your conditional behaviors, and the price is absurdly low to make sure the relevant intuitions in your mind are firing.

[2] Note that 1b’s assumption of constraints on the agent’s decision theory does not penalize it, as this must be assumed in both cases, and additional implications of existing assumptions do not count as additional assumptions for purposes of gauging probabilities.