# Decision Theory FAQ

Co-authored with crazy88. Please let us know when you find mistakes, and we’ll fix them. Last updated 03-27-2013.

## 1. What is decision theory?

Decision theory, also known as rational choice theory, concerns the study of preferences, uncertainties, and other issues related to making “optimal” or “rational” choices. It has been discussed by economists, psychologists, philosophers, mathematicians, statisticians, and computer scientists.

We can divide decision theory into three parts (Grant & Zandt 2009; Baron 2008). Normative decision theory studies what an ideal agent (a perfectly rational agent, with infinite computing power, etc.) would choose. Descriptive decision theory studies how non-ideal agents (e.g. humans) actually choose. Prescriptive decision theory studies how non-ideal agents can improve their decision-making (relative to the normative model) despite their imperfections.

For example, one’s normative model might be expected utility theory, which says that a rational agent chooses the action with the highest expected utility. Replicated results in psychology describe humans repeatedly failing to maximize expected utility in particular, predictable ways: for example, they make some choices based not on potential future benefits but on irrelevant past efforts (the “sunk cost fallacy”). To help people avoid this error, some theorists prescribe basic training in microeconomics, which has been shown to reduce the likelihood that humans will commit the sunk cost fallacy (Larrick et al. 1990). Thus, by coordinating normative, descriptive, and prescriptive research, we can help agents to succeed in life by acting more in accordance with the normative model than they otherwise would.

This FAQ focuses on normative decision theory. Good sources on descriptive and prescriptive decision theory include Stanovich (2010) and Hastie & Dawes (2009).

Two related fields beyond the scope of this FAQ are game theory and social choice theory. Game theory is the study of conflict and cooperation among multiple decision makers, and is thus sometimes called “interactive decision theory.” Social choice theory is the study of making a collective decision by combining the preferences of multiple decision makers in various ways.

This FAQ draws heavily from two textbooks on decision theory: Resnik (1987) and Peterson (2009). It also draws from more recent results in decision theory, published in journals such as Synthese and Theory and Decision.

## 2. Is the rational decision always the right decision?

No. Peterson (2009, ch. 1) explains:

[In 1700], King Carl of Sweden and his 8,000 troops attacked the Russian army [which] had about ten times as many troops… Most historians agree that the Swedish attack was irrational, since it was almost certain to fail… However, because of an unexpected blizzard that blinded the Russian army, the Swedes won...

Looking back, the Swedes’ decision to attack the Russian army was no doubt right, since the actual outcome turned out to be success. However, since the Swedes had no good reason for expecting that they were going to win, the decision was nevertheless irrational.

More generally speaking, we say that a decision is right if and only if its actual outcome is at least as good as that of every other possible outcome. Furthermore, we say that a decision is rational if and only if the decision maker [aka the “agent”] chooses to do what she has most reason to do at the point in time at which the decision is made.

Unfortunately, we cannot know with certainty what the right decision is. Thus, the best we can do is to try to make “rational” or “optimal” decisions based on our preferences and incomplete information.

## 3. How can I better understand a decision problem?

First, we must formalize a decision problem. It usually helps to visualize the decision problem, too.

In decision theory, decision rules are only defined relative to a formalization of a given decision problem, and a formalization of a decision problem can be visualized in multiple ways. Here is an example from Peterson (2009, ch. 2):

Suppose… that you are thinking about taking out fire insurance on your home. Perhaps it costs \$100 to take out insurance on a house worth \$100,000, and you ask: Is it worth it?

The most common way to formalize a decision problem is to break it into states, acts, and outcomes. When facing a decision problem, the decision maker aims to choose the act that will have the best outcome. But the outcome of each act depends on the state of the world, which is unknown to the decision maker.

In this framework, speaking loosely, a state is a part of the world that is not an act (that can be performed now by the decision maker) or an outcome (the question of what, more precisely, states are is a complex question that is beyond the scope of this document). Luckily, not all states are relevant to a particular decision problem. We only need to take into account states that affect the agent’s preference among acts. A simple formalization of the fire insurance problem might include only two states: the state in which your house doesn’t (later) catch on fire, and the state in which your house does (later) catch on fire.

Presumably, the agent prefers some outcomes to others. Suppose the four conceivable outcomes in the above decision problem are: (1) House and \$0, (2) House and -\$100, (3) No house and \$99,900, and (4) No house and \$0. In this case, the decision maker might prefer outcome 1 over outcome 2, outcome 2 over outcome 3, and outcome 3 over outcome 4. (We’ll discuss measures of value for outcomes in the next section.)

An act is commonly taken to be a function that takes a possible state of the world as input and gives a particular outcome as output. For the above decision problem we could say that if the act “Take out insurance” has the world-state “Fire” as its input, then it will give the outcome “No house and \$99,900” as its output.

Note that decision theory is concerned with particular acts rather than generic acts, e.g. “sailing west in 1492” rather than “sailing.” Moreover, the acts of a decision problem must be alternative acts, so that the decision maker has to choose exactly one act.

Once a decision problem has been formalized, it can then be visualized in any of several ways.

One way to visualize this decision problem is to use a decision matrix:

| | Fire | No fire |
|---|---|---|
| Take out insurance | No house and \$99,900 | House and -\$100 |
| No insurance | No house and \$0 | House and \$0 |

Another way to visualize this problem is to use a decision tree:

The square is a choice node, the circles are chance nodes, and the triangles are terminal nodes. At the choice node, the decision maker chooses which branch of the decision tree to take. At the chance nodes, nature decides which branch to follow. The triangles represent outcomes.

Of course, we could add more branches to each choice node and each chance node. We could also add more choice nodes, in which case we are representing a sequential decision problem. Finally, we could add probabilities to each branch, as long as the probabilities of all the branches extending from each single node sum to 1. And because a decision tree obeys the laws of probability theory, we can calculate the probability of any given node by multiplying the probabilities of all the branches preceding it.
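As a minimal sketch of the two probability rules just mentioned, the snippet below checks that the branches leaving a chance node sum to 1 and multiplies branch probabilities along a path. The function names and the numbers are hypothetical, chosen only for illustration.

```python
import math

def branches_are_valid(branch_probabilities, tol=1e-9):
    """Check that the branches leaving one node have probabilities summing to 1."""
    return abs(sum(branch_probabilities) - 1.0) < tol

def node_probability(path_probabilities):
    """Probability of reaching a node: the product of the branch
    probabilities along the path from the root to that node."""
    return math.prod(path_probabilities)

# Hypothetical tree: a first chance node with branches 0.2 / 0.8, followed
# (on the 0.2 branch) by a second chance node with branches 0.5 / 0.5.
first_node = [0.2, 0.8]
second_node = [0.5, 0.5]

print(branches_are_valid(first_node), branches_are_valid(second_node))
print(node_probability([0.2, 0.5]))  # node reached via the 0.2 branch, then a 0.5 branch
```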

Our decision problem could also be represented as a vector — an ordered list of mathematical objects that is perhaps most suitable for computers:

[
[a1 = take out insurance,
a2 = do not];
[s1 = fire,
s2 = no fire];
[(a1, s1) = No house and \$99,900,
(a1, s2) = House and -\$100,
(a2, s1) = No house and \$0,
(a2, s2) = House and \$0]
]
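For readers who want something they can actually run, here is one possible Python encoding of the same formalization. It is only a sketch: the names `ACTS`, `STATES`, `OUTCOMES`, and `outcome_of` are our own illustrative choices, not anything from Peterson.

```python
# Hypothetical encoding of the fire insurance problem; all names are illustrative.
ACTS = ["take out insurance", "do not"]
STATES = ["fire", "no fire"]

# Each (act, state) pair maps to exactly one outcome.
OUTCOMES = {
    ("take out insurance", "fire"): "No house and $99,900",
    ("take out insurance", "no fire"): "House and -$100",
    ("do not", "fire"): "No house and $0",
    ("do not", "no fire"): "House and $0",
}

def outcome_of(act, state):
    """Treat an act as a function from states to outcomes."""
    return OUTCOMES[(act, state)]

for act in ACTS:
    for state in STATES:
        print(f"{act!r} in state {state!r}: {outcome_of(act, state)}")
```

Storing outcomes in a dictionary keyed by (act, state) mirrors the decision matrix: rows are acts, columns are states, and cells are outcomes.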

For more details on formalizing and visualizing decision problems, see Skinner (1993).

## 4. How can I measure an agent’s preferences?

### 4.1. The concept of utility

It is important not to measure an agent’s preferences in terms of objective value, e.g. monetary value. To see why, consider the absurdities that can result when we try to measure an agent’s preference with money alone.

Suppose you may choose between (A) receiving a million dollars for sure, and (B) a 50% chance of winning either \$3 million or nothing. The expected monetary value (EMV) of your act is computed by multiplying the monetary value of each possible outcome by its probability. So, the EMV of choice A is (1)(\$1 million) = \$1 million. The EMV of choice B is (0.5)(\$3 million) + (0.5)(\$0) = \$1.5 million. Choice B has a higher expected monetary value, and yet many people would prefer the guaranteed million.

Why? For many people, the difference between having \$0 and \$1 million is subjectively much larger than the difference between having \$1 million and \$3 million, even if the latter difference is larger in dollars.

To capture an agent’s subjective preferences, we use the concept of utility. A utility function assigns numbers to outcomes such that outcomes with higher numbers are preferred to outcomes with lower numbers. For example, for a particular decision maker — say, one who has no money — the utility of \$0 might be 0, the utility of \$1 million might be 1000, and the utility of \$3 million might be 1500. Thus, the expected utility (EU) of choice A is, for this decision maker, (1)(1000) = 1000. Meanwhile, the EU of choice B is (0.5)(1500) + (0.5)(0) = 750. In this case, the expected utility of choice A is greater than that of choice B, even though choice B has a greater expected monetary value.
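The arithmetic above can be checked with a short sketch. The monetary amounts and utility numbers are the ones from the text; the helper `expected_value` and the variable names are our own.

```python
def expected_value(lottery, value=lambda x: x):
    """Sum of probability * value(outcome) over (probability, outcome) pairs."""
    return sum(p * value(outcome) for p, outcome in lottery)

# Choices from the text, as lists of (probability, dollars) pairs.
choice_a = [(1.0, 1_000_000)]
choice_b = [(0.5, 3_000_000), (0.5, 0)]

# Utility assignments from the text, for a decision maker who starts with no money.
utility = {0: 0, 1_000_000: 1000, 3_000_000: 1500}

emv_a = expected_value(choice_a)
emv_b = expected_value(choice_b)
eu_a = expected_value(choice_a, utility.get)
eu_b = expected_value(choice_b, utility.get)

print("EMV:", emv_a, emv_b)  # B has the higher expected monetary value
print("EU: ", eu_a, eu_b)    # A has the higher expected utility
```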

Note that those from the field of statistics who work on decision theory tend to talk about a “loss function,” which is simply a utility function with its sign flipped (lower loss is better). For an overview of decision theory from this perspective, see Berger (1985) and Robert (2001). For a critique of some standard results in statistical decision theory, see Jaynes (2003, ch. 13).

### 4.2. Types of utility

An agent’s utility function can’t be directly observed, so it must be constructed — e.g. by asking them which options they prefer for a large set of pairs of alternatives (as on WhoIsHotter.com). The number that corresponds to an outcome’s utility can convey different information depending on the utility scale in use, and the utility scale in use depends on how the utility function is constructed.

Decision theorists distinguish three kinds of utility scales:

1. Ordinal scales (“12 is better than 6”). In an ordinal scale, preferred outcomes are assigned higher numbers, but the numbers don’t tell us anything about the differences or ratios between the utility of different outcomes.

2. Interval scales (“the difference between 12 and 6 equals that between 6 and 0”). An interval scale gives us more information than an ordinal scale. Not only are preferred outcomes assigned higher numbers, but also the numbers accurately reflect the difference between the utility of different outcomes. They do not, however, necessarily reflect the ratios of utility between different outcomes. If outcome A has utility 0, outcome B has utility 6, and outcome C has utility 12 on an interval scale, then we know that the difference in utility between outcomes A and B and between outcomes B and C is the same, but we can’t know whether outcome C is “twice as good” as outcome B.

3. Ratio scales (“12 is exactly twice as valuable as 6”). Numerical utility assignments on a ratio scale give us the most information of all. They accurately reflect preference rankings, differences, and ratios. Thus, we can say that an outcome with utility 12 is exactly twice as valuable to the agent in question as an outcome with utility 6.

Note that neither experienced utility (happiness) nor the notions of “average utility” or “total utility” discussed by utilitarian moral philosophers are the same thing as the decision utility that we are discussing now to describe decision preferences. As the situation merits, we can be even more specific. For example, when discussing the type of decision utility used in an interval scale utility function constructed using Von Neumann & Morgenstern’s axiomatic approach (see section 8), some people use the term VNM-utility.

Now that you know that an agent’s preferences can be represented as a “utility function,” and that assignments of utility to outcomes can mean different things depending on the utility scale of the utility function, we are ready to think more formally about the challenge of making “optimal” or “rational” choices. (We will return to the problem of constructing an agent’s utility function later, in section 8.3.)

## 5. What do decision theorists mean by “risk,” “ignorance,” and “uncertainty”?

Peterson (2009, ch. 1) explains:

In decision theory, everyday terms such as risk, ignorance, and uncertainty are used as technical terms with precise meanings. In decisions under risk the decision maker knows the probability of the possible outcomes, whereas in decisions under ignorance the probabilities are either unknown or non-existent. Uncertainty is either used as a synonym for ignorance, or as a broader term referring to both risk and ignorance.

In this FAQ, a “decision under ignorance” is one in which probabilities are not assigned to all outcomes, and a “decision under uncertainty” is one in which probabilities are assigned to all outcomes. The term “risk” will be reserved for discussions related to utility.

## 6. How should I make decisions under ignorance?

A decision maker faces a “decision under ignorance” when she (1) knows which acts she could choose and which outcomes they may result in, but (2) is unable to assign probabilities to the outcomes.

(Note that many theorists think that all decisions under ignorance can be transformed into decisions under uncertainty, in which case this section will be irrelevant except for subsection 6.1. For details, see section 7.)

### 6.1. The dominance principle

To borrow an example from Peterson (2009, ch. 3), suppose that Jane isn’t sure whether to order hamburger or monkfish at a new restaurant. Just about any chef can make an edible hamburger, and she knows that monkfish is fantastic if prepared by a world-class chef, but she also recalls that monkfish is difficult to cook. Unfortunately, she knows too little about this restaurant to assign any probability to the prospect of getting good monkfish. Her decision matrix might look like this:

| | Good chef | Bad chef |
|---|---|---|
| Monkfish | good monkfish | terrible monkfish |
| Hamburger | edible hamburger | edible hamburger |
| No main course | hungry | hungry |

Here, decision theorists would say that the “hamburger” choice dominates the “no main course” choice. This is because choosing the hamburger leads to a better outcome for Jane no matter which possible state of the world (good chef or bad chef) turns out to be true.

This dominance principle comes in two forms:

• Weak dominance: One act is more rational than another if (1) all its possible outcomes are at least as good as those of the other, and if (2) there is at least one possible outcome that is better than that of the other act.

• Strong dominance: One act is more rational than another if all of its possible outcomes are better than those of the other act.
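Both forms of dominance can be sketched in a few lines. To compare outcomes we need some ranking, so the code below assumes ordinal ranks for Jane's outcomes (higher is better). Those numbers are our assumption; in particular, the text does not say whether "hungry" is better or worse than "terrible monkfish".

```python
def weakly_dominates(a, b):
    """a is at least as good as b in every state and strictly better in at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    somewhere_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and somewhere_better

def strongly_dominates(a, b):
    """a is strictly better than b in every state."""
    return all(x > y for x, y in zip(a, b))

# Assumed ordinal ranks, one per state: [good chef, bad chef].
monkfish = [4, 1]        # good monkfish, terrible monkfish
hamburger = [3, 3]       # edible hamburger in both states
no_main_course = [2, 2]  # hungry in both states

print(strongly_dominates(hamburger, no_main_course))  # hamburger vs. no main course
print(weakly_dominates(hamburger, monkfish))          # neither of these two dominates
print(weakly_dominates(monkfish, hamburger))
```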

The dominance principle can also be applied to decisions under uncertainty (in which probabilities are assigned to all the outcomes). If we assign probabilities to outcomes, it is still rational to choose one act over another act if all its outcomes are at least as good as the outcomes of the other act.

However, the dominance principle only applies (non-controversially) when the agent’s acts are independent of the state of the world. So consider the decision of whether to steal a coat:

| | Charged with theft | Not charged with theft |
|---|---|---|
| Theft | Jail and coat | Freedom and coat |
| No theft | Jail | Freedom |

In this case, stealing the coat dominates not doing so but isn’t necessarily the rational decision. After all, stealing increases your chance of getting charged with theft and might be irrational for this reason. So dominance doesn’t apply in cases like this, where the state of the world is not independent of the agent’s act.

On top of this, not all decision problems include an act that dominates all the others. Consequently, additional principles are often required to reach a decision.

### 6.2. Maximin and leximin

Some decision theorists have suggested the maximin principle: if the worst possible outcome of one act is better than the worst possible outcome of another act, then the former act should be chosen. In Jane’s decision problem above, the maximin principle would prescribe choosing the hamburger, because the worst possible outcome of choosing the hamburger (“edible hamburger”) is better than the worst possible outcome of choosing the monkfish (“terrible monkfish”) and is also better than the worst possible outcome of eating no main course (“hungry”).

If the worst outcomes of two or more acts are equally good, the maximin principle tells you to be indifferent between them. But that doesn’t seem right. For this reason, fans of the maximin principle often invoke the lexical maximin principle (“leximin”), which says that if the worst outcomes of two or more acts are equally good, one should choose the act for which the second worst outcome is best. (If that doesn’t single out a single act, then the third worst outcome should be considered, and so on.)
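Here is a rough sketch of both rules. Outcomes are represented by ordinal ranks (higher is better). The ranks for Jane's problem and the two-act tie example are assumptions made for illustration.

```python
def maximin(acts):
    """Return the acts whose worst outcome is best."""
    best_worst = max(min(values) for values in acts.values())
    return [name for name, values in acts.items() if min(values) == best_worst]

def leximin(acts):
    """Compare acts by worst outcome, then second worst, and so on.
    Sorting each act's outcomes in ascending order and comparing the
    lists lexicographically does exactly this (for equal-length lists)."""
    best = max(sorted(values) for values in acts.values())
    return [name for name, values in acts.items() if sorted(values) == best]

# Assumed ordinal ranks for Jane's problem: [good chef, bad chef].
jane = {"monkfish": [4, 1], "hamburger": [3, 3], "no main course": [2, 2]}
print(maximin(jane))

# Hypothetical tie: both acts have the same worst outcome (1).
tie = {"x": [1, 5], "y": [3, 1]}
print(maximin(tie))  # maximin cannot separate them
print(leximin(tie))  # leximin falls back to the second-worst outcome
```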

Why adopt the leximin principle? Advocates point out that the leximin principle transforms a decision problem under ignorance into a decision problem under partial certainty. The decision maker doesn’t know what the outcome will be, but they know what the worst possible outcome will be.

But in some cases, the leximin rule seems clearly irrational. Imagine this decision problem, with two possible acts and two possible states of the world:

| | s1 | s2 |
|---|---|---|
| a1 | \$1 | \$10,001.01 |
| a2 | \$1.01 | \$1.01 |

In this situation, the leximin principle prescribes choosing a2. But most people would agree it is rational to risk losing out on a single cent for the chance to get an extra \$10,000.

### 6.3. Maximax and optimism-pessimism

The maximin and leximin rules focus their attention on the worst possible outcomes of a decision, but why not focus on the best possible outcome? The maximax principle prescribes that if the best possible outcome of one act is better than the best possible outcome of another act, then the former act should be chosen.

More popular among decision theorists is the optimism-pessimism rule (aka the alpha-index rule). The optimism-pessimism rule prescribes that one consider both the best and worst possible outcome of each possible act, and then choose according to one’s degree of optimism or pessimism.

Here’s an example from Peterson (2009, ch. 3):

| | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|
| a1 | 55 | 18 | 28 | 10 | 36 | 100 |
| a2 | 50 | 87 | 55 | 90 | 75 | 70 |

We represent the decision maker’s level of optimism on a scale of 0 to 1, where 0 is maximal pessimism and 1 is maximal optimism. For a1, the worst possible outcome is 10 and the best possible outcome is 100. That is, min(a1) = 10 and max(a1) = 100. So if the decision maker is 0.85 optimistic, then the total value of a1 is (0.85)(100) + (1 − 0.85)(10) = 86.5, and the total value of a2 is (0.85)(90) + (1 − 0.85)(50) = 84. In this situation, the optimism-pessimism rule prescribes action a1.

If the decision maker’s optimism is 0, then the optimism-pessimism rule collapses into the maximin rule because (0)(max(ai)) + (1 − 0)(min(ai)) = min(ai). And if the decision maker’s optimism is 1, then the optimism-pessimism rule collapses into the maximax rule. Thus, the optimism-pessimism rule turns out to be a generalization of the maximin and maximax rules. (Well, sort of. The maximin and maximax principles require only that we measure value on an ordinal scale, whereas the optimism-pessimism rule requires that we measure value on an interval scale.)
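The rule is easy to compute. The sketch below reproduces the numbers from Peterson's example and shows the two limiting cases. The function names `alpha_index` and `choose` are our own.

```python
def alpha_index(values, alpha):
    """Weight the best outcome by alpha and the worst by (1 - alpha)."""
    return alpha * max(values) + (1 - alpha) * min(values)

def choose(acts, alpha):
    """Pick the act with the highest alpha-index value."""
    return max(acts, key=lambda name: alpha_index(acts[name], alpha))

# Values from the example in the text.
acts = {
    "a1": [55, 18, 28, 10, 36, 100],
    "a2": [50, 87, 55, 90, 75, 70],
}

for name, values in acts.items():
    print(name, round(alpha_index(values, 0.85), 2))

print(choose(acts, 0.85))  # the optimist's choice at alpha = 0.85
print(choose(acts, 0.0))   # alpha = 0 behaves like maximin
print(choose(acts, 1.0))   # alpha = 1 behaves like maximax
```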

The optimism-pessimism rule pays attention to both the best-case and worst-case scenarios, but is it rational to ignore all the outcomes in between? Consider this example:

| | s1 | s2 | s3 |
|---|---|---|---|
| a1 | 1 | 2 | 100 |
| a2 | 1 | 99 | 100 |

The maximum and minimum values for a1 and a2 are the same, so for every degree of optimism both acts are equally good. But it seems obvious that one should choose a2.

### 6.4. Other decision principles

Many other decision principles for dealing with decisions under ignorance have been proposed, including minimax regret, info-gap, and maxipok. For more details on making decisions under ignorance, see Peterson (2009) and Bossert et al. (2000).

One queer feature of the decision principles discussed in this section is that they willfully disregard some information relevant to making a decision. Such a move could make sense when trying to find a decision algorithm that performs well under tight limits on available computation (Brafman & Tennenholtz 2000), but it’s unclear why an ideal agent with infinite computing power (fit for a normative rather than a prescriptive theory) should willfully disregard information.

## 7. Can decisions under ignorance be transformed into decisions under uncertainty?

Can decisions under ignorance be transformed into decisions under uncertainty? This would simplify things greatly, because there is near-universal agreement that decisions under uncertainty should be handled by “maximizing expected utility” (see section 11 for clarifications), whereas decision theorists still debate what should be done about decisions under ignorance.

For Bayesians (see section 10), all decisions under ignorance are transformed into decisions under uncertainty (Winkler 2003, ch. 5) when the decision maker assigns an “ignorance prior” to each outcome for which they don’t know how to assign a probability. (Another way of saying this is to say that a Bayesian decision maker never faces a decision under ignorance, because a Bayesian must always assign a prior probability to events.) One must then consider how to assign priors, an important debate among Bayesians (see section 10).

Many non-Bayesian decision theorists also think that decisions under ignorance can be transformed into decisions under uncertainty due to something called the principle of insufficient reason. The principle of insufficient reason prescribes that if you have literally no reason to think that one state is more probable than another, then one should assign equal probability to both states.

One objection to the principle of insufficient reason is that it is very sensitive to how states are individuated. Peterson (2009, ch. 3) explains:

Suppose that before embarking on a trip you consider whether to bring an umbrella or not. [But] you know nothing about the weather at your destination. If the formalization of the decision problem is taken to include only two states, viz. rain and no rain, [then by the principle of insufficient reason] the probability of each state will be 1/2. However, it seems that one might just as well go for a formalization that divides the space of possibilities into three states, viz. heavy rain, moderate rain, and no rain. If the principle of insufficient reason is applied to the latter set of states, their probabilities will be 1/3. In some cases this difference will affect our decisions. Hence, it seems that anyone advocating the principle of insufficient reason must [defend] the rather implausible hypothesis that there is only one correct way of making up the set of states.
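To see how the choice of states can flip a decision, consider the sketch below. The utility numbers are entirely hypothetical (Peterson gives none). They were picked so that the two formalizations disagree, and the helper names are our own.

```python
from fractions import Fraction

def uniform_expected_utility(utilities_by_state):
    """Expected utility when every listed state gets probability 1/n."""
    n = len(utilities_by_state)
    return sum(Fraction(1, n) * u for u in utilities_by_state.values())

def best_act(problem):
    """Return the act with the highest expected utility under equal probabilities."""
    return max(problem, key=lambda act: uniform_expected_utility(problem[act]))

# Hypothetical utilities; both kinds of rain are assumed to be equally bad.
two_states = {
    "umbrella": {"rain": 6, "no rain": 4},
    "no umbrella": {"rain": 0, "no rain": 12},
}
three_states = {
    "umbrella": {"heavy rain": 6, "moderate rain": 6, "no rain": 4},
    "no umbrella": {"heavy rain": 0, "moderate rain": 0, "no rain": 12},
}

print(best_act(two_states))    # rain has probability 1/2
print(best_act(three_states))  # rain now has total probability 2/3
```

With these numbers, splitting "rain" into two states raises its total probability from 1/2 to 2/3, which is enough to change the recommended act.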

Advocates of the principle of insufficient reason might respond that one must consider symmetric states. For example, if someone gives you a die with n sides and you have no reason to think the die is biased, then you should assign a probability of 1/n to each side. But, Peterson notes:

...not all events can be described in symmetric terms, at least not in a way that justifies the conclusion that they are equally probable. Whether Ann’s marriage will be a happy one depends on her future emotional attitude toward her husband. According to one description, she could be either in love or not in love with him; then the probability of both states would be 1/2. According to another equally plausible description, she could either be deeply in love, a little bit in love or not at all in love with her husband; then the probability of each state would be 1/3.

## 8. How should I make decisions under uncertainty?

A decision maker faces a “decision under uncertainty” when she (1) knows which acts she could choose and which outcomes they may result in, and she (2) assigns probabilities to the outcomes.

Decision theorists generally agree that when facing a decision under uncertainty, it is rational to choose the act with the highest expected utility. This is the principle of expected utility maximization (EUM).
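As an illustration, here is EUM applied to the fire insurance problem from section 3. The probability of fire and the utility numbers are assumptions invented for this sketch. They respect the preference ordering given in section 3 but are otherwise arbitrary.

```python
def expected_utility(act, probabilities, utilities):
    """Sum over states of P(state) * utility of the outcome of (act, state)."""
    return sum(probabilities[s] * utilities[(act, s)] for s in probabilities)

def best_act(acts, probabilities, utilities):
    """Return the act with the highest expected utility."""
    return max(acts, key=lambda a: expected_utility(a, probabilities, utilities))

acts = ["take out insurance", "no insurance"]

# Assumed probability of fire (not from the text).
probabilities = {"fire": 0.01, "no fire": 0.99}

# Assumed utilities on an interval scale (not from the text).
utilities = {
    ("take out insurance", "fire"): 90.0,     # No house and $99,900
    ("take out insurance", "no fire"): 99.9,  # House and -$100
    ("no insurance", "fire"): 0.0,            # No house and $0
    ("no insurance", "no fire"): 100.0,       # House and $0
}

for act in acts:
    print(act, round(expected_utility(act, probabilities, utilities), 3))
print("EUM picks:", best_act(acts, probabilities, utilities))
```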

Decision theorists offer two kinds of justifications for EUM. The first has to do with the law of large numbers (see section 8.1). The second has to do with the axiomatic approach (see sections 8.2 through 8.6).

### 8.1. The law of large numbers

The first justification appeals to the “law of large numbers,” which states that in the long run, if you face the same decision problem again and again and again, and you always choose the act with the highest expected utility, then you will almost certainly be better off than if you choose any other acts.
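A small simulation can make this concrete. The sketch below reuses the utility numbers from section 4.1 (1000 for sure versus a 50% chance of 1500) and plays each choice many times. The function name, the number of rounds, and the random seed are arbitrary choices of ours.

```python
import random

def average_utility(lottery, rounds, seed=0):
    """Play a lottery `rounds` times and return the average utility received.
    `lottery` is a list of (probability, utility) pairs."""
    rng = random.Random(seed)
    probabilities = [p for p, _ in lottery]
    utilities = [u for _, u in lottery]
    draws = rng.choices(utilities, weights=probabilities, k=rounds)
    return sum(draws) / rounds

choice_a = [(1.0, 1000)]            # expected utility 1000
choice_b = [(0.5, 1500), (0.5, 0)]  # expected utility 750

for rounds in (10, 1_000, 100_000):
    print(rounds, average_utility(choice_a, rounds), average_utility(choice_b, rounds))
```

As the number of rounds grows, the average for each choice should settle near its expected utility, so the act with the higher expected utility comes out ahead.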

There are two problems with using the law of large numbers to justify EUM. The first problem is that the world is ever-changing, so we rarely if ever face the same decision problem “again and again and again.” The law of large numbers says that if you face the same decision problem infinitely many times, then the probability that you could do better by not maximizing expected utility approaches zero. But you won’t ever face the same decision problem infinitely many times! Why should you care what would happen if a certain condition held, if you know that condition will never hold?

The second problem with using the law of large numbers to justify EUM has to do with a mathematical theorem known as gambler’s ruin. Imagine that you and I flip a fair coin, and I pay you \$1 every time it comes up heads and you pay me \$1 every time it comes up tails. We both start with \$100. If we flip the coin enough times, one of us will face a situation in which the sequence of heads or tails is longer than we can afford. If a long-enough sequence of heads comes up, I’ll run out of \$1 bills with which to pay you. If a long-enough sequence of tails comes up, you won’t be able to pay me. So in this situation, the law of large numbers guarantees that you will be better off in the long run by maximizing expected utility only if you start the game with an infinite amount of money (so that you never go broke), which is an unrealistic assumption. (For technical convenience, assume utility increases linearly with money. But the basic point holds without this assumption.)
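The coin-flipping game can also be simulated. This is only a sketch: the function name, the seed, and the cap on the number of flips are our own choices, and the cap exists purely so the loop is guaranteed to stop.

```python
import random

def play_until_ruin(start_each=100, max_flips=10_000_000, seed=0):
    """Flip a fair coin; heads moves $1 from me to you, tails the reverse.
    Stop when one player is broke (or when max_flips is reached).
    Returns (your final money, number of flips played)."""
    rng = random.Random(seed)
    you = start_each
    total = 2 * start_each
    flips = 0
    while 0 < you < total and flips < max_flips:
        you += 1 if rng.random() < 0.5 else -1
        flips += 1
    return you, flips

you, flips = play_until_ruin()
print(f"After {flips} flips you hold ${you} and I hold ${200 - you}.")
```

Although each flip has an expected value of zero for both players, the game almost surely ends with one player holding everything, which is the point of the gambler's ruin objection.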

### 8.2. The axiomatic approach

The other method for justifying EUM seeks to show that EUM can be derived from axioms that hold regardless of what happens in the long run.

In this section we will review perhaps the most famous axiomatic approach, from Von Neumann and Morgenstern (1947). Other axiomatic approaches include Savage (1954), Jeffrey (1983), and Anscombe & Aumann (1963).

### 8.3. The Von Neumann-Morgenstern utility theorem

The first decision theory axiomatization appeared in an appendix to the second edition of Von Neumann & Morgenstern’s Theory of Games and Economic Behavior (1947). An important point to note up front is that, in this axiomatization, the options the agent chooses between are not acts, as we’ve defined them, but lotteries (where a lottery is a set of outcomes, each paired with a probability). As such, while discussing their axiomatization, we will talk of lotteries. (Despite this distinction, acts and lotteries are closely related. Under the conditions of uncertainty that we are considering here, each act will be associated with some lottery, so preferences over lotteries could be used to determine preferences over acts, if so desired.)

The key feature of the Von Neumann and Morgenstern axiomatization is a proof that if a decision maker states her preferences over a set of lotteries, and if her preferences conform to a set of intuitive structural constraints (axioms), then we can construct a utility function (on an interval scale) from her preferences over lotteries and show that she acts as if she maximizes expected utility with respect to that utility function.

What are the axioms to which an agent’s preferences over lotteries must conform? There are four of them.

1. The completeness axiom states that the agent must bother to state a preference for each pair of lotteries. That is, the agent must prefer A to B, or prefer B to A, or be indifferent between the two.

2. The transitivity axiom states that if the agent prefers A to B and B to C, she must also prefer A to C.

3. The independence axiom states that, for example, if an agent prefers an apple to an orange, then she must also prefer the lottery [55% chance she gets an apple, otherwise she gets cholera] over the lottery [55% chance she gets an orange, otherwise she gets cholera]. More generally, this axiom holds that a preference must hold independently of the possibility of another outcome (e.g. cholera).

4. The continuity axiom holds that if the agent prefers A to B to C, then there exists a unique p (probability) such that the agent is indifferent between [p(A) + (1 - p)(C)] and [outcome B with certainty].

The continuity axiom requires more explanation. Suppose that A = \$1 million, B = \$0, and C = Death. If p = 0.5, then the agent’s two lotteries under consideration for the moment are:

1. (0.5)(\$1M) + (1 − 0.5)(Death) [win \$1M with 50% probability, die with 50% probability]

2. (1)(\$0) [win \$0 with certainty]

Most people would not be indifferent between \$0 with certainty and [50% chance of \$1M, 50% chance of Death] — the risk of Death is too high! But if you have continuous preferences, there is some probability p for which you’d be indifferent between these two lotteries. Perhaps p is very, very high:

1. (0.999999)(\$1M) + (1 − 0.999999)(Death) [win \$1M with 99.9999% probability, die with 0.0001% probability]

2. (1)(\$0) [win \$0 with certainty]

Perhaps now you’d be indifferent between lottery 1 and lottery 2. Or maybe you’d be more willing to risk Death for the chance of winning \$1M, in which case the p for which you’d be indifferent between lotteries 1 and 2 is lower than 0.999999. As long as there is some p at which you’d be indifferent between lotteries 1 and 2, your preferences are “continuous.”
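If we already had utility numbers for A, B, and C, and if the agent maximized expected utility, the indifference probability could be computed directly by solving p·u(A) + (1 − p)·u(C) = u(B) for p. The sketch below does this with hypothetical integer utilities, picked so that the answer matches the 0.999999 used above. Nothing in the text assigns these numbers.

```python
from fractions import Fraction

def indifference_probability(u_a, u_b, u_c):
    """Solve p*u(A) + (1 - p)*u(C) = u(B) for p, assuming u(A) > u(B) > u(C).
    Utilities are taken to be integers so the result is an exact fraction."""
    if not (u_a > u_b > u_c):
        raise ValueError("expected u(A) > u(B) > u(C)")
    return Fraction(u_b - u_c, u_a - u_c)

# Hypothetical utilities: $1M = 1, $0 = 0, Death = -999,999.
p = indifference_probability(1, 0, -999_999)
print(p, float(p))
```

The less terrible Death is relative to the other outcomes, the lower the resulting p, which matches the remark above about being "more willing to risk Death."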

Given this setup, Von Neumann and Morgenstern proved their theorem, which states that if the agent’s preferences over lotteries obey their axioms, then:

• The agent’s preferences can be represented by a utility function that assigns higher utility to preferred lotteries.

• The agent acts in accordance with the principle of maximizing expected utility.

• All utility functions satisfying the above two conditions are “positive linear transformations” of each other. (Without going into the details: this is why VNM-utility is measured on an interval scale.)
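The third point can be illustrated numerically. In the sketch below, a utility function u is rescaled to 2u + 7 (a positive linear transformation, with constants chosen arbitrarily by us), and the ranking of two lotteries by expected utility is unchanged. The lotteries and base utilities are the ones from section 4.1.

```python
def expected_utility(lottery, utility):
    """Sum of probability * utility(outcome) over (probability, outcome) pairs."""
    return sum(p * utility(outcome) for p, outcome in lottery)

base = {0: 0, 1_000_000: 1000, 3_000_000: 1500}

def u(outcome):
    """Utility numbers from section 4.1."""
    return base[outcome]

def u_rescaled(outcome):
    """An arbitrary positive linear transformation of u: multiply by 2, add 7."""
    return 2 * u(outcome) + 7

lottery_a = [(1.0, 1_000_000)]
lottery_b = [(0.5, 3_000_000), (0.5, 0)]

print(expected_utility(lottery_a, u), expected_utility(lottery_b, u))
print(expected_utility(lottery_a, u_rescaled), expected_utility(lottery_b, u_rescaled))
```

Both utility functions rank lottery A above lottery B. What changes is only the unit and zero point of the scale, much as with Celsius and Fahrenheit.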

### 8.4. VNM utility theory and rationality

An agent which conforms to the VNM axioms is sometimes said to be “VNM-rational.” But why should “VNM-rationality” constitute our notion of rationality in general? How could VNM’s result justify the claim that a rational agent maximizes expected utility when facing a decision under uncertainty? The argument goes like this:

1. If an agent chooses lotteries which it prefers (in decisions under uncertainty), and if its preferences conform to the VNM axioms, then it is rational. Otherwise, it is irrational.

2. If an agent chooses lotteries which it prefers (in decisions under uncertainty), and if its preferences conform to the VNM axioms, then it maximizes expected utility.

3. Therefore, a rational agent maximizes expected utility (in decisions under uncertainty).

Von Neumann and Morgenstern proved premise 2, and the conclusion follows from premises 1 and 2. But why accept premise 1?

Few people deny that it would be irrational for an agent to choose a lottery which it does not prefer. But why is it irrational for an agent’s preferences to violate the VNM axioms? I will save that discussion for section 8.6.

### 8.5. Objections to VNM-rationality

Several objections have been raised to Von Neumann and Morgenstern’s result:

1. The VNM axioms are too strong. Some have argued that the VNM axioms are not self-evidently true. See section 8.6.

2. The VNM system offers no action guidance. A VNM-rational decision maker cannot use VNM utility theory for action guidance, because she must state her preferences over lotteries at the start. But if an agent can state her preferences over lotteries, then she already knows which lottery to choose. (For more on this, see section 9.)

3. In the VNM system, utility is defined via preferences over lotteries rather than preferences over outcomes. To many, it seems odd to define utility with respect to preferences over lotteries. Many would argue that utility should be defined in relation to preferences over outcomes or world-states, and that’s not what the VNM system does. (Also see section 9.)

### 8.6. Should we accept the VNM axioms?

The VNM preference axioms define what it is for an agent to be VNM-rational. But why should we accept these axioms? Usually, it is argued that each of the axioms is pragmatically justified because an agent which violates the axioms can face situations in which they are guaranteed to end up worse off (from their own perspective).

In sections 8.6.1 and 8.6.2 I go into some detail about pragmatic justifications offered for the transitivity and completeness axioms. For more detail, including arguments about the justification of the other axioms, see Peterson (2009, ch. 8) and Anand (1993).

#### 8.6.1. The transitivity axiom

Consider the money-pump argument in favor of the transitivity axiom (“if the agent prefers A to B and B to C, she must also prefer A to C”).

Imagine that a friend offers to give you exactly one of her three… novels, x or y or z… [and] that your preference ordering over the three novels is… [that] you prefer x to y, and y to z, and z to x… [That is, your preferences are cyclic, which is a type of intransitive preference relation.] Now suppose that you are in possession of z, and that you are invited to swap z for y. Since you prefer y to z, rationality obliges you to swap. So you swap, and temporarily get y. You are then invited to swap y for x, which you do, since you prefer x to y. Finally, you are offered to pay a small amount, say one cent, for swapping x for z. Since z is strictly [preferred to] x, even after you have paid the fee for swapping, rationality tells you that you should accept the offer. This means that you end up where you started, the only difference being that you now have one cent less. This procedure is thereafter iterated over and over again. After a billion cycles you have lost ten million dollars, for which you have got nothing in return. (Peterson 2009, ch. 8)
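
The money pump in the quoted passage can be simulated directly. This is a minimal sketch: the agent is modelled (as an assumption) as accepting any offered swap to a novel it strictly prefers, and the one-cent fee is treated as too small to change that preference.

```python
# Cyclic strict preferences from the quoted example: x over y, y over z, z over x.
PREFERS = {("x", "y"), ("y", "z"), ("z", "x")}  # (a, b) means a is preferred to b

def run_money_pump(cycles, fee_cents=1, start="z"):
    """Repeat the three offers from the quote; return (final holding, cents lost)."""
    holding, lost = start, 0
    offers = [("y", 0), ("x", 0), ("z", fee_cents)]  # (novel offered, fee in cents)
    for _ in range(cycles):
        for offered, fee in offers:
            if (offered, holding) in PREFERS:  # accept any swap to a preferred novel
                holding = offered
                lost += fee
    return holding, lost

print(run_money_pump(1000))  # ('z', 1000)

# Each cycle costs one cent, so a billion cycles cost 10**9 cents.
print(10**9 // 100)  # 10000000 dollars
```

The agent ends every cycle holding the novel it started with, which is what makes the loss a pure loss.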

Similar arguments (e.g. Gustafsson 2010) aim to show that the other kind of intransitive preferences (acyclic preferences) are irrational, too.

(Of course, pragmatic arguments need not be framed in monetary terms. We could just as well construct an argument showing that an agent with intransitive preferences can be “pumped” of all their happiness, or all their moral virtue, or all their Twinkies.)

#### 8.6.2. The completeness axiom

The completeness axiom (“the agent must prefer A to B, or prefer B to A, or be indifferent between the two”) is often attacked by saying that some goods or outcomes are incommensurable — that is, they cannot be compared. For example, must a rational agent be able to state a preference (or indifference) between money and human welfare?

Perhaps the completeness axiom can be justified with a pragmatic argument. If you think it is rationally permissible to swap between two incommensurable goods, then one can construct a money pump argument in favor of the completeness axiom. But if you think it is not rational to swap between incommensurable goods, then one cannot construct a money pump argument for the completeness axiom. (In fact, even if it is rational to swap between incommensurable goods, Mandler (2005) has demonstrated that an agent that allows their current choices to depend on the previous ones can avoid being money pumped.)

And in fact, there is a popular argument against the completeness axiom: the “small improvement argument.” For details, see Chang (1997) and Espinoza (2007).

Note that in revealed preference theory, according to which preferences are revealed through choice behavior, there is no room for incommensurable preferences because every choice always reveals a preference relation of “better than,” “worse than,” or “equally as good as.”

Another proposal for dealing with the apparent incommensurability of some goods (such as money and human welfare) is the multi-attribute approach:

In a multi-attribute approach, each type of attribute is measured in the unit deemed to be most suitable for that attribute. Perhaps money is the right unit to use for measuring financial costs, whereas the number of lives saved is the right unit to use for measuring human welfare. The total value of an alternative is thereafter determined by aggregating the attributes, e.g. money and lives, into an overall ranking of available alternatives...

Several criteria have been proposed for choosing among alternatives with multiple attributes… [For example,] additive criteria assign weights to each attribute, and rank alternatives according to the weighted sum calculated by multiplying the weight of each attribute with its value… [But while] it is perhaps contentious to measure the utility of very different objects on a common scale, …it seems equally contentious to assign numerical weights to attributes as suggested here....

[Now let us] consider a very general objection to multi-attribute approaches. According to this objection, there exist several equally plausible but different ways of constructing the list of attributes. Sometimes the outcome of the decision process depends on which set of attributes is chosen. (Peterson 2009, ch. 8)
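
The additive criterion mentioned in the quote, and the worry about assigning weights, can be illustrated with a small sketch. The alternatives, attribute values, and weights below are all hypothetical.

```python
def weighted_sum(attributes, weights):
    """Additive criterion: multiply each attribute value by its weight and add up."""
    return sum(weights[name] * value for name, value in attributes.items())

def best(alternatives, weights):
    """Return the name of the alternative with the highest weighted sum."""
    return max(alternatives, key=lambda alt: weighted_sum(alternatives[alt], weights))

# Hypothetical alternatives: financial cost in $M (negative, since it is a cost)
# and number of lives saved.
alternatives = {
    "program_a": {"money": -10, "lives": 12},
    "program_b": {"money": -2, "lives": 5},
}
weights_low = {"money": 1, "lives": 1}   # one life weighted like $1M
weights_high = {"money": 1, "lives": 2}  # one life weighted like $2M

print(best(alternatives, weights_low))   # program_b
print(best(alternatives, weights_high))  # program_a
```

The ranking flips when the weight on lives changes, which is one way to see why the choice of weights is contentious.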

For more on the multi-attribute approach, see Keeney & Raiffa (1993).

#### 8.6.3. The Allais paradox

Having considered the transitivity and completeness axioms, we can now turn to independence (a preference holds independently of considerations of other possible outcomes). Do we have any reason to reject this axiom? Here’s one reason to think we might: in a case known as the Allais paradox (Allais 1953), it may seem reasonable to act in a way that contradicts independence.

The Allais paradox asks us to consider two decisions (this version of the paradox is based on Yudkowsky (2008)). The first decision involves the choice between:

(1A) A certain \$24,000; and (1B) A 33/34 chance of \$27,000 and a 1/34 chance of nothing.

The second involves the choice between:

(2A) A 34% chance of \$24,000 and a 66% chance of nothing; and (2B) A 33% chance of \$27,000 and a 67% chance of nothing.

Experiments have shown that many people prefer (1A) to (1B) and (2B) to (2A). However, these preferences contradict independence. Option 2A is the same as [a 34% chance of option 1A and a 66% chance of nothing] while 2B is the same as [a 34% chance of option 1B and a 66% chance of nothing]. So independence implies that anyone that prefers (1A) to (1B) must also prefer (2A) to (2B).
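
The algebra behind this claim can be checked with exact fractions. For any utility function u, EU(2A) − EU(2B) = 0.34 · (EU(1A) − EU(1B)), so an expected utility maximizer must rank the two pairs the same way. In the sketch below, the utility assignments are arbitrary hypothetical examples.

```python
from fractions import Fraction as F

def eu(lottery, u):
    """Expected utility of a lottery given as (probability, dollar outcome) pairs."""
    return sum(p * u[x] for p, x in lottery)

L1A = [(F(1), 24000)]
L1B = [(F(33, 34), 27000), (F(1, 34), 0)]
L2A = [(F(34, 100), 24000), (F(66, 100), 0)]
L2B = [(F(33, 100), 27000), (F(67, 100), 0)]

def gap_identity_holds(u):
    """Check EU(2A) - EU(2B) == 0.34 * (EU(1A) - EU(1B)) exactly."""
    return eu(L2A, u) - eu(L2B, u) == F(34, 100) * (eu(L1A, u) - eu(L1B, u))

def same_direction(u):
    """True if the agent ranks 1A vs 1B the same way as 2A vs 2B."""
    return (eu(L1A, u) > eu(L1B, u)) == (eu(L2A, u) > eu(L2B, u))

# Hypothetical utility assignments over the three dollar outcomes
examples = [
    {0: 0, 24000: 24000, 27000: 27000},  # linear in money
    {0: 0, 24000: 99, 27000: 100},       # strongly concave
    {0: -5, 24000: 1, 27000: 100},       # arbitrary
]
for u in examples:
    print(gap_identity_holds(u), same_direction(u))
```

No choice of utilities makes (1A) over (1B) and (2B) over (2A) both come out as expected utility maximizing, since the two gaps always share a sign.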

When this result was first uncovered, it was presented as evidence against the independence axiom. However, while the Allais paradox clearly reveals that independence fails as a descriptive account of choice, it’s less clear what it implies about the normative account of rational choice that we are discussing in this document. As noted in Peterson (2009, ch. 4), however:

[S]ince many people who have thought very hard about this example still feel that it would be rational to stick to the problematic preference pattern described above, there seems to be something wrong with the expected utility principle.

However, Peterson then goes on to note that many people, like the statistician Leonard Savage, argue that it is people’s preferences in the Allais paradox that are in error rather than the independence axiom. If so, then the paradox seems to reveal the danger of relying too strongly on intuition to determine the form that should be taken by normative theories of rationality.

#### 8.6.4. The Ellsberg paradox

The Allais paradox is far from the only case where people fail to act in accordance with EUM. Another well-known case is the Ellsberg paradox (the following is taken from Resnik 1987):

An urn contains ninety uniformly sized balls, which are randomly distributed. Thirty of the balls are yellow, the remaining sixty are red or blue. We are not told how many red (blue) balls are in the urn – except that they number anywhere from zero to sixty. Now consider the following pair of situations. In each situation a ball will be drawn and we will be offered a bet on its color. In situation A we will choose between betting that it is yellow or that it is red. In situation B we will choose between betting that it is red or blue or that it is yellow or blue.

If we guess the correct color, we will receive a payout of \$100. In the Ellsberg paradox, many people bet yellow in situation A and red or blue in situation B. Further, many people make these decisions not because they are indifferent in both situations, and so happy to choose either way, but rather because they have a strict preference to choose in this manner.

However, such behavior cannot be in accordance with EUM. In order for EUM to endorse a strict preference for choosing yellow in situation A, the agent would have to assign a probability of more than 1/3 to the ball selected being blue. On the other hand, in order for EUM to endorse a strict preference for choosing red or blue in situation B the agent would have to assign a probability of less than 1/3 to the selected ball being blue. As such, these decisions can’t be jointly endorsed by an agent following EUM.
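
The two conditions can be checked by scanning over candidate probabilities for blue. The sketch assumes, for simplicity, that the \$100 prize has utility 1 and winning nothing has utility 0, so the expected utility of each bet is just its probability of winning.

```python
from fractions import Fraction as F

P_YELLOW = F(1, 3)  # 30 of the 90 balls are yellow

def strict_preferences(p_blue):
    """Return (strictly prefers yellow in A, strictly prefers red-or-blue in B)."""
    p_red = F(2, 3) - p_blue  # red and blue together make up 60 of 90 balls
    eu_yellow, eu_red = P_YELLOW, p_red
    eu_red_or_blue = p_red + p_blue
    eu_yellow_or_blue = P_YELLOW + p_blue
    return eu_yellow > eu_red, eu_red_or_blue > eu_yellow_or_blue

# Try every probability of blue of the form k/90, for k = 0..60 blue balls.
both = [k for k in range(61) if all(strict_preferences(F(k, 90)))]
print(both)  # []
```

The first preference needs P(blue) above 1/3 and the second needs it below 1/3, so the list of probabilities supporting both is empty.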

Those who deny that decision making under ignorance can be transformed into decision making under uncertainty have an easy response to the Ellsberg paradox: as this case involves deciding under a situation of ignorance, it is irrelevant whether people’s decisions violate EUM in this case as EUM is not applicable to such situations.

Those who believe that EUM provides a suitable standard for choice in such situations, however, need to find some other way of responding to the paradox. As with the Allais paradox, there is some disagreement about how best to do so. Once again, however, many people, including Leonard Savage, argue that EUM reaches the right decision in this case. It is our intuitions that are flawed (see again Resnik (1987) for a nice summary of Savage’s argument to this conclusion).

#### 8.6.5. The St Petersburg paradox

Another objection to the VNM approach (and to expected utility approaches generally), the St. Petersburg paradox, draws on the possibility of infinite utilities. The St. Petersburg paradox is based around a game where a fair coin is tossed until it lands heads up. At this point, the agent receives a prize worth $2^n$ utility, where n is equal to the number of times the coin was tossed during the game. The so-called paradox occurs because the expected utility of choosing to play this game is infinite and so, according to a standard expected utility approach, the agent should be willing to pay any finite amount to play the game. However, this seems unreasonable. Instead, it seems that the agent should only be willing to pay a relatively small amount to do so. As such, it seems that the expected utility approach gets something wrong.
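
To see why the expected utility is infinite, note that the game ends on toss n with probability (1/2)^n and then pays 2^n utility, so every possible game length contributes exactly 1 to the sum. The sketch below computes the expected utility of a truncated version of the game; the truncation is an assumption added here for illustration.

```python
from fractions import Fraction as F

def truncated_expected_utility(max_tosses):
    """Expected utility if the game is cut off after max_tosses tosses
    (paying nothing if no head has appeared by then)."""
    return sum(F(1, 2**n) * 2**n for n in range(1, max_tosses + 1))

for n in (1, 10, 100, 1000):
    print(n, truncated_expected_utility(n))  # the value always equals n
```

The truncated sum grows without bound as the cutoff is raised, which is the sense in which the full game has infinite expected utility.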

Various responses have been suggested. Most obviously, we could say that the paradox does not apply to VNM agents, since the VNM theorem assigns real numbers to all lotteries, and infinity is not a real number. But it’s unclear whether this escapes the problem. After all, at its core, the St. Petersburg paradox is not about infinite utilities but rather about cases where expected utility approaches seem to overvalue some choice, and such cases seem to exist even in finite cases. For example, if we let L be a finite limit on utility we could consider the following scenario (from Peterson, 2009, p. 85):

A fair coin is tossed until it lands heads up. The player thereafter receives a prize worth $\min\{2^n \cdot 10^{-100}, L\}$ units of utility, where n is the number of times the coin was tossed.

In this case, even if an extremely low value is set for L, it seems that paying this amount to play the game is unreasonable. After all, as Peterson notes, about nine times out of ten an agent that plays this game will win no more than $8 \cdot 10^{-100}$ utility. If paying 1 utility is, in fact, unreasonable in this case, then simply limiting an agent’s utility to some finite value doesn’t provide a defence of expected utility approaches. (Other problems abound. See Yudkowsky, 2007 for an interesting finite problem and Nover & Hajek, 2004 for a particularly perplexing problem with links to the St Petersburg paradox.)

As it stands, there is no agreement about precisely what the St Petersburg paradox reveals. Some people accept one of the various resolutions of the case and so find the paradox unconcerning. Others think the paradox reveals a serious problem for expected utility theories. Still others think the paradox is unresolved but don’t think that we should respond by abandoning expected utility theory.

## 9. Does axiomatic decision theory offer any action guidance?

For the decision theories listed in section 8.2, it’s often claimed the answer is “no.” To explain this, I must first examine some differences between direct and indirect approaches to axiomatic decision theory.

Peterson (2009, ch. 4) explains:

In the indirect approach, which is the dominant approach, the decision maker does not prefer a risky act [or lottery] to another because the expected utility of the former exceeds that of the latter. Instead, the decision maker is asked to state a set of preferences over a set of risky acts… Then, if the set of preferences stated by the decision maker is consistent with a small number of structural constraints (axioms), it can be shown that her decisions can be described as if she were choosing what to do by assigning numerical probabilities and utilities to outcomes and then maximising expected utility...

[In contrast] the direct approach seeks to generate preferences over acts from probabilities and utilities directly assigned to outcomes. In contrast to the indirect approach, it is not assumed that the decision maker has access to a set of preferences over acts before he starts to deliberate.

The axiomatic decision theories listed in section 8.2 all follow the indirect approach. These theories, it might be said, cannot offer any action guidance because they require an agent to state its preferences over acts “up front.” But an agent that states its preferences over acts already knows which act it prefers, so the decision theory can’t offer any action guidance not already present in the agent’s own stated preferences over acts.

Peterson (2009, ch. 10) gives a practical example:

For example, a forty-year-old woman seeking advice about whether to, say, divorce her husband, is likely to get very different answers from the [two approaches]. The [indirect approach] will advise the woman to first figure out what her preferences are over a very large set of risky acts, including the one she is thinking about performing, and then just make sure that all preferences are consistent with certain structural requirements. Then, as long as none of the structural requirements is violated, the woman is free to do whatever she likes, no matter what her beliefs and desires actually are… The [direct approach] will [instead] advise the woman to first assign numerical utilities and probabilities to her desires and beliefs, and then aggregate them into a decision by applying the principle of maximizing expected utility.

Thus, it seems only the direct approach offers an agent any action guidance. But the direct approach is very recent (Peterson 2008; Cozic 2011), and only time will show whether it can stand up to professional criticism.

Warning: Peterson’s (2008) direct approach is confusingly called “non-Bayesian decision theory” despite assuming Bayesian probability theory.

For other attempts to pull action guidance from normative decision theory, see Fallenstein (2012) and Stiennon (2013).

## 10. How does probability theory play a role in decision theory?

In order to calculate the expected utility of an act (or lottery), it is necessary to determine a probability for each outcome. In this section, I will explore some of the details of probability theory and its relationship to decision theory.

For further introductory material to probability theory, see Howson & Urbach (2005), Grimmett & Stirzaker (2001), and Koller & Friedman (2009). This section draws heavily on Peterson (2009, chs. 6 & 7) which provides a very clear introduction to probability in the context of decision theory.

### 10.1. The basics of probability theory

Intuitively, a probability is a number between 0 and 1 that labels how likely an event is to occur. If an event has probability 0 then it is impossible, and if it has probability 1 then it is certain to occur. If an event has a probability between these values, then the higher the number, the more probable the event.

As with EUM, probability theory can be derived from a small number of simple axioms. In the probability case, there are three of these, which are named the Kolmogorov axioms after the mathematician Andrey Kolmogorov. The first of these states that probabilities are real numbers between 0 and 1. The second, that if a set of events are mutually exclusive and exhaustive then their probabilities should sum to 1. The third, that if two events are mutually exclusive then the probability that one or the other of these events will occur is equal to the sum of their individual probabilities.
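
For a finite set of outcomes, the three axioms as stated above can be expressed in a few lines. This is a minimal sketch using a hypothetical fair die; the function names are illustrative only.

```python
from fractions import Fraction as F

def is_probability_distribution(dist):
    """First two axioms, for a finite set of mutually exclusive, exhaustive outcomes:
    each probability lies between 0 and 1, and together they sum to 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1

def prob(event, dist):
    """Third axiom: the probability of a set of mutually exclusive outcomes
    is the sum of their individual probabilities."""
    return sum(dist[outcome] for outcome in event)

die = {face: F(1, 6) for face in range(1, 7)}  # hypothetical fair die

print(is_probability_distribution(die))  # True
print(prob({1, 2}, die))                 # 1/3
print(prob({2, 4, 6}, die))              # 1/2
```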

From these three axioms, the remainder of probability theory can be derived. In the remainder of this section, I will explore some aspects of this broader theory.

### 10.2. Bayes theorem for updating probabilities

From the perspective of decision theory, one particularly important aspect of probability theory is the idea of a conditional probability. A conditional probability represents how probable something is given a piece of information. So, for example, a conditional probability could represent how likely it is that it will be raining, conditioning on the fact that the weather forecaster predicted rain. A powerful technique for calculating conditional probabilities is Bayes theorem (see Yudkowsky, 2003 for a detailed introduction). This formula states that:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayes theorem is used to calculate the probability of some event, A, given some evidence, B. As such, this formula can be used to update probabilities based on new evidence. So if you are trying to predict the probability that it will rain tomorrow and someone gives you the information that the weather forecaster predicted that it will do so then this formula tells you how to calculate a new probability that it will rain based on your existing information. The initial probability in such cases (before the information is factored into account) is called the prior probability and the result of applying Bayes theorem is a new, posterior probability.
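
As a worked example of the rain case, the sketch below applies Bayes theorem with P(B) expanded by the law of total probability. The prior and the forecaster's reliability figures are hypothetical numbers chosen for illustration.

```python
from fractions import Fraction as F

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence

# Hypothetical inputs (assumptions):
prior_rain = F(1, 5)              # P(rain) before hearing the forecast
p_forecast_if_rain = F(9, 10)     # P(forecast says rain | rain)
p_forecast_if_dry = F(1, 10)      # P(forecast says rain | no rain)

posterior_rain = bayes_update(prior_rain, p_forecast_if_rain, p_forecast_if_dry)
print(posterior_rain)  # 9/13
```

With these numbers, hearing the forecast raises the probability of rain from 20% to roughly 69%.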

Bayes theorem can be seen as solving the problem of how to update prior probabilities based on new information. However, it leaves open the question of how to determine the prior probability in the first place. In some cases, there will be no obvious way to do so. One solution to this problem suggests that any reasonable prior can be selected. Given enough evidence, repeated applications of Bayes theorem will lead this prior probability to be updated to much the same posterior probability, even for people with widely different initial priors. As such, the initially selected prior is less crucial than it may at first seem.
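
The convergence claim can be illustrated with a toy case. In this sketch, two agents start with very different priors about whether a coin is biased (hypothesis H1: heads with probability 0.7) or fair (H2), and both update on the same hypothetical data. The hypotheses, priors, and coin-flip counts are all assumptions.

```python
from fractions import Fraction as F

def posterior(prior_h1, heads, tails, p_h1=F(7, 10), p_h2=F(1, 2)):
    """Posterior probability of H1 after observing the given counts of heads and tails."""
    likelihood_h1 = p_h1**heads * (1 - p_h1)**tails
    likelihood_h2 = p_h2**heads * (1 - p_h2)**tails
    joint_h1 = prior_h1 * likelihood_h1
    return joint_h1 / (joint_h1 + (1 - prior_h1) * likelihood_h2)

# Compare a sceptic (prior 0.1 on H1) with a believer (prior 0.9 on H1).
for heads, tails in [(0, 0), (7, 3), (70, 30)]:
    sceptic = posterior(F(1, 10), heads, tails)
    believer = posterior(F(9, 10), heads, tails)
    print(heads + tails, round(float(sceptic), 4), round(float(believer), 4))
```

The gap between the two posteriors shrinks as the data accumulate. Note that a prior of exactly 0 or 1 never moves under Bayes theorem, which is one reason such priors are usually not counted as reasonable.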

### 10.3. How should probabilities be interpreted?

There are two main views about what probabilities mean: objectivism and subjectivism. Loosely speaking, the objectivist holds that probabilities tell us something about the external world while the subjectivist holds that they tell us something about our beliefs. Most decision theorists hold a subjectivist view about probability. According to this sort of view, probabilities represent subjective degrees of belief. So to say the probability of rain is 0.8 is to say that the agent under consideration has a high degree of belief that it will rain (see Jaynes, 2003 for a defense of this view). Note that, according to this view, another agent in the same circumstance could assign a different probability that it will rain.

#### 10.3.1. Why should degrees of belief follow the laws of probability?

One question that might be raised against the subjective account of probability is why, on this account, our degrees of belief should satisfy the Kolmogorov axioms. For example, why should our subjective degrees of belief in mutually exclusive, exhaustive events add to 1? One answer to this question shows that agents whose degrees of belief don’t satisfy these axioms will be subject to Dutch Book bets. These are bets where the agent will inevitably lose money. Peterson (2009, ch. 7) explains:

Suppose, for instance, that you believe to degree 0.55 that at least one person from India will win a gold medal in the next Olympic Games (event G), and that your subjective degree of belief is 0.52 that no Indian will win a gold medal in the next Olympic Games (event ¬G). Also suppose that a cunning bookie offers you a bet on both of these events. The bookie promises to pay you \$1 for each event that actually takes place. Now, since your subjective degree of belief that G will occur is 0.55 it would be rational to pay up to \$1·0.55 = \$0.55 for entering this bet. Furthermore, since your degree of belief in ¬G is 0.52 you should be willing to pay up to \$0.52 for entering the second bet, since \$1·0.52 = \$0.52. However, by now you have paid \$1.07 for taking on two bets that are certain to give you a payoff of \$1 no matter what happens...Certainly, this must be irrational. Furthermore, the reason why this is irrational is that your subjective degrees of belief violate the probability calculus.

It can be proven that an agent is subject to Dutch Book bets if, and only if, their degrees of belief violate the axioms of probability. This provides an argument for why degrees of belief should satisfy these axioms.
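
The arithmetic in Peterson's example can be reproduced in a few lines. The sketch assumes, as the quote does, that the agent pays a price equal to its degree of belief times the stake for each bet, and that exactly one of G and ¬G occurs.

```python
from fractions import Fraction as F

def dutch_book_loss(belief_g, belief_not_g, stake=1):
    """Guaranteed loss from buying a bet on G and a bet on not-G,
    each at a price of (degree of belief) * stake. Exactly one bet pays out."""
    total_price = (belief_g + belief_not_g) * stake
    guaranteed_payoff = stake
    return total_price - guaranteed_payoff

print(dutch_book_loss(F(55, 100), F(52, 100)))  # 7/100, a sure loss of $0.07
print(dutch_book_loss(F(55, 100), F(45, 100)))  # 0, coherent beliefs lose nothing
```

If the two degrees of belief summed to less than 1, the result would be negative for the buyer; in that case the bookie can secure a sure gain by buying the bets from the agent instead.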

#### 10.3.2. Measuring subjective probabilities

Another challenge raised by the subjective view is how we can measure probabilities. If these represent subjective degrees of belief there doesn’t seem to be an easy way to determine these based on observations of the world. However, a number of responses to this problem have been advanced, one of which is explained succinctly by Peterson (2009, ch. 7):

The main innovations presented by… Savage can be characterised as systematic procedures for linking probability… to claims about objectively observable behavior, such as preference revealed in choice behavior. Imagine, for instance, that we wish to measure Caroline’s subjective probability that the coin she is holding in her hand will land heads up the next time it is tossed. First, we ask her which of the following very generous options she would prefer.

A: “If the coin lands heads up you win a sports car; otherwise you win nothing.”

B: “If the coin does not land heads up you win a sports car; otherwise you win nothing.”

Suppose Caroline prefers A to B. We can then safely conclude that she thinks it is more probable that the coin will land heads up rather than not. This follows from the assumption that Caroline prefers to win a sports car rather than nothing, and that her preference between uncertain prospects is entirely determined by her beliefs and desires with respect to her prospects of winning the sports car...

Next, we need to generalise the measurement procedure outlined above such that it allows us to always represent Caroline’s degrees of belief with precise numerical probabilities. To do this, we need to ask Caroline to state preferences over a much larger set of options and then reason backwards… Suppose, for instance, that Caroline wishes to measure her subjective probability that her car worth \$20,000 will be stolen within one year. If she considers \$1,000 to be… the highest price she is prepared to pay for a gamble in which she gets \$20,000 if the event S: “The car stolen within a year” takes place, and nothing otherwise, then Caroline’s subjective probability for S is 1,000/20,000 = 0.05, given that she forms her preferences in accordance with the principle of maximising expected monetary value...

The problem with this method is that very few people form their preferences in accordance with the principle of maximising expected monetary value. Most people have a decreasing marginal utility for money...

Fortunately, there is a clever solution to [this problem]. The basic idea is to impose a number of structural conditions on preferences over uncertain options [e.g. the transitivity axiom]. Then, the subjective probability function is established by reasoning backwards while taking the structural axioms into account: Since the decision maker preferred some uncertain options to others, and her preferences… satisfy a number of structure axioms, the decision maker behaves as if she were forming her preferences over uncertain options by first assigning subjective probabilities and utilities to each option and thereafter maximising expected utility.

A peculiar feature of this approach is, thus, that probabilities (and utilities) are derived from ‘within’ the theory. The decision maker does not prefer an uncertain option to another because she judges the subjective probabilities (and utilities) of the outcomes to be more favourable than those of another. Instead, the… structure of the decision maker’s preferences over uncertain options logically implies that they can be described as if her choices were governed by a subjective probability function and a utility function...

...Savage’s approach [seeks] to explicate subjective interpretations of the probability axioms by making certain claims about preferences over… uncertain options. But… why on earth should a theory of subjective probability involve assumptions about preferences, given that preferences and beliefs are separate entities? Contrary to what is claimed by [Savage and others], emotionally inert decision makers failing to muster any preferences at all… could certainly hold partial beliefs.

Other theorists, for example DeGroot (1970), propose other approaches:

DeGroot’s basic assumption is that decision makers can make qualitative comparisons between pairs of events, and judge which one they think is most likely to occur. For example, he assumes that one can judge whether it is more, less, or equally likely, according to one’s own beliefs, that it will rain today in Cambridge than in Cairo. DeGroot then shows that if the agent’s qualitative judgments are sufficiently fine-grained and satisfy a number of structural axioms, then [they can be described by a probability distribution]. So in DeGroot’s… theory, the probability function is obtained by fine-tuning qualitative data, thereby making them quantitative.

## 11. What about “Newcomb’s problem” and alternative decision algorithms?

Saying that a rational agent “maximizes expected utility” is, unfortunately, not specific enough. There are a variety of decision algorithms which aim to maximize expected utility, and they give different answers to some decision problems, for example “Newcomb’s problem.”

In this section, we explain these decision algorithms and show how they perform on Newcomb’s problem and related “Newcomblike” problems.

General sources on this topic include: Campbell & Sowden (1985), Ledwig (2000), Joyce (1999), and Yudkowsky (2010). Moertelmaier (2013) discusses Newcomblike problems in the context of the agent-environment framework.

### 11.1. Newcomblike problems and two decision algorithms

I’ll begin with an exposition of several Newcomblike problems, so that I can refer to them in later sections. I’ll also introduce our first two decision algorithms, so that I can show how one’s choice of decision algorithm affects an agent’s outcomes on these problems.

#### 11.1.1. Newcomb’s Problem

Newcomb’s problem was formulated by the physicist William Newcomb but first published in Nozick (1969). Below I present a version of it inspired by Yudkowsky (2010).

A superintelligent machine named Omega visits Earth from another galaxy and shows itself to be very good at predicting events. This isn’t because it has magical powers, but because it knows more science than we do, has billions of sensors scattered around the globe, and runs efficient algorithms for modeling humans and other complex systems with unprecedented precision — on an array of computer hardware the size of our moon.

Omega presents you with two boxes. Box A is transparent and contains \$1000. Box B is opaque and contains either \$1 million or nothing. You may choose to take both boxes (called “two-boxing”), or you may choose to take only box B (called “one-boxing”). If Omega predicted you’ll two-box, then Omega has left box B empty. If Omega predicted you’ll one-box, then Omega has placed \$1M in box B.

By the time you choose, Omega has already left for its next game — the contents of box B won’t change after you make your decision. Moreover, you’ve watched Omega play a thousand games against people like you, and on every occasion Omega predicted the human player’s choice accurately.

Should you one-box or two-box?

Here’s an argument for two-boxing. The \$1M either is or is not in the box; your choice cannot affect the contents of box B now. So, you should two-box, because then you get \$1K plus whatever is in box B. This is a straightforward application of the dominance principle (section 6.1). Two-boxing dominates one-boxing.

Convinced? Well, here’s an argument for one-boxing. On all those earlier games you watched, everyone who two-boxed received \$1K, and everyone who one-boxed received \$1M. So you’re almost certain that you’ll get \$1K for two-boxing and \$1M for one-boxing, which means that to maximize your expected utility, you should one-box.

Nozick (1969) reports:

I have put this problem to a large number of people… To almost everyone it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.

This is not a “merely verbal” dispute (Chalmers 2011). Decision theorists have offered different algorithms for making a choice, and they have different outcomes. Translated into English, the first algorithm (evidential decision theory or EDT) says “Take actions such that you would be glad to receive the news that you had taken them.” The second algorithm (causal decision theory or CDT) says “Take actions which you expect to have a positive effect on the world.”

Many decision theorists have the intuition that CDT is right. But a CDT agent appears to “lose” on Newcomb’s problem, ending up with \$1000, while an EDT agent gains \$1M. Proponents of EDT can ask proponents of CDT: “If you’re so smart, why aren’t you rich?” As Spohn (2012) writes, “this must be poor rationality that complains about the reward for irrationality.” Or as Yudkowsky (2010) argues:

An expected utility maximizer should maximize utility — not formality, reasonableness, or defensibility...

In response to EDT’s apparent “win” over CDT on Newcomb’s problem, proponents of CDT have presented similar problems on which a CDT agent “wins” and an EDT agent “loses.” Proponents of EDT, meanwhile, have replied with additional Newcomblike problems on which EDT wins and CDT loses. Let’s explore each of them in turn.

#### 11.1.2. Evidential and causal decision theory

First, however, we will consider our two decision algorithms in a little more detail.

EDT can be described simply: according to this theory, agents should use conditional probabilities when determining the expected utility of different acts. Specifically, they should use the probability of the world being in each possible state conditioning on them carrying out the act under consideration. So in Newcomb’s problem they consider the probability that Box B contains \$1 million or nothing conditioning on the evidence provided by their decision to one-box or two-box. This is how the theory formalizes the notion of an act providing good news.

CDT is more complex, at least in part because it has been formulated in a variety of different ways and these formulations are equivalent to one another only if certain background assumptions are met. However, a good sense of the theory can be gained by considering the counterfactual approach, which is one of the more intuitive of these formulations. This approach utilizes the probabilities of certain counterfactual conditionals, which can be thought of as representing the causal influence of an agent’s acts on the state of the world. These conditionals take the form “if I were to carry out a certain act, then the world would be in a certain state.” So in Newcomb’s problem, for example, this formulation of CDT considers the probability of the counterfactuals like “if I were to one-box, then Box B would contain \$1 million” and, in doing so, considers the causal influence of one-boxing on the contents of the boxes.

The same distinction can be made in formulaic terms. Both EDT and CDT agree that decision theory should be about maximizing expected utility where the expected utility of an act, A, given a set of possible outcomes, O, is defined as follows:

$$EU(A) = \sum_{O} Pr_A(O) \, V(A \,\&\, O)$$

In this equation, $V(A \,\&\, O)$ represents the value to the agent of the combination of an act and an outcome. So this is the utility that the agent will receive if they carry out a certain act and a certain outcome occurs. Further, $Pr_A(O)$ represents the probability of each outcome occurring on the supposition that the agent carries out a certain act. It is in terms of this probability that CDT and EDT differ. EDT uses the conditional probability, $Pr(O \mid A)$, while CDT uses the probability of subjunctive conditionals, $Pr(A \;\Box\!\!\rightarrow\; O)$.

Using these two versions of the expected utility formula, it’s possible to demonstrate in a formal manner why EDT and CDT give the advice they do in Newcomb’s problem. To demonstrate this it will help to make two simplifying assumptions. First, we will presume that each dollar of money is worth 1 unit of utility to the agent (and so will presume that the agent’s utility is linear with money). Second, we will presume that Omega is a perfect predictor of human actions so that if the agent two-boxes it provides definitive evidence that there is nothing in the opaque box and if the agent one-boxes it provides definitive evidence that there is \$1 million in this box. Given these assumptions, EDT calculates the expected utility of each decision as follows:
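Spelled out under the two assumptions above (with “B full” and “B empty” as labels introduced here for the two possible states of box B, and payoffs in units of utility), the sums are:

$$
\begin{aligned}
EU(\text{one-box}) &= Pr(\text{B empty} \mid \text{one-box}) \times 0 + Pr(\text{B full} \mid \text{one-box}) \times 1{,}000{,}000 \\
&= 0 \times 0 + 1 \times 1{,}000{,}000 = 1{,}000{,}000 \\
EU(\text{two-box}) &= Pr(\text{B empty} \mid \text{two-box}) \times 1{,}000 + Pr(\text{B full} \mid \text{two-box}) \times 1{,}001{,}000 \\
&= 1 \times 1{,}000 + 0 \times 1{,}001{,}000 = 1{,}000
\end{aligned}
$$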

Given that one-boxing has a higher expected utility according to these calculations, an EDT agent will one-box.

On the other hand, given that the agent’s decision doesn’t causally influence Omega’s earlier prediction, CDT will use the same probability regardless of whether the agent one-boxes or two-boxes. The decision endorsed will be the same regardless of what probability we use so, to demonstrate the theory, we can simply arbitrarily assign a 0.5 probability that the opaque box has nothing in it and a 0.5 probability that it has one million dollars in it. CDT then calculates the expected utility of each decision as follows:
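With the arbitrary 0.5 assignment (and with “B full” and “B empty” as labels introduced here for the two possible states of box B), the sums are:

$$
\begin{aligned}
EU(\text{one-box}) &= 0.5 \times 0 + 0.5 \times 1{,}000{,}000 = 500{,}000 \\
EU(\text{two-box}) &= 0.5 \times 1{,}000 + 0.5 \times 1{,}001{,}000 = 501{,}000
\end{aligned}
$$

Whatever probability is used in place of 0.5, two-boxing comes out exactly 1,000 units ahead, because the same probability multiplies both acts.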

Given that two-boxing has a higher expected utility according to these calculations, a CDT agent will two-box. This approach demonstrates the result given more informally in the previous section: CDT agents will two-box in Newcomb’s problem and EDT agents will one-box.
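The two calculations can also be sketched in code, which makes it easy to vary the numbers. This is only an illustration under the linear-utility assumption above. The function names and the `accuracy` and `p_full` parameters are introduced here and are not part of either theory’s standard presentation. `accuracy` relaxes the perfect-predictor assumption to a predictor that is right with that probability.

```python
def edt_newcomb(accuracy: float) -> dict:
    """EDT: the probability of box B's state is conditioned on the act.

    `accuracy` is the (assumed) probability that Omega's prediction
    matches the agent's actual choice.
    """
    return {
        "one-box": accuracy * 1_000_000 + (1 - accuracy) * 0,
        "two-box": accuracy * 1_000 + (1 - accuracy) * 1_001_000,
    }


def cdt_newcomb(p_full: float) -> dict:
    """CDT: the same act-independent probability that box B is full."""
    return {
        "one-box": p_full * 1_000_000 + (1 - p_full) * 0,
        "two-box": p_full * 1_001_000 + (1 - p_full) * 1_000,
    }


def best(expected_utilities: dict) -> str:
    """Return the act with the highest expected utility."""
    return max(expected_utilities, key=expected_utilities.get)


print(best(edt_newcomb(accuracy=1.0)))   # EDT with a perfect predictor
print(best(cdt_newcomb(p_full=0.5)))     # CDT with the arbitrary 0.5
```

On these payoffs the EDT ranking only flips to two-boxing when the assumed accuracy drops to roughly 0.5005 or below, while the CDT ranking favors two-boxing for every value of `p_full`.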

As mentioned before, there are also alternative formulations of CDT. For example, David Lewis (1981) and Brian Skyrms (1980) both present approaches that rely on a partition of the world into states, rather than on counterfactual conditionals, to capture causal information. On Lewis’s version of this account, the agent calculates the expected utility of acts using their unconditional credence in states of the world that are dependency hypotheses, which are descriptions of the possible ways that the world can depend on the agent’s actions. These dependency hypotheses intrinsically contain the required causal information.

Other traditional approaches to CDT include the imaging approach of Sobel (1980) (also see Lewis 1981) and the unconditional expectations approach of Leonard Savage (1954). Those interested in the various traditional approaches to CDT would do best to consult Lewis (1981), Weirich (2008), and Joyce (1999). More recently, work in computer science on a tool called causal Bayesian networks has led to an innovative approach to CDT that has received some attention in the philosophical literature (Pearl 2000, ch. 4 and Spohn 2012).

Now we return to an analysis of decision scenarios, armed with EDT and the counterfactual formulation of CDT.

#### 11.1.3. Medical Newcomb problems

Medical Newcomb problems share a similar form but come in many variants, including Solomon’s problem (Gibbard & Harper 1976) and the smoking lesion problem (Egan 2007). Below I present a variant called the “chewing gum problem” (Yudkowsky 2010):

Suppose that a recently published medical study shows that chewing gum seems to cause throat abscesses — an outcome-tracking study showed that of people who chew gum, 90% died of throat abscesses before the age of 50. Meanwhile, of people who do not chew gum, only 10% die of throat abscesses before the age of 50. The researchers, to explain their results, wonder if saliva sliding down the throat wears away cellular defenses against bacteria. Having read this study, would you choose to chew gum? But now a second study comes out, which shows that most gum-chewers have a certain gene, CGTA, and the researchers produce a table showing the following mortality rates:

| | CGTA present | CGTA absent |
|---|---|---|
| Chew gum | 89% die | 8% die |
| Don’t chew | 99% die | 11% die |

This table shows that whether you have the gene CGTA or not, your chance of dying of a throat abscess goes down if you chew gum. Why are fatalities so much higher for gum-chewers, then? Because people with the gene CGTA tend to chew gum and die of throat abscesses. The authors of the second study also present a test-tube experiment which shows that the saliva from chewing gum can kill the bacteria that form throat abscesses. The researchers hypothesize that because people with the gene CGTA are highly susceptible to throat abscesses, natural selection has produced in them a tendency to chew gum, which protects against throat abscesses. The strong correlation between chewing gum and throat abscesses is not because chewing gum causes throat abscesses, but because a third factor, CGTA, leads to chewing gum and throat abscesses.

Having learned of this new study, would you choose to chew gum? Chewing gum helps protect against throat abscesses whether or not you have the gene CGTA. Yet a friend who heard that you had decided to chew gum (as people with the gene CGTA often do) would be quite alarmed to hear the news — just as she would be saddened by the news that you had chosen to take both boxes in Newcomb’s Problem. This is a case where [EDT] seems to return the wrong answer, calling into question the validity of the… rule “Take actions such that you would be glad to receive the news that you had taken them.” Although the news that someone has decided to chew gum is alarming, medical studies nonetheless show that chewing gum protects against throat abscesses. [CDT’s] rule of “Take actions which you expect to have a positive physical effect on the world” seems to serve us better.
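The contrast Yudkowsky draws can be sketched numerically. The death rates below are the four figures from the second study’s table; everything else is an assumption made here for illustration. In particular, the quoted text does not say how strongly gum-chewing indicates CGTA or how common the gene is, so `P_CGTA_GIVEN_ACT` and `P_CGTA_PRIOR` are hypothetical values. The “EDT” line is the naive version that conditions on the act without any further refinement.

```python
# P(die | gene status, act), taken from the second study's table.
DEATH_RATE = {
    ("cgta", "chew"): 0.89, ("cgta", "dont"): 0.99,
    ("no_cgta", "chew"): 0.08, ("no_cgta", "dont"): 0.11,
}

# HYPOTHETICAL: how strongly each act is evidence for having the gene.
P_CGTA_GIVEN_ACT = {"chew": 0.95, "dont": 0.05}
# HYPOTHETICAL: credence in having the gene, held fixed across acts.
P_CGTA_PRIOR = 0.5


def p_die(act: str, p_cgta: float) -> float:
    """Probability of dying given an act and a probability of having CGTA."""
    return (p_cgta * DEATH_RATE[("cgta", act)]
            + (1 - p_cgta) * DEATH_RATE[("no_cgta", act)])


# Naive EDT: the act changes the probability assigned to having the gene.
edt = {act: p_die(act, P_CGTA_GIVEN_ACT[act]) for act in ("chew", "dont")}
# CDT: the act cannot change the gene, so the same prior is used for both acts.
cdt = {act: p_die(act, P_CGTA_PRIOR) for act in ("chew", "dont")}

print("naive EDT prefers:", min(edt, key=edt.get))  # lower death risk is better
print("CDT prefers:", min(cdt, key=cdt.get))
```

Because chewing lowers the death rate in both columns of the table, the CDT comparison favors chewing for any value of `P_CGTA_PRIOR`; only the act-dependent probabilities in the naive EDT line can reverse the ranking.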

One response to this claim, called the tickle defense (Eells, 1981), argues that EDT actually reaches the right decision in such cases. According to this defense, the most reasonable way to construe the “chewing gum problem” involves presuming that CGTA causes a desire (a mental “tickle”) which then causes the agent to be more likely to chew gum, rather than CGTA directly causing the action. Given this, if we presume that the agent already knows their own desires and hence already knows whether they’re likely to have the CGTA gene, chewing gum will not provide the agent with further bad news. Consequently, an agent following EDT will chew in order to get the good news that they have decreased their chance of getting abscesses.

Unfortunately, the tickle defense fails to achieve its aims. In introducing this approach, Eells hoped that EDT could be made to mimic CDT but without an allegedly inelegant reliance on causation. However, Sobel (1994, ch. 2) demonstrated that the tickle defense failed to ensure that EDT and CDT would decide equivalently in all cases. On the other hand, those who feel that EDT originally got it right by one-boxing in Newcomb’s problem will be disappointed to discover that the tickle defense leads an agent to two-box in some versions of Newcomb’s problem and so solves one problem for the theory at the expense of introducing another.

So just as CDT “loses” on Newcomb’s problem, EDT will “lose” on Medical Newcomb problems (if the tickle defense fails) or will join CDT and “lose” on Newcomb’s Problem itself (if the tickle defense succeeds).

#### 11.1.4. Newcomb’s soda

There are also similar problematic cases for EDT where the evidence provided by your decision relates not to a feature that you were born (or created) with but to some other feature of the world. One such scenario is the Newcomb’s soda problem, introduced in Yudkowsky (2010):

You know that you will shortly be administered one of two sodas in a double-blind clinical test. After drinking your assigned soda, you will enter a room in which you find a chocolate ice cream and a vanilla ice cream. The first soda produces a strong but entirely subconscious desire for chocolate ice cream, and the second soda produces a strong subconscious desire for vanilla ice cream. By “subconscious” I mean that you have no introspective access to the change, any more than you can answer questions about individual neurons firing in your cerebral cortex. You can only infer your changed tastes by observing which kind of ice cream you pick.

It so happens that all participants in the study who test the Chocolate Soda are rewarded with a million dollars after the study is over, while participants in the study who test the Vanilla Soda receive nothing. But subjects who actually eat vanilla ice cream receive an additional thousand dollars, while subjects who actually eat chocolate ice cream receive no additional payment. You can choose one and only one ice cream to eat. A pseudo-random algorithm assigns sodas to experimental subjects, who are evenly divided (50/50) between Chocolate and Vanilla Sodas. You are told that 90% of previous research subjects who chose chocolate ice cream did in fact drink the Chocolate Soda, while 90% of previous research subjects who chose vanilla ice cream did in fact drink the Vanilla Soda. Which ice cream would you eat?

In this case, an EDT agent will decide to eat chocolate ice cream as this would provide evidence that they drank the chocolate soda and hence that they will receive \$1 million after the experiment. However, this seems to be the wrong decision and so, once again, the EDT agent “loses”.
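A short sketch makes the two rankings explicit. It uses only the figures stated in the scenario (the 50/50 assignment, the 90% correlations, and the two payments) and assumes, as an extra simplification not stated in the scenario, that utility is linear in dollars. The function and variable names are introduced here for illustration.

```python
MILLION = 1_000_000   # paid to everyone who drank the Chocolate Soda
BONUS = 1_000         # paid to everyone who eats vanilla ice cream


def expected_dollars(ice_cream: str, p_chocolate_soda: float) -> float:
    """Expected payout given a choice and a probability of Chocolate Soda."""
    bonus = BONUS if ice_cream == "vanilla" else 0
    return p_chocolate_soda * MILLION + bonus


# EDT conditions on the choice: 90% of chocolate-choosers drank Chocolate
# Soda, so only 10% of vanilla-choosers did.
edt = {
    "chocolate": expected_dollars("chocolate", 0.9),
    "vanilla": expected_dollars("vanilla", 0.1),
}
# CDT holds the soda fixed at the 50/50 assignment for both choices.
cdt = {ice: expected_dollars(ice, 0.5) for ice in ("chocolate", "vanilla")}

print("EDT picks:", max(edt, key=edt.get))
print("CDT picks:", max(cdt, key=cdt.get))
```

On the EDT figures chocolate looks better by roughly \$800,000, while on the CDT figures vanilla is better by exactly the \$1,000 bonus, since the soda has already been assigned.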

#### 11.1.5. Bostrom’s meta-Newcomb problem

In response to attacks on their theory, the proponent of EDT can present alternative scenarios where EDT “wins” and it is CDT that “loses”. One such case is the meta-Newcomb problem proposed in Bostrom (2001). Adapted to fit my earlier story about Omega the superintelligent machine (section 11.1.1), the problem runs like this: Either Omega has already placed \$1M or nothing in box B (depending on its prediction about your choice), or else Omega is watching as you choose and after your choice it will place \$1M into box B only if you have one-boxed. But you don’t know which is the case. Omega makes its move before the human player’s choice about half the time, and the rest of the time it makes its move after the player’s choice.

But now suppose there is another superintelligent machine, Meta-Omega, who has a perfect track record of predicting both Omega’s choices and the choices of human players. Meta-Omega tells you that either you will two-box and Omega will “make its move” after you make your choice, or else you will one-box and Omega has already made its move (and gone on to the next game, with someone else).

Here, an EDT agent one-boxes and walks away with a million dollars. On the face of it, however, a CDT agent faces a dilemma: if she two-boxes then Omega’s action depends on her choice, so the “rational” choice is to one-box. But if the CDT agent one-boxes, then Omega’s action temporally precedes (and is thus physically independent of) her choice, so the “rational” action is to two-box. It might seem, then, that a CDT agent will be unable to reach any decision in this scenario. However, further reflection reveals that the issue is more complicated. According to CDT, what the agent ought to do in this scenario depends on their credences about their own actions. If they have a high credence that they will two-box, they ought to one-box, and if they have a high credence that they will one-box, they ought to two-box. Given that the agent’s credences in their actions are not given to us in the description of the meta-Newcomb problem, the scenario is underspecified and it is hard to know what conclusions should be drawn from it.
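One way to see the dependence on credences is the rough sketch below. It rests on modeling assumptions that go beyond the scenario as stated: utility is linear in dollars, box A holds \$1,000 as in the original problem, the agent fully trusts Meta-Omega, and, in the case where Omega has already moved, box B is taken to contain \$1M (since that is the case in which the agent one-boxes and Omega’s prediction is accurate). The function name and the `credence_two_box` parameter are introduced here.

```python
def cdt_meta_newcomb(credence_two_box: float) -> dict:
    """CDT expected dollars, given the agent's credence that she will two-box.

    By Meta-Omega's announcement, the credence that Omega moves AFTER the
    choice equals the credence that the agent two-boxes.
    """
    p_after = credence_two_box
    p_before = 1 - credence_two_box
    return {
        # If Omega moves after, one-boxing causes $1M to be placed in B.
        # If Omega already moved, B is ASSUMED to hold $1M already.
        "one-box": p_after * 1_000_000 + p_before * 1_000_000,
        # If Omega moves after, two-boxing leaves B empty ($1K from A only).
        # If Omega already moved, two-boxing adds $1K to the fixed $1M.
        "two-box": p_after * 1_000 + p_before * 1_001_000,
    }


for credence in (0.9, 0.0005):
    eu = cdt_meta_newcomb(credence)
    print(credence, "->", max(eu, key=eu.get))
```

Under these assumptions the recommendation switches to two-boxing only when the credence in two-boxing falls below about 0.001, so “high credence that they will one-box” has to be read as very high indeed on this particular model.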

#### 11.1.6. The psychopath button

Fortunately, another case has been introduced where, according to CDT, what an agent ought to do depends on their credences about what they will do. This is the psychopath button, introduced in Egan (2007):

Paul is debating whether to press the “kill all psychopaths” button. It would, he thinks, be much better to live in a world with no psychopaths. Unfortunately, Paul is quite confident that only a psychopath would press such a button. Paul very strongly prefers living in a world with psychopaths to dying. Should Paul press the button?

Many people think Paul should not. After all, if he does so, he is almost certainly a psychopath and so pressing the button will almost certainly cause his death. This is also the response that an EDT agent will give. After all, pushing the button would provide the agent with the bad news that they are almost certainly a psychopath and so will die as a result of their action.

On the other hand, if Paul is fairly certain that he is not a psychopath, then CDT will say that he ought to press the button. CDT will note that, given Paul’s confidence that he isn’t a psychopath, his decision will almost certainly have a positive impact as it will result in the death of all psychopaths and Paul’s survival. On the face of it, then, a CDT agent would decide inappropriately in this case by pushing the button. Importantly, unlike in the meta-Newcomb problem, the agent’s credences about their own behavior are specified in Egan’s full version of this scenario (in non-numeric terms, the agent thinks they’re unlikely to be a psychopath and hence unlikely to press the button).
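The disagreement can be illustrated with made-up numbers. Egan’s scenario gives no utilities or probabilities, so every figure below is hypothetical and chosen only to respect the stated preference ordering (a world without psychopaths is better than the status quo, and dying is much worse than either).

```python
# HYPOTHETICAL utilities respecting the ordering in the scenario.
U_STATUS_QUO = 0        # Paul lives, psychopaths remain
U_NO_PSYCHOPATHS = 10   # Paul lives, psychopaths are gone
U_DEAD = -100           # Paul is a psychopath and the button kills him

# HYPOTHETICAL credences.
P_PSYCHO_PRIOR = 0.05         # Paul's credence before deciding
P_PSYCHO_GIVEN_PRESS = 0.95   # "only a psychopath would press"


def eu_press(p_psychopath: float) -> float:
    """Expected utility of pressing, given a probability Paul is a psychopath."""
    return p_psychopath * U_DEAD + (1 - p_psychopath) * U_NO_PSYCHOPATHS


# CDT: pressing does not cause Paul to be a psychopath, so use the prior.
cdt_press = eu_press(P_PSYCHO_PRIOR)
# EDT: pressing is strong evidence that Paul is a psychopath.
edt_press = eu_press(P_PSYCHO_GIVEN_PRESS)

print("CDT presses:", cdt_press > U_STATUS_QUO)
print("EDT presses:", edt_press > U_STATUS_QUO)
```

With these illustrative values CDT recommends pressing and EDT recommends refraining, which matches the verdicts described above; different hypothetical numbers could change the CDT verdict.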

However, in order to produce this problem for CDT, Egan made a number of assumptions about how an agent should decide when what they ought to do depends on what they think they will do. In response, alternative views about deciding in such cases have been advanced (particularly in Arntzenius, 2008 and Joyce, 2012). Given these factors, opinions are split about whether the psychopath button problem does in fact pose a challenge to CDT.

#### 11.1.7. Parfit’s hitchhiker

Not all decision scenarios are problematic for just one of EDT or CDT. There are also cases where an EDT agent and a CDT agent will both “lose”. One such case is Parfit’s Hitchhiker (Parfit, 1984, p. 7):

Suppose that I am driving at midnight through some desert. My car breaks down. You are a stranger, and the only other driver near. I manage to stop you, and I offer you a great reward if you rescue me. I cannot reward you now, but I promise to do so when we reach my home. Suppose next that I am transparent, unable to deceive others. I cannot lie convincingly. Either a blush, or my tone of voice, always gives me away. Suppose, finally, that I know myself to be never self-denying. If you drive me to my home, it would be worse for me if I gave you the promised reward. Since I know that I never do what will be worse for me, I know that I shall break my promise. Given my inability to lie convincingly, you know this too. You do not believe my promise, and therefore leave me stranded in the desert.

In this scenario the agent “loses” if they would later refuse to give the stranger the reward. However, both EDT agents and CDT agents will refuse to do so. After all, by this point the agent will already be safe so giving the reward can neither provide good news about, nor cause, their safety. So this seems to be a case where both theories “lose”.

#### 11.1.8. Transparent Newcomb’s problem

There are also other cases where both EDT and CDT “lose”. One of these is the Transparent Newcomb’s problem which, in at least one version, is due to Drescher (2006, pp. 238-242). This scenario is like the original Newcomb’s problem but, in this case, both boxes are transparent so you can see their contents when you make your decision. Again, Omega has filled box A with \$1000 and Box B with either \$1 million or nothing based on a prediction of your behavior. Specifically, Omega has predicted how you would decide if you witnessed \$1 million in Box B. If Omega predicted that you would one-box in this case, it placed \$1 million in Box B. On the other hand, if Omega predicted that you would two-box in this case, then it placed nothing in Box B.

Both EDT and CDT agents will two-box in this case. After all, the contents of the boxes are determined and known so the agent’s decision can neither provide good news about what they contain nor cause them to contain something desirable. As with two-boxing in the original version of Newcomb’s problem, many philosophers will endorse this behavior.

However, it’s worth noting that Omega will almost certainly have predicted this decision and so filled Box B with nothing. CDT and EDT agents will end up with \$1000. On the other hand, just as in the original case, the agent that one-boxes will end up with \$1 million. So this is another case where both EDT and CDT “lose”. Consequently, to those that agree with the earlier comments (in section 11.1.1) that a decision theory shouldn’t lead an agent to “lose”, neither of these theories will be satisfactory.
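The payoffs for different kinds of agent can be sketched by treating an agent as a policy, that is, a function from what it sees in Box B to a choice. This sketch assumes, for simplicity, that Omega predicts perfectly by evaluating the policy on the hypothetical “Box B shows \$1 million”; the scenario itself only says the prediction is almost certainly right. The names `Policy`, `play`, and the example policies are introduced here.

```python
from typing import Callable

# A policy maps the visible contents of Box B (in dollars) to a choice.
Policy = Callable[[int], str]


def play(policy: Policy) -> int:
    """Return the agent's winnings in the Transparent Newcomb's problem."""
    # ASSUMPTION: Omega is a perfect predictor and fills Box B according to
    # what the policy would do on seeing $1 million there.
    box_b = 1_000_000 if policy(1_000_000) == "one-box" else 0
    choice = policy(box_b)
    return box_b if choice == "one-box" else box_b + 1_000


def always_two_box(seen_in_b: int) -> str:
    """The choice attributed above to both EDT and CDT agents."""
    return "two-box"


def always_one_box(seen_in_b: int) -> str:
    """An agent that one-boxes whatever it sees."""
    return "one-box"


print(play(always_two_box))
print(play(always_one_box))
```

The two-boxing policy never gets to see a full Box B in the first place, which is why it ends up with only the \$1000 from box A.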

#### 11.1.9. Counterfactual mugging

Another similar case, known as counterfactual mugging, was developed in Nesov (2009):

Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it \$100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your \$100. But see, the Omega tells you that if the coin came up heads instead of tails, it’d give you \$10000, but only if you’d agree to give it \$100 if the coin came up tails.

Should you give up the \$100?

Both CDT and EDT say no. After all, giving up your money neither provides good news about nor influences your chances of getting \$10,000 out of the exchange. Further, this intuitively seems like the right decision. On the face of it, then, it is appropriate to retain your money in this case.

However, presuming you take Omega to be perfectly trustworthy, there seems to be room to debate this conclusion. If you are the sort of agent that gives up the \$100 in counterfactual mugging then you will tend to do better than the sort of agent that won’t give up the \$100. Of course, in the particular case at hand you will lose but rational agents often lose in specific cases (as, for example, when such an agent loses a rational bet). It could be argued that what a rational agent should not do is be the type of agent that loses. Given that agents that refuse to give up the \$100 are the type of agent that loses, there seem to be grounds to claim that counterfactual mugging is another case where both CDT and EDT act inappropriately.
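The claim that payers “tend to do better” can be checked by evaluating the two dispositions before the coin is tossed. The sketch assumes a fair coin, a perfectly trustworthy Omega, and utility linear in dollars; `policy_value` is a name introduced here.

```python
def policy_value(pays_when_asked: bool, p_heads: float = 0.5) -> float:
    """Expected dollars, before the coin toss, for a given disposition."""
    # Heads: Omega pays $10,000, but only to agents who would pay on tails.
    heads_payoff = 10_000 if pays_when_asked else 0
    # Tails: Omega asks for $100, which only the paying disposition hands over.
    tails_payoff = -100 if pays_when_asked else 0
    return p_heads * heads_payoff + (1 - p_heads) * tails_payoff


print(policy_value(pays_when_asked=True))
print(policy_value(pays_when_asked=False))
```

Evaluated this way the paying disposition is worth \$4,950 in expectation and the refusing disposition is worth nothing, even though in the tails case actually faced the payer is \$100 worse off.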

#### 11.1.10. Prisoner’s dilemma

Before moving on to a more detailed discussion of various possible decision theories, I’ll consider one final scenario: the prisoner’s dilemma. Resnik (1987, pp. 147-148) outlines this scenario as follows:

Two prisoners...have been arrested for vandalism and have been isolated from each other. There is sufficient evidence to convict them on the charge for which they have been arrested, but the prosecutor is after bigger game. He thinks that they robbed a bank together and that he can get them to confess to it. He summons each separately to an interrogation room and speaks to each as follows: “I am going to offer the same deal to your partner, and I will give you each an hour to think it over before I call you back. This is it: If one of you confesses to the bank robbery and the other does not, I will see to it that the confessor gets a one-year term and that the other guy gets a twenty-five year term. If you both confess, then it’s ten years apiece. If neither of you confesses, then I can only get two years apiece on the vandalism charge...”

The decision matrix of each vandal will be as follows:

| | Partner confesses | Partner lies |
|---|---|---|
| Confess | 10 years in jail | 1 year in jail |
| Lie | 25 years in jail | 2 years in jail |

Faced with this scenario, a CDT agent will confess. After all, the agent’s decision can’t influence their partner’s decision (they’ve been isolated from one another) and so the agent is better off confessing regardless of what their partner chooses to do. According to the majority of decision (and game) theorists, confessing is in fact the rational decision in this case.

Despite this, however, an EDT agent may lie in a prisoner’s dilemma. Specifically, if they think that their partner is similar enough to them, the agent will lie because doing so will provide the good news that they will both lie and hence that they will both get two years in jail (good news as compared with the bad news that they will both confess and hence that they will get 10 years in jail).
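Both lines of reasoning can be sketched from the decision matrix. The jail terms are Resnik’s; the `p_same` parameter, the probability that the partner makes the same choice as the agent given the agent’s own choice, is a hypothetical stand-in for “similar enough” and does not come from the source.

```python
ACTS = ("confess", "lie")

# (my act, partner's act) -> my years in jail, from the decision matrix.
YEARS = {
    ("confess", "confess"): 10, ("confess", "lie"): 1,
    ("lie", "confess"): 25, ("lie", "lie"): 2,
}


def dominates(act: str, other: str) -> bool:
    """True if `act` gives fewer years than `other` whatever the partner does."""
    return all(YEARS[(act, p)] < YEARS[(other, p)] for p in ACTS)


def edt_expected_years(act: str, p_same: float) -> float:
    """Expected years if the partner matches my act with probability p_same."""
    different = "lie" if act == "confess" else "confess"
    return p_same * YEARS[(act, act)] + (1 - p_same) * YEARS[(act, different)]


def edt_choice(p_same: float) -> str:
    """The act with the fewest expected years under EDT-style conditioning."""
    return min(ACTS, key=lambda act: edt_expected_years(act, p_same))


print(dominates("confess", "lie"))   # the CDT-style dominance argument
print(edt_choice(p_same=0.9))        # a partner judged very similar
print(edt_choice(p_same=0.5))        # a partner judged unrelated
```

On these jail terms the EDT-style choice switches from confessing to lying once `p_same` exceeds 0.75.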

To many people, there seems to be something compelling about this line of reasoning. For example, Douglas Hofstadter (1985, pp. 737-780) has argued that an agent acting “superrationally” would co-operate with other superrational agents for precisely this sort of reason: a superrational agent would take into account the fact that other such agents will go through the same thought process in the prisoner’s dilemma and so make the same decision. As such, it is better that the decision that both agents reach be to lie than that it be to confess. More broadly, it could perhaps be argued that a rational agent should lie in the prisoner’s dilemma as long as they believe that they are similar enough to their partner that they are likely to reach the same decision.

It is unclear, then, precisely what should be concluded from the prisoner’s dilemma. However, for those that are sympathetic to Hofstadter’s point or the line of reasoning appealed to by the EDT agent, the scenario seems to provide an additional reason to seek out an alternative theory to CDT.

### 11.2. Benchmark theory (BT)

One recent response to the apparent failure of EDT to decide appropriately in medical Newcomb problems and CDT to decide appropriately in the psychopath button is Benchmark Theory (BT) which was developed in Wedgwood (2011) and discussed further in Briggs (2010).

In English, we could think of this decision algorithm as saying that agents should decide so as to give their future self good news about how well off they are compared to how well off they could have been. In formal terms, BT uses the following formula to calculate the expected utility of an act, A:

$$EU(A) = \sum_{O} Pr(O \mid A) \, V'(A \,\&\, O)$$

In other words, it uses the conditional probability, as in EDT, but calculates the value differently (as indicated by the use of V’ rather than V). V’ is calculated relative to a benchmark value in order to give a comparative measure of value (both of the above sources go into more detail about this process).

Taking the informal perspective, in the chewing gum problem, BT will note that by chewing gum, the agent will always get the good news that they are comparatively better off than they could have been (because chewing gum helps control throat abscesses) whereas by not chewing, the agent will always get the bad news that they could have been comparatively better off by chewing. As such, a BT agent will chew in this scenario.

Further, BT seems to reach what many consider to be the right decision in the psychopath button. In this case, the BT agent will note that if they push the button they will get the bad news that they are almost certainly a psychopath and so that they would have been comparatively much better off by not pushing (as pushing will kill them). On the other hand, if they don’t push they will get the less bad news that they are almost certainly not a psychopath and so could have been comparatively a little better off if they had pushed the button (as this would have killed all the psychopaths but not them). So refraining from pushing the button gives the less bad news and so is the rational decision.

On the face of it, then, there seem to be strong reasons to find BT compelling: it decides appropriately in these scenarios while, according to some people, EDT and CDT only decide appropriately in one or the other of them.

Unfortunately, a BT agent will fail to decide appropriately in other scenarios. First, those that hold that one-boxing is the appropriate decision in Newcomb’s problem will immediately find a flaw in BT. After all, in this scenario two-boxing gives the good news that the agent did comparatively better than they could have done (because they gain the \$1000 from Box A which is more than they would have received otherwise) while one-boxing brings the bad news that they did comparatively worse than they could have done (as they did not receive this money). As such, a BT agent will two-box in Newcomb’s problem.

Further, Briggs (2010) argues, though Wedgwood (2011) denies, that BT suffers from other problems. As such, even for those who support two-boxing in Newcomb’s problem, it could be argued that BT doesn’t represent an adequate theory of choice. It is unclear, then, whether BT is a desirable replacement to alternative theories.

### 11.3. Timeless decision theory (TDT)

Yudkowsky (2010) offers another decision algorithm, timeless decision theory or TDT (see also Altair, 2013). Specifically, TDT is intended as an explicit response to the idea that a theory of rational choice should lead an agent to “win”. As such, it will appeal to those who think it is appropriate to one-box in Newcomb’s problem and chew in the chewing gum problem.

In English, this algorithm can be approximated as saying that an agent ought to choose as if CDT were right but they were determining not their actual decision but rather the result of the abstract computation of which their decision is one concrete instance. Formalizing this decision algorithm would require a substantial document in its own right and so will not be carried out in full here. Briefly, however, TDT is built on top of causal Bayesian networks (Pearl, 2000) which are graphs where the arrows represent causal influence. TDT supplements these graphs by adding nodes representing abstract computations and taking the abstract computation that determines an agent’s decision to be the object of choice rather than the concrete decision itself (see Yudkowsky, 2010 for a more detailed description).

Returning to an informal discussion, an example will help clarify the form taken by TDT: imagine that two perfect replicas of a person are placed in identical rooms and asked to make the same decision. While each replica will make their own decision, in doing so, they will be carrying out the same computational process. As such, TDT will say that the replicas ought to act as if they are determining the result of this process and hence as if they are deciding the behavior of both copies.

Something similar can be said about Newcomb’s problem. In this case it is almost as if there is again a replica of the agent: Omega’s model of the agent that it used to predict the agent’s behavior. Both the original agent and this “replica” respond to the same abstract computational process. In other words, both Omega’s prediction and the agent’s behavior are influenced by this process. As such, TDT advises the agent to act as if they are determining the result of this process and, hence, as if they can determine Omega’s box-filling behavior. A TDT agent will therefore one-box in order to determine the result of this abstract computation in a way that leads to \$1 million being placed in Box B.

TDT also succeeds in other areas. For example, in the chewing gum problem there is no “replica” agent so TDT will decide in line with standard CDT and choose to chew gum. Further, in the prisoner’s dilemma, a TDT agent will lie if its partner is another TDT agent (or a relevantly similar agent). After all, in this case both agents will carry out the same computational process and so TDT will advise that the agent act as if they are determining this process and hence simultaneously determining both their own and their partner’s decision. If so then it is better for the agent that both of them lie than that both of them confess.
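The informal gloss for the prisoner’s dilemma can be illustrated with a toy sketch. This is emphatically not Yudkowsky’s formal TDT, which is built on causal Bayesian networks; it only mimics the “act as if you are choosing the output of the shared computation” idea. The jail terms are those from section 11.1.10, and the function name, the boolean flag, and the 0.5 prior used in the fallback branch are assumptions introduced here.

```python
ACTS = ("confess", "lie")

# (my act, partner's act) -> my years in jail (section 11.1.10).
YEARS = {
    ("confess", "confess"): 10, ("confess", "lie"): 1,
    ("lie", "confess"): 25, ("lie", "lie"): 2,
}


def tdt_like_choice(partner_runs_same_computation: bool) -> str:
    """Toy stand-in for the informal description of TDT, not the real theory."""
    if partner_runs_same_computation:
        # Choosing the output of the shared computation fixes BOTH acts,
        # so only the matching outcomes are compared.
        return min(ACTS, key=lambda act: YEARS[(act, act)])
    # Otherwise decide as CDT would. ASSUMPTION: a 0.5 prior on the partner's
    # act; since confessing dominates, any prior gives the same answer.
    return min(ACTS, key=lambda act: 0.5 * YEARS[(act, "confess")]
               + 0.5 * YEARS[(act, "lie")])


print(tdt_like_choice(partner_runs_same_computation=True))
print(tdt_like_choice(partner_runs_same_computation=False))
```

The first branch compares only the two outcomes in which both prisoners act alike (2 years each versus 10 years each), which is the sense in which the agent treats itself as deciding for both.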

However, despite its success, TDT also “loses” in some decision scenarios. For example, in counterfactual mugging, a TDT agent will not choose to give up the \$100. This might seem surprising. After all, as with Newcomb’s problem, this case involves Omega predicting the agent’s behavior and hence involves a “replica”. However, this case differs in that the agent knows that the coin came up tails and so knows that they have nothing to gain by giving up the money.

For those who feel that a theory of rational choice should lead an agent to “win”, then, TDT seems like a step in the right direction but further work is required if it is to “win” in the full range of decision scenarios.

### 11.4. Decision theory and “winning”

In the previous section, I discussed TDT, a decision algorithm that could be advanced as a replacement for CDT and EDT. One of the primary motivations for developing TDT is a sense that both CDT and EDT fail to reason in a desirable manner in some decision scenarios. However, despite acknowledging that CDT agents end up worse off in Newcomb’s Problem, many (and perhaps the majority of) decision theorists are proponents of CDT. On the face of it, this may seem to suggest that these decision theorists aren’t interested in developing a decision algorithm that “wins” but rather have some other aim in mind. If so then this might lead us to question the value of developing one-boxing decision algorithms.

However, the claim that most decision theorists don’t care about finding an algorithm that “wins” mischaracterizes their position. After all, proponents of CDT tend to take the challenge posed by the fact that CDT agents “lose” in Newcomb’s problem seriously (in the philosophical literature, it’s often referred to as the Why ain’cha rich? problem). A common reaction to this challenge is neatly summarized in Joyce (1999, pp. 153-154) as a response to a hypothetical question about why, if two-boxing is rational, the CDT agent does not end up as rich as an agent that one-boxes:

Rachel has a perfectly good answer to the “Why ain’t you rich?” question. “I am not rich,” she will say, “because I am not the kind of person [Omega] thinks will refuse the money. I’m just not like you, Irene [the one-boxer]. Given that I know that I am the type who takes the money, and given that [Omega] knows that I am this type, it was reasonable of me to think that the \$1,000,000 was not in [the box]. The \$1,000 was the most I was going to get no matter what I did. So the only reasonable thing for me to do was to take it.”

Irene may want to press the point here by asking, “But don’t you wish you were like me, Rachel?”… Rachel can and should admit that she does wish she were more like Irene… At this point, Irene will exclaim, “You’ve admitted it! It wasn’t so smart to take the money after all.” Unfortunately for Irene, her conclusion does not follow from Rachel’s premise. Rachel will patiently explain that wishing to be a [one-boxer] in a Newcomb problem is not inconsistent with thinking that one should take the \$1,000 whatever type one is. When Rachel wishes she was Irene’s type she is wishing for Irene’s options, not sanctioning her choice… While a person who knows she will face (has faced) a Newcomb problem might wish that she were (had been) the type that [Omega] labels a [one-boxer], this wish does not provide a reason for being a [one-boxer]. It might provide a reason to try (before [the boxes are filled]) to change her type if she thinks this might affect [Omega’s] prediction, but it gives her no reason for doing anything other than taking the money once she comes to believes that she will be unable to influence what [Omega] does.

In other words, this response distinguishes between the winning decision and the winning type of agent and claims that two-boxing is the winning decision in Newcomb’s problem (even if one-boxers are the winning type of agent). Consequently, insofar as decision theory is about determining which decision is rational, on this account CDT reasons correctly in Newcomb’s problem.

For those that find this response perplexing, an analogy could be drawn to the chewing gum problem. In this scenario, there is near unanimous agreement that the rational decision is to chew gum. However, statistically, non-chewers will be better off than chewers. As such, the non-chewer could ask, “if you’re so smart, why aren’t you healthy?” In this case, the above response seems particularly appropriate. The chewers are less healthy not because of their decision but rather because they’re more likely to have an undesirable gene. Having good genes doesn’t make the non-chewer more rational but simply more lucky. The proponent of CDT simply makes a similar response to Newcomb’s problem: one-boxers aren’t richer because of their decision but rather because of the type of agent that they were when the boxes were filled.
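The statistical pattern in this analogy can be reproduced with a small toy model. All probabilities below are hypothetical and chosen only to show the shape of the argument: a gene raises both the chance of chewing and the chance of illness, while chewing itself lowers the chance of illness for everyone.

```python
# Hypothetical numbers, assumed for illustration only.
P_GENE = 0.5
P_CHEW_GIVEN_GENE = {True: 0.9, False: 0.1}  # keyed by "has gene?"
# Chance of illness keyed by (has gene?, chews?); chewing helps in both groups.
P_ILL = {
    (True, True): 0.40,
    (True, False): 0.50,
    (False, True): 0.05,
    (False, False): 0.10,
}


def p_ill_given_chew(chews: bool) -> float:
    """P(ill | chewing behaviour), averaging over who carries the gene."""
    numerator = 0.0
    denominator = 0.0
    for gene in (True, False):
        p_gene = P_GENE if gene else 1 - P_GENE
        p_behaviour = P_CHEW_GIVEN_GENE[gene] if chews else 1 - P_CHEW_GIVEN_GENE[gene]
        numerator += p_gene * p_behaviour * P_ILL[(gene, chews)]
        denominator += p_gene * p_behaviour
    return numerator / denominator
```

With these assumed numbers, chewers are ill more often than non-chewers overall, even though chewing lowers the risk within each gene group. The population-level gap comes entirely from who tends to chew, not from the chewing.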

One final point about this response is worth noting. A proponent of CDT can accept the above argument but still acknowledge that, if given the choice before the boxes are filled, they would be rational to choose to modify themselves to be a one-boxing type of agent (as Joyce acknowledged in the above passage and as argued for in Burgess, 2004). To the proponent of CDT, this is unproblematic: if we are sometimes rewarded not for the rationality of our decisions in the moment but for the type of agent we were at some past moment then it should be unsurprising that changing to a different type of agent might be beneficial.

The response to this defense of two-boxing in Newcomb’s problem has been divided. Many find it compelling but others, like Ahmed and Price (2012), think it does not adequately address the challenge:

It is no use the causalist’s whining that foreseeably, Newcomb problems do in fact reward irrationality, or rather CDT-irrationality. The point of the argument is that if everyone knows that the CDT-irrational strategy will in fact do better on average than the CDT-rational strategy, then it’s rational to play the CDT-irrational strategy.

Given this, there seem to be two positions one could take on these issues. If the response given by the proponent of CDT is compelling, then we should be attempting to develop a decision theory that two-boxes on Newcomb’s problem. Perhaps the best theory for this role is CDT but perhaps it is instead BT, which many people think reasons better in the psychopath button scenario. On the other hand, if the response given by the proponents of CDT is not compelling, then we should be developing a theory that one-boxes in Newcomb’s problem. In this case, TDT, or something like it, seems like the most promising theory currently on offer.

• I don’t really think Newcomb’s problem or any of its variations belong in here. Newcomb’s problem is not a decision theory problem, the real difficulty is translating the underspecified English into a payoff matrix.

The ambiguity comes from the combination of the two claims, (a) Omega being a perfect predictor and (b) the subject being allowed to choose after Omega has made its prediction. Either these two are inconsistent, or they necessitate further unstated assumptions such as backwards causality.

First, let us assume (a) but not (b), which can be formulated as follows: Omega, a computer engineer, can read your code and test run it as many times as he would like in advance. You must submit (simple, unobfuscated) code which either chooses to one- or two-box. The contents of the boxes will depend on Omega’s prediction of your code’s choice. Do you submit one- or two-boxing code?

Second, let us assume (b) but not (a), which can be formulated as follows: Omega has subjected you to the Newcomb’s setup, but because of a bug in its code, its prediction is based on someone else’s choice rather than yours, which has no correlation with your choice whatsoever. Do you one- or two-box?

Both of these formulations translate straightforwardly into payoff matrices, and any sort of sensible decision theory you throw at them gives the correct solution. The paradox disappears when the ambiguity between the two above possibilities is removed. As far as I can see, all disagreement between one-boxers and two-boxers is simply a matter of one-boxers choosing the first and two-boxers choosing the second interpretation. If so, Newcomb’s paradox is not so much interesting as poorly specified. The supposed superiority of TDT over CDT relies either on the paradox not reducing to either of the above or on forcing CDT, by fiat, to work with the wrong payoff matrices.
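As a rough sketch of what “translate straightforwardly into payoff matrices” could look like, assuming the usual \$1,000 and \$1,000,000 box contents: in the first formulation the prediction always equals the submitted code’s choice, so only two cells of the table are reachable; in the second the prediction varies independently of the choice, so all four cells are live. The names `payoff`, `first` and `second` are introduced here for illustration.

```python
SMALL, BIG = 1_000, 1_000_000
CHOICES = ("one-box", "two-box")


def payoff(choice: str, prediction: str) -> int:
    """Money received, given the actual choice and Omega's prediction."""
    big_box = BIG if prediction == "one-box" else 0
    return big_box if choice == "one-box" else big_box + SMALL


# Formulation 1, (a) but not (b): Omega reads the code, so prediction == choice.
first = {code: payoff(code, code) for code in CHOICES}
best_code = max(first, key=first.get)

# Formulation 2, (b) but not (a): the prediction is unrelated to the choice.
second = {(c, p): payoff(c, p) for c in CHOICES for p in CHOICES}
two_boxing_dominates = all(
    second[("two-box", p)] > second[("one-box", p)] for p in CHOICES
)
```

Under the first table the one-boxing code earns more; under the second, two-boxing is better whatever the prediction turns out to be.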

I would be interested to see an unambiguous and nontrivial formulation of the paradox.

• Allowing Omega to do its prediction by time travel directly contradicts “box B contains either \$0 or \$1,000,000 before the game begins, and once the game begins even the Predictor is powerless to change the contents of the boxes”. Also, this obviously makes one-boxing the correct choice.

• Allowing Omega to accurately simulate the subject reduces the problem to “submit code for Omega to evaluate”; this is not exactly paradoxical, but then “the player is called upon to choose which boxes to take” actually means “the code then runs and returns its expected value”, which clearly reduces to one-boxing.

• Making Omega an imperfect predictor, with an accuracy of p < 1.0, simply creates a superposition of the first and second case above, which still allows for straightforward analysis.

• Allowing unpredictable, probabilistic strategies violates the supposed predictive power of Omega, but again cleanly reduces to payoff matrices.

• Finally, variations such as the psychopath button are completely transparent once you decide between “choice is magical and free will and stuff”, which leads to pressing the button, and “the supposed choice is deterministic and there is no choice to make, but code which does not press the button is clearly the most healthy”.
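One possible reading of the “superposition” point for an imperfect predictor is a weighted mixture of the two clean cases. The sketch below assumes the usual \$1,000 and \$1,000,000 payoffs; the mixture weight `q` and the base rate `r` are hypothetical parameters introduced here, not quantities defined in the discussion above.

```python
SMALL, BIG = 1_000, 1_000_000


def payoff(choice: str, prediction: str) -> int:
    """Money received, given the actual choice and Omega's prediction."""
    big_box = BIG if prediction == "one-box" else 0
    return big_box if choice == "one-box" else big_box + SMALL


def expected_payoff(code: str, q: float, r: float) -> float:
    """Expected payoff of submitting `code`.

    q: assumed probability that Omega's prediction matches the code (first case).
    r: assumed probability that an uncorrelated prediction is "one-box" (second case).
    """
    matched = payoff(code, code)
    uncorrelated = r * payoff(code, "one-box") + (1 - r) * payoff(code, "two-box")
    return q * matched + (1 - q) * uncorrelated
```

In this particular mixture model the gap between one-boxing and two-boxing code works out to `q * BIG - SMALL`, so one-boxing code does better whenever `q` exceeds 0.001. Other ways of modelling an imperfect predictor would give different thresholds.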

• Thanks for this post; it articulates many of the thoughts I’ve had on the apparent inconsistency of common decision-theoretic paradoxes such as Newcomb’s problem. I’m not an expert in decision theory, but I have a computer science background and significant exposure to these topics, so let me give it a shot.

The strategy I have been considering in my attempt to prove a paradox inconsistent is to prove a contradiction using the problem formulation. In Newcomb’s problem, suppose each player uses a fair coin flip to decide whether to one-box or two-box. Then Omega could not have a sustained correct prediction rate above 50%. But the problem formulation says Omega does; therefore the problem must be inconsistent.
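The 50% bound can be checked directly, under the assumption that Omega’s prediction is statistically independent of the coin. The function name and parameters below are introduced for illustration.

```python
def prediction_accuracy(p_player_one_box: float, p_predict_one_box: float) -> float:
    """Chance that prediction and choice agree, assuming the prediction is
    independent of the player's randomization."""
    both_one_box = p_player_one_box * p_predict_one_box
    both_two_box = (1 - p_player_one_box) * (1 - p_predict_one_box)
    return both_one_box + both_two_box


# Against a fair coin, every prediction strategy has the same accuracy.
accuracies = [prediction_accuracy(0.5, s) for s in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

Whatever fraction of the time Omega predicts one-boxing, its accuracy against a fair coin comes out at exactly one half, so a sustained rate above 50% requires access to the coin’s outcome.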

Alternatively, Omega knew the outcome of the coin flip in advance; let’s say Omega has access to all relevant information, including any supposed randomness used by the decision-maker. Then we can consider the decision to already have been made; the idea of a choice occurring after Omega has left is illusory (i.e. deterministic; anyone with enough information could have predicted it.) Admittedly, as you say quite eloquently:

Choice is not something inherent to a system, but a feature of an outsider’s model of a system, in much the same sense as random is not something inherent to a Eeny, meeny, miny, moe however much it might seem that way to children.

In this case of the all-knowing Omega, talking about what someone should choose after Omega has left seems mistaken. The agent is no longer free to make an arbitrary decision at run-time, since that would have backwards causal implications; we can, without restricting which algorithm is chosen, require the decision-making algorithm to be written down and provided to Omega prior to the whole simulation. Since Omega can predict the agent’s decision, the agent’s decision does determine what’s in the box, despite the usual claim of no causality. Taking that into account, CDT doesn’t fail after all.

It really does seem to me like most of these supposed paradoxes of decision theory have these inconsistent setups. I see that wedrifid says of coin flips:

If the FAQ left this out then it is indeed faulty. It should either specify that if Omega predicts the human will use that kind of entropy then it gets a “Fuck you” (gets nothing in the big box, or worse) or, at best, that Omega awards that kind of randomization with a proportional payoff (ie. If behavior is determined by a fair coin then the big box contains half the money.)

This is a fairly typical (even “Frequent”) question so needs to be included in the problem specification. But it can just be considered a minor technical detail.

I would love to hear from someone in further detail on these issues of consistency. Have they been addressed elsewhere? If so, where?

• “The strategy I have been considering in my attempt to prove a paradox inconsistent is to prove a contradiction using the problem formulation.”

This seems like a worthy approach to paradoxes! I’m going to suggest the possibility of broadening your search slightly. Specifically, to include the claim “and this is paradoxical” as one of the things that can be rejected as producing contradictions. Because in this case there just isn’t a paradox. You take the one box, get rich and if there is a decision theory that says to take both boxes you get a better theory. For this reason “Newcomb’s Paradox” is a misnomer and I would only use “Newcomb’s Problem” as an acceptable name.

“In Newcomb’s problem, suppose each player uses a fair coin flip to decide whether to one-box or two-box. Then Omega could not have a sustained correct prediction rate above 50%. But the problem formulation says Omega does; therefore the problem must be inconsistent.”

Yes, if the player is allowed access to entropy that Omega cannot have then it would be absurd to also declare that Omega can predict perfectly. If the coin flip is replaced with a quantum coinflip then the problem becomes even worse because it leaves an Omega that can perfectly predict what will happen but is faced with a plainly inconsistent task of making contradictory things happen. The problem specification needs to include a clause for how ‘randomization’ is handled.

“Alternatively, Omega knew the outcome of the coin flip in advance; let’s say Omega has access to all relevant information, including any supposed randomness used by the decision-maker. Then we can consider the decision to already have been made; the idea of a choice occurring after Omega has left is illusory (i.e. deterministic; anyone with enough information could have predicted it.)”

Here is where I should be able to link you to the wiki page on free will where you would be given an explanation of why the notion that determinism is incompatible with choice is a confusion. Alas that page still has pretentious “Find Out For Yourself” tripe on it instead of useful content. The wikipedia page on compatibilism is somewhat useful but not particularly tailored to a reductionist decision theory focus.

“In this case of the all-knowing Omega, talking about what someone should choose after Omega has left seems mistaken. The agent is no longer free to make an arbitrary decision at run-time, since that would have backwards causal implications; we can, without restricting which algorithm is chosen, require the decision-making algorithm to be written down and provided to Omega prior to the whole simulation. Since Omega can predict the agent’s decision, the agent’s decision does determine what’s in the box, despite the usual claim of no causality. Taking that into account, CDT doesn’t fail after all.”

There have been attempts to create derivatives of CDT that work like that. That replace the “C” from conventional CDT with a type of causality that runs about in time as you mention. Such decision theories do seem to handle most of the problems that CDT fails at. Unfortunately I cannot recall the reference.

“I would love to hear from someone in further detail on these issues of consistency. Have they been addressed elsewhere? If so, where?”

I’m not sure which further details you are after. Are you after a description of Newcomb’s problem that includes the details necessary to make it consistent? Or about other potential inconsistencies? Or other debates about whether the problems are inconsistent?

• “I’m not sure which further details you are after.”

Thanks for the response! I’m looking for a formal version of the viewpoint you reiterated at the beginning of your most recent comment:

“Yes, if the player is allowed access to entropy that Omega cannot have then it would be absurd to also declare that Omega can predict perfectly. [...] The problem specification needs to include a clause for how ‘randomization’ is handled.”

That makes a lot of sense, but I haven’t been able to find it stated formally. Wolpert and Benford’s papers (using game theory decision trees or alternatively plain probability theory) seem to formally show that the problem formulation is ambiguous, but they are recent papers, and I haven’t been able to tell how well they stand up to outside analysis.

If there is a consensus that the sufficient use of randomness prevents Omega from having perfect or nearly perfect predictions, then why is Newcomb’s problem still relevant? If there’s no randomness, wouldn’t an appropriate application of CDT result in one-boxing since the decision-maker’s choice and Omega’s prediction are both causally determined by the decision-maker’s algorithm, which was fixed prior to the making of the decision?

“There have been attempts to create derivatives of CDT that work like that. That replace the ‘C’ from conventional CDT with a type of causality that runs about in time as you mention. Such decision theories do seem to handle most of the problems that CDT fails at. Unfortunately I cannot recall the reference.”

I’m curious: why can’t normal CDT handle it by itself? Consider two variants of Newcomb’s problem:

1. At run-time, you get to choose the actual decision made in Newcomb’s problem. Omega made its prediction without any information about your choice or what algorithms you might use to make it. In other words, Omega doesn’t have any particular insight into your decision-making process. This means at run-time you are free to choose between one-boxing and two-boxing without backwards causal implications. In this case Omega cannot make perfect or nearly perfect predictions, for reasons of randomness which we already discussed.

2. You get to write the algorithm, the output of which will determine the choice made in Newcomb’s problem. Omega gets access to the algorithm in advance of its prediction. No run-time randomness is allowed. In this case, Omega can be a perfect predictor. But the correct causal network shows that both the decision-maker’s “choice” as well as Omega’s prediction are causally downstream from the selection of the decision-making algorithm. CDT holds in this case because you aren’t free at run-time to make any choice other than what the algorithm outputs. A CDT algorithm would identify two consistent outcomes: (one-box && Omega predicted one-box), and (two-box && Omega predicted two-box). Coded correctly, it would prefer whichever consistent outcome had the highest expected utility, and so it would one-box.
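The procedure described in the second variant (enumerate outcomes, keep only the ones consistent with a perfect predictor, then pick the best) might look roughly like this. It is a sketch of that description under the usual \$1,000 and \$1,000,000 payoffs, not a claim about how CDT is normally formalized; `is_consistent` is a name introduced here.

```python
SMALL, BIG = 1_000, 1_000_000
CHOICES = ("one-box", "two-box")


def payoff(choice: str, prediction: str) -> int:
    """Money received, given the actual choice and Omega's prediction."""
    big_box = BIG if prediction == "one-box" else 0
    return big_box + (SMALL if choice == "two-box" else 0)


def is_consistent(choice: str, prediction: str) -> bool:
    """Assumption of variant 2: Omega has the algorithm, so it never mispredicts."""
    return choice == prediction


all_outcomes = [(c, p) for c in CHOICES for p in CHOICES]
consistent = [o for o in all_outcomes if is_consistent(*o)]
best_choice, best_prediction = max(consistent, key=lambda o: payoff(*o))
```

Once the two inconsistent cells are filtered out, the comparison is between 1,000,000 and 1,000, and the one-boxing outcome is selected.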

(Note: I’m out of my depth here, and I haven’t given a great deal of thought to precommitment and the possibility of allowing algorithms to rewrite themselves.)

• You can consider an ideal agent that uses argmax E to find what it chooses, where E is some environment function. Then what you arrive at is that argmax gets defined recursively (E contains argmax as well), and it just so happens that the resulting expression is only well defined if there’s nothing in the first box and you choose both boxes. I’m writing a short paper about that.

• “There have been attempts to create derivatives of CDT that work like that. That replace the ‘C’ from conventional CDT with a type of causality that runs about in time as you mention. Such decision theories do seem to handle most of the problems that CDT fails at. Unfortunately I cannot recall the reference.”

You may be thinking of Huw Price’s paper available here

• I agree; wherever there is paradox and endless debate, I have always found ambiguity in the initial posing of the question. An unorthodox mathematician named Norman Wildberger just released a new solution by unambiguously specifying what we know about Omega’s predictive powers.

• It seems to me that what he gives is not so much a new solution as a neat generalized formulation. His formula gives you different results depending on whether you’re a causal decision theorist or not.

The causal decision theorist will say that his pA should be considered to be P(prediction = A|do(A)) and pB is P(prediction = B|do(B)), which will, unless you assume backward causation, just be P(prediction = A) and P(prediction = B) and thus sum to 1, hence the inequality at the end doesn’t hold and you should two-box.
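For readers without the formula to hand, an inequality of the kind referred to can be reconstructed as follows. This is a sketch under stated assumptions: A denotes one-boxing and B two-boxing, the payoffs are the usual \$1,000 and \$1,000,000, and pA and pB are the probabilities that Omega’s prediction matches actions A and B respectively. It may differ in form from the original formulation.

$$
\begin{aligned}
EU(A) &= p_A \cdot 1{,}000{,}000 \\
EU(B) &= 1{,}000 + (1 - p_B) \cdot 1{,}000{,}000 \\
EU(A) > EU(B) &\iff p_A + p_B > 1 + \frac{1{,}000}{1{,}000{,}000} = 1.001
\end{aligned}
$$

On the causal reading described above, pA + pB = P(prediction = A) + P(prediction = B) = 1, which falls short of 1.001, so the inequality fails and two-boxing is recommended.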

• I do not agree that a CDT must conclude that P(A)+P(B) = 1. The argument only holds if you assume the agent’s decision is perfectly unpredictable, i.e. that there can be no correlation between the prediction and the decision. This contradicts one of the premises of Newcomb’s Paradox, which assumes an entity with exactly the power to predict the agent’s choice. Incidentally, this reduces to the (b) but not (a) from above.

By adopting my (a) but not (b) from above, i.e. Omega as a programmer and the agent as predictable code, you can easily see that P(A)+P(B) = 2, which means one-boxing code will perform the best.

Further elaboration of the above:

Imagine John, who never understood how the days of the week succeed each other. Rather, each morning, a cab arrives to take him to work if it is a work day, else he just stays at home. Omega must predict whether he will go to work before the cab would normally arrive. Omega knows that weekdays are generally workdays, while weekends are not, but Omega does not know the ins and outs of particular holidays such as the Fourth of July. Omega and John play this game each day of the week for a year.

Tallying the results, John finds that the score is as follows: P( Omega is right | I go to work) = 1.00, P( Omega is right | I do not go to work) = 0.85, which sums to 1.85. John, seeing that the sum is larger than 1.00, concludes that Omega seems to have rather good predictive power about whether he will go to work, but is somewhat short of perfect accuracy. He realizes that this has a certain significance for what bets he should take with Omega, regarding whether he will go to work tomorrow or not.
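A tally of this kind can be reproduced with a small script. The day counts below are hypothetical, chosen so that the result lands near the figures quoted above, and Omega is assumed to predict “work” on every weekday and “no work” on every weekend day.

```python
# Hypothetical year: (kind of day, did John go to work?). Counts are assumptions.
days = (
    [("weekday", True)] * 243      # ordinary workdays
    + [("weekday", False)] * 18    # weekday holidays Omega does not know about
    + [("weekend", False)] * 104   # weekend days
)


def omega_predicts_work(kind: str) -> bool:
    """Assumed prediction rule: work on weekdays, no work on weekends."""
    return kind == "weekday"


def accuracy_given(went_to_work: bool) -> float:
    """P(Omega is right | John's actual behaviour that day)."""
    relevant = [kind for kind, went in days if went == went_to_work]
    hits = sum(omega_predicts_work(kind) == went_to_work for kind in relevant)
    return hits / len(relevant)


total = accuracy_given(True) + accuracy_given(False)
```

With these counts Omega is always right on days John works and right on roughly 85% of the days he stays home, giving a sum of about 1.85.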

• “I do not agree that a CDT must conclude that P(A)+P(B) = 1. The argument only holds if you assume the agent’s decision is perfectly unpredictable, i.e. that there can be no correlation between the prediction and the decision. This contradicts one of the premises of Newcomb’s Paradox, which assumes an entity with exactly the power to predict the agent’s choice. Incidentally, this reduces to the (b) but not (a) from above.

“By adopting my (a) but not (b) from above, i.e. Omega as a programmer and the agent as predictable code, you can easily see that P(A)+P(B) = 2, which means one-boxing code will perform the best.”

But that’s not CDT reasoning. CDT uses surgery instead of conditionalization, that’s the whole point. So it doesn’t look at P(prediction = A|A), but at P(prediction = A|do(A)) = P(prediction = A).
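The difference between conditionalization and surgery can be illustrated with a toy causal model. Everything here is an assumption made for the example: a hidden disposition causes both Omega’s prediction and, by default, the agent’s action, and the prior over dispositions is hypothetical.

```python
# Hypothetical prior over the agent's disposition (an assumption for this sketch).
PRIOR = {"one-box": 0.25, "two-box": 0.75}

# Toy model: disposition -> prediction, and disposition -> action by default.
# Each entry is (disposition, prediction, action) with its probability.
JOINT = {(d, d, d): p for d, p in PRIOR.items()}


def p_prediction_given_action(prediction: str, action: str) -> float:
    """Conditionalization: observing the action is evidence about the disposition."""
    matching = sum(p for (_, pr, a), p in JOINT.items() if pr == prediction and a == action)
    total = sum(p for (_, _, a), p in JOINT.items() if a == action)
    return matching / total


def p_prediction_given_do(prediction: str, action: str) -> float:
    """Surgery: the action is set from outside, cutting its link to the
    disposition, so the prediction keeps its prior distribution."""
    return sum(p for d, p in PRIOR.items() if d == prediction)


conditional_sum = (p_prediction_given_action("one-box", "one-box")
                   + p_prediction_given_action("two-box", "two-box"))
surgical_sum = (p_prediction_given_do("one-box", "one-box")
                + p_prediction_given_do("two-box", "two-box"))
```

In this toy model the conditional probabilities sum to 2 and the surgically altered ones sum to 1, which is the disagreement at issue in this exchange.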

Your example with the cab doesn’t really involve a choice at all, because John’s going to work is effectively determined completely by the arrival of the cab.

• I am not sure where our disagreement lies at the moment.

Are you using choice to signify strongly free will? Because that means the hypothetical Omega is impossible without backwards causation, leaving us at (b) but not (a) and the whole of Newcomb’s paradox moot. Whereas, if you include in Newcomb’s paradox the stipulation that the choice of two-boxing will actually cause the big box to be empty, and that the choice of one-boxing will actually cause the big box to contain a million dollars, by a mechanism of backwards causation, then any CDT model will solve the problem.

Perhaps we can narrow down our disagreement by taking the following variation of my example, where there is at least a bit more of choice involved:

Imagine John, who never understood why he gets thirsty. Despite there being a regularity in when he chooses to drink, this is for him a mystery. Every hour, Omega must predict whether John will choose to drink within the next hour. Omega’s prediction is kept secret from John until after the time interval has passed. Omega and John play this game every hour for a month, and it turns out that while far from perfect, Omega’s predictions are a bit better than random. Afterwards, Omega explains that it beats blind guesses by knowing that John will very rarely wake up in the middle of the night to drink, and that his daily water consumption follows a normal distribution with a mean and standard deviation that Omega has estimated.

• “I am not sure where our disagreement lies at the moment.”

I’m not entirely sure either. I was just saying that a causal decision theorist will not be moved by Wildberger’s reasoning, because he’ll say that Wildberger is plugging in the wrong probabilities: when calculating an expectation, CDT uses not conditional probability distributions but surgically altered probability distributions. You can make that result in one-boxing if you assume backwards causation.

I think the point we’re actually talking about (or around) might be the question of how CDT reasoning relates to your (a). I’m not sure that the causal decision theorist has to grant that he is in fact interpreting the problem as “not (a) but (b)”. The problem specification only contains the information that so far, Omega has always made correct predictions. But the causal decision theorist is now in a position to spoil Omega’s record, if you will. Omega has already made a prediction, and whatever the causal decision theorist does now isn’t going to change that prediction. The fact that Omega’s predictions have been absolutely correct so far doesn’t enter into the picture. It just means that for all agents x that are not the causal decision theorist, P(x does A|Omega predicts that x does A) = 1 (and the same for B, and whatever value other than 1 you might want for an imperfect predictor Omega).

About the way you intend (a), the causal decision theorist would probably say that’s backward causation and refuse to accept it.

One way of putting it might be that the causal decision theorist simply has no way of reasoning with the information that his choice is predetermined, which is what I think you intend to convey with (a). Therefore, he has no way of (hypothetically) inferring Omega’s prediction from his own (hypothetical) action (because he’s only allowed to do surgery, not conditionalization).

“Are you using choice to signify strongly free will?”

No, actually. Just the occurrence of a deliberation process whose outcome is not immediately obvious. In both your examples, that doesn’t happen: John’s behavior simply depends on the arrival of the cab or his feeling of thirst, respectively. He doesn’t, in a substantial sense, make a decision.

• (Thanks for discussing!)

I will address your last paragraph first. The only significant difference between my original example and the proper Newcomb’s paradox is that, in Newcomb’s paradox, Omega is made a predictor by fiat and without explanation. This allows perfect prediction and choice to sneak into the same paragraph without obvious contradiction. It seems, if I try to make the mode of prediction transparent, you protest there is no choice being made.

From Omega’s point of view, its Newcomb subjects are not making choices in any substantial sense, they are just predictably acting out their own personality. That is what allows Omega its predictive power. Choice is not something inherent to a system, but a feature of an outsider’s model of a system, in much the same sense as random is not something inherent to a Eeny, meeny, miny, moe however much it might seem that way to children.

As for the rest of our disagreement, I am not sure why you insist that CDT must work with a misleading model. The standard formulation of Newcomb’s paradox is inconsistent or underspecified. Here are some messy explanations for why, in list form:

• “Omega predicts accurately, then you get to choose” is a false model, because “Omega has predicted you will two-box, then you get to choose” does not actually let you choose; one-boxing is an illegal choice, and two-boxing the only legal choice (In Soviet Russia joke goes here)

• “You get to choose, then Omega retroactively fixes the contents of the boxes” is fine and CDT solves it by one-boxing

• “Omega tries to predict but is just blindly guessing, then you really get to choose” is fine and CDT solves it by two-boxing

• “You know that Omega has perfect predictive power and are free to be committed to either one- or two-boxing as you prefer” is nowhere near similar to the original Newcomb’s formulation, but is obviously solved by one-boxing

• “You are not sure about Omega’s predictive power and are torn between trying to ‘game’ it and cooperating with it” is not Newcomb’s problem

• “Your choice has to be determined by a deterministic algorithm, but you are not allowed to know this when designing the algorithm, so you must instead work in ignorance and design it by a false dominance principle” is just cheating

• “‘Omega predicts accurately, then you get to choose’ is a false model, because ‘Omega has predicted you will two-box, then you get to choose’ does not actually let you choose; one-boxing is an illegal choice, and two-boxing the only legal choice (In Soviet Russia joke goes here)”

Not if you’re a compatibilist, which Eliezer is last I checked.

• The post scav made more or less represents my opinion here. Compatibilism, choice, free will and determinism are too many vague definitions for me to discuss with. For compatibilism to make any sort of sense to me, I would need a new definition of free will. It is already difficult to discuss how stuff is, without simultaneously having to discuss how to use and interpret words.

Trying to leave the problematic words out of this, my claim is that the only reason CDT ever gives a wrong answer in a Newcomb’s problem is that you are feeding it the wrong model. http://lesswrong.com/lw/gu1/decision_theory_faq/8kef elaborates on this without muddying the waters too much with the vaguely defined terms.

• I don’t think compatibilist means that you can pretend two logically mutually exclusive propositions can both be true. If it is accepted as a true proposition that Omega has predicted your actions, then your actions are decided before you experience the illusion of “choosing” them. Actually, whether or not there is an Omega predicting your actions, this may still be true.

Accepting the predictive power of Omega, it logically follows that when you one-box you will get the \$1M. A CDT-rational agent only fails on this if it fails to accept the prediction and constructs a (false) causal model that includes the incoherent idea of “choosing” something other than what must happen according to the laws of physics. Does CDT require such a false model to be constructed? I dunno. I’m no expert.

The real causal model is that some set of circumstances decided what you were going to “choose” when presented with Omega’s deal, and those circumstances also led to Omega’s 100% accurate prediction.

If being a compatibilist leads you to reject the possibility of such a scenario, then it also logically excludes the perfect predictive power of Omega and Newcomb’s problem disappears.

But in the problem as stated, you will only two-box if you get confused about the situation or you don’t want \$1M for some reason.

• “then your actions are decided before you experience the illusion of “choosing” them.”

Where’s the illusion? If I choose something according to my own preferences, why should it be an illusion merely because someone else can predict that choice if they know said preferences? Why does their knowledge of my action affect my decision-making powers?

The problem is you’re using the words “decided” and “choosing” confusingly, with different meanings at the same time. One meaning is having the final input on the action I take; the other meaning seems to be a discussion of when the output can be calculated.

The output can be calculated before I actually even insert the input, sure—but it’s still my input, and therefore my decision—nothing illusory about it, no matter how many people calculated said input in advance: even though they calculated it was I who controlled it.

• The knowledge of your future action is only knowledge if it has a probability of 1. Omega acquiring that knowledge by calculation or otherwise does not affect your choice, but it is a consequence of that knowledge being able to exist (whether Omega has it or not) that means your choice is determined absolutely.

What happens next is exactly the everyday meaning of “choosing”. Signals zap around your brain in accordance with the laws of physics and evaluate courses of action according to some neural representation of your preferences, and one course of action is the one you will “decide” to do. Soon afterwards, your conscious mind becomes aware of the decision and feels like it made it. That’s one part of the illusion of choice.

EDIT: I’m assuming you’re a human. A rational agent need not have this incredibly clunky architecture.

The second part of the illusion is specific to this very artificial problem. The counterfactual (you choose the opposite of what Omega predicted) just DOESN’T EXIST. It has probability 0. It’s not even that it could have happened in another branch of the multiverse—it is logically precluded by the condition of Omega being able to know with probability 1 what you will choose. 1 − 1 = 0.

• The knowledge of your future action is only knowledge if it has a probability of 1.

Do you think Newcomb’s Box fundamentally changes if Omega is only right with a probability of 99.9999999999999%?

Signals zap around your brain in accordance with the laws of physics and evaluate courses of action according to some neural representation of your preferences, and one course of action is the one you will “decide” to do. Soon afterwards, your conscious mind becomes aware of the decision and feels like it made it.

That process “is” my mind—there’s no mind anywhere which can be separate from those signals. So you say that my mind feels like it made a decision but you think this is false? I think it makes sense to say that my mind feels like it made a decision and it’s completely right most of the time.

My mind would be only having the “illusion” of choice if someone else, someone outside my mind, intervened between the signals and implanted a different decision, according to their own desires, and the rest of my brain just rationalized the already pretaken choice. But as long as the process is truly internal, the process is truly my mind’s—and my mind’s feeling that it made the choice corresponds to reality.

“The counterfactual (you choose the opposite of what Omega predicted) just DOESN’T EXIST.”

That the opposite choice isn’t made in any universe, doesn’t mean that the actually made choice isn’t real—indeed the less real the opposite choice, the more real your actual choice.

Taboo the word “choice”, and let’s talk about “decision-making process”. Your decision-making process exists in your brain, and therefore it’s real. It doesn’t have to be uncertain in outcome to be real; it’s real in the sense that it is actually occurring. Occurring in a deterministic manner, YES, but how does that make the process any less real?

Is gravity unreal or illusionary because it’s deterministic and predictable? No. Then neither is your decision-making process unreal or illusionary.

• Yes, it is your mind going through a decision making process. But most people feel that their conscious mind is the part making decisions and for humans, that isn’t actually true, although attention seems to be part of consciousness and attention to different parts of the input probably influences what happens. I would call that feeling of making a decision consciously when that isn’t really happening somewhat illusory.

The decision making process is real, but my feeling of there being an alternative I could have chosen instead (even though in this universe that isn’t true) is inaccurate. Taboo “illusion” too if you like, but we can probably agree to call that a different preference for usage of the words and move on.

Incidentally, I don’t think Newcomb’s problem changes dramatically as Omega’s success rate varies. You just get different expected values for one-boxing and two-boxing on a continuous scale, don’t you?
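To make the "continuous scale" concrete, here is a minimal sketch that conditions on the choice and computes both expected values as a function of Omega's accuracy. The payoffs (1,000,000 in the opaque box, 1,000 in the transparent one) are the usual ones; the function names are just illustrative.

```python
def expected_values(accuracy, big=1_000_000, small=1_000):
    """Return (EV of one-boxing, EV of two-boxing) for a predictor
    that is right with probability `accuracy`."""
    # One-boxer: the opaque box is full exactly when the prediction was right.
    ev_one_box = accuracy * big
    # Two-boxer: always gets the small prize, plus the big one
    # only when the predictor was wrong.
    ev_two_box = small + (1 - accuracy) * big
    return ev_one_box, ev_two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    # Solve accuracy * big == small + (1 - accuracy) * big for accuracy.
    return (big + small) / (2 * big)


for p in (0.5, 0.9, 0.999999, 1.0):
    one, two = expected_values(p)
    print(f"accuracy={p}: one-box={one:,.0f}, two-box={two:,.0f}")
```

Under these assumptions nothing special happens at accuracy 1: the two expected values cross at an accuracy of about 0.5005 and simply drift further apart above that.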

• Regarding illegal choices, the transparent variation makes it particularly clear, i.e. you can’t take both boxes if you see a million in first box, and take 1 box otherwise.

You can walk backwards from your decision to the point where a copy of you had been made, and then forward to the point where a copy is processed by the Omega, to find the relation of your decision to the box state causally.

• I agree with the content, though I am not sure if I approve of a terminology where causation traverses time like a two-way street.

• Underlying physics is symmetric in time. If you assume that the state of the world is such that one box is picked up by your arm, that imposes constraints on both the future and the past light cone. If you do not process the constraints on the past light cone then your simulator state does not adhere to the laws of physics, namely, the decision arises out of thin air by magic.

If you do process constraints fully then the action to take one box requires pre-copy state of “you” that leads to decision to pick one box, which requires money in one box; action to take 2 boxes likewise, after processing constraints, requires no money in the first box. (“you” is a black box which is assumed to be non-magical, copyable, and deterministic, for the purpose of the exercise).

edit: came up with an example. Suppose ‘you’ is a robotics controller, you know you’re made of various electrical components, you’re connected to the battery and some motors. You evaluate a counterfactual where you put a current onto a wire for some time. Constraints imposed on the past: the battery has been charged within the last 10 hours, because otherwise it couldn’t supply enough current. If the constraints contradict known reality then you know you can’t do this action. Suppose there’s a replacement battery pack 10 meters away from the robot, and the robot is unsure if 5 hours ago the packs have been swapped; in the alternative that they haven’t been, it would not have enough charge to get to the extra pack; in the alternative that they have been swapped, it doesn’t need to get to the spent extra pack. Evaluating the hypothetical where it got to the extra pack, it knows the packs have been swapped in the past and the extra pack is spent. (Of course for simplicity one can do all sorts of stuff, such as electrical currents coming out of nowhere, but outside the context of philosophical speculation the cause of the error is very clear).
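A rough sketch of the battery example, assuming just two candidate pasts and boolean facts about them (all names here are made up for illustration). The hypothetical "the robot reached the extra pack" is processed by first filtering the pasts it is consistent with, then reading off what those pasts imply.

```python
# Two candidate pasts the robot cannot tell apart directly (illustrative names).
PASTS = {
    "packs_swapped": {"onboard_charge_ok": True, "extra_pack_spent": True},
    "packs_not_swapped": {"onboard_charge_ok": False, "extra_pack_spent": False},
}


def can_reach_extra_pack(past):
    # Reaching the pack 10 meters away needs a sufficiently charged onboard battery.
    return PASTS[past]["onboard_charge_ok"]


def pasts_consistent_with(reached_pack):
    # Constraint on the past imposed by the hypothetical outcome.
    return [p for p in PASTS if can_reach_extra_pack(p) == reached_pack]


def extra_pack_spent_if_reached():
    # What the hypothetical "I reached the extra pack" implies about that pack.
    return {PASTS[p]["extra_pack_spent"] for p in pasts_consistent_with(True)}


print(pasts_consistent_with(True))      # only the swapped past survives
print(extra_pack_spent_if_reached())    # so the extra pack must be spent
```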

• We do, by and large, agree. I just thought, and still think, the terminology is somewhat misleading. This is probably not a point I should press, because I have no mandate to dictate how words should be used, and I think we understand each other, but maybe it is worth a shot.

I fully agree that some values in the past and future can be correlated. This is more or less the basis of my analysis of Newcomb’s problem, and I think it is also what you mean by imposing constraints on the past light cone. I just prefer to use different words for backwards correlation and forwards causation.

I would say that the robot getting the extra pack necessitates that it had already been charged and did not need the extra pack, while not having been charged earlier would cause it to fail to recharge itself. I think there is a significant difference between how not being charged causes the robot to run out of power, versus how running out of power necessitates that it has not been charged.

You may of course argue that the future and the past are the same from the viewpoint of physics, and that either can be said to cause the other. However, as long as people consider the future and the past to be conceptually completely different, I do not see the hurry to erode these differences in the language we use. It probably would not be a good idea to make tomorrow refer to both the day before and the day after today, either.

I guess I will repeat: This is probably not a point I should press, because I have no mandate to dictate how words should be used.

• I’d be the first to agree on terminology here. I’m not suggesting that choice of the box causes money in the box, simply that those two are causally connected, in the physical sense. The whole issue seems to stem from taking the word ‘causal’ from causal decision theory, and treating it as more than mere name, bringing in enormous amounts of confused philosophy which doesn’t capture very well how physics work.

When deciding, you evaluate hypotheticals of you making different decisions. A hypothetical is like a snapshot of the world state. Laws of physics very often have to be run backwards from the known state to deduce the past state, and then forwards again to deduce the future state. E.g. a military robot sees a hand grenade flying into its field of view; it calculates the motion backwards to find where it was thrown from, finding the location of the grenade thrower, then uses a model of the grenade thrower to predict another grenade in the future.

So, you process the hypothetical where you picked up one box, to find how much money you get. You have the known state: you picked one box. You deduce that the past state of deterministic you must have been Q, which results in picking up one box; a copy of that state has been made, and that state resulted in a prediction of 1 box. You conclude that you get 1 million. You do the same for picking 2 boxes: the previous state must be R, etc., and you conclude you get 1000. You compare, and you pick the universe where you take 1 box.
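The backward-then-forward procedure can be sketched as follows. This is only a toy: it assumes exactly two possible prior states, named Q and R as above, and a predictor that works by running the agent's decision procedure on a copy of the state.

```python
# Deterministic agent: the prior state fixes the action (Q and R as in the text).
DISPOSITION = {"Q": "one-box", "R": "two-box"}


def decide(state):
    return DISPOSITION[state]


def omega_prediction(state_copy):
    # Assumption: Omega predicts by running the same procedure on a copy.
    return decide(state_copy)


def payoff_given_action(action):
    # Backward step: which prior state is consistent with taking this action?
    prior_states = [s for s in DISPOSITION if decide(s) == action]
    assert len(prior_states) == 1, "toy model: exactly one state per action"
    state = prior_states[0]
    # Forward step: from the copied state to the prediction and box contents.
    prediction = omega_prediction(state)
    opaque_box = 1_000_000 if prediction == "one-box" else 0
    transparent_box = 1_000
    return opaque_box if action == "one-box" else opaque_box + transparent_box


best = max(("one-box", "two-box"), key=payoff_given_action)
print(best, payoff_given_action("one-box"), payoff_given_action("two-box"))
```

Note that no arrow runs from the choice to the box contents here; both are computed forward from the same deduced prior state.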

(And with regards to the “smoking lesion” problem, smoking lesion postulates a blatant logical contradiction—it postulates that the lesion affects the choice, which contradicts that the choice is made by the agent we are speaking of. As a counter example to a decision theory, it is laughably stupid)

• Excellent.

I think laughably stupid is a bit too harsh. As I understand things, confusion regarding Newcomb’s leads to new decision theories, which in turn makes the smoking lesion problem interesting because the new decision theories introduce new, critical weaknesses in order to solve Newcomb’s problem. I do agree, however, that the smoking lesion problem is trivial if you stick to a sensible, CDT model.

• The problems with EDT are quite ordinary… it’s looking for good news, and also, it is kind of under-specified (e.g. some argue it’d two-box in Newcomb’s after learning physics). A decision theory cannot be disqualified for giving a ‘wrong’ answer in the hypothetical that 2*2=5 or in the hypothetical that a or not a = false, or in the hypothetical that the decision is simultaneously controlled by the decision theory, and set, without involvement of the decision theory, by the lesion (and a random process if correlation is imperfect).

• From Omega’s point of view, its Newcomb subjects are not making choices in any substantial sense, they are just predictably acting out their own personality.

I probably wasn’t expressing myself quite clearly. I think the difference is this: Newcomb subjects are making a choice from their own point of view. Your Johns aren’t really making a choice even from their internal perspective: they just see if the cab arrives/if they’re thirsty and then without deliberation follow what their policy for such cases prescribes. I think this difference is substantial enough intuitively that the John cases can’t be used as intuition pumps for anything relating to Newcomb’s.

The standard formulation of Newcomb’s paradox is inconsistent or underspecified.

I don’t think it is, actually. It just seems so because it presupposes that your own choice is predetermined, which is kind of hard to reason with when you’re right in the process of making the choice. But that’s a problem with your reasoning, not with the scenario. In particular, the CDT agent has a problem with conceiving of his own choice as predetermined, and therefore has trouble formulating Newcomb’s problem in a way that he can use—he has to choose between getting two-boxing as the solution or assuming backward causation, neither of which is attractive.

• Then I guess I will try to leave it to you to come up with a satisfactory example. The challenge is to include Newcomblike predictive power for Omega, but not without substantiating how Omega achieves this, while still passing your own standards of subject makes choice from own point of view. It is very easy to accidentally create paradoxes in mathematics, by assuming mutually exclusive properties for an object, and the best way to discover these is generally to see if it is possible to construct or find an instance of the object described.

I don’t think it is, actually. It just seems so because it presupposes that your own choice is predetermined, which is kind of hard to reason with when you’re right in the process of making the choice. But that’s a problem with your reasoning, not with the scenario. In particular, the CDT agent has a problem with conceiving of his own choice as predetermined, and therefore has trouble formulating Newcomb’s problem in a way that he can use—he has to choose between getting two-boxing as the solution or assuming backward causation, neither of which is attractive.

This is not a failure of CDT, but one of your imagination. Here is a simple, five minute model which has no problems conceiving Newcomb’s problem without any backwards causation:

• T=0: Subject is initiated in a deterministic state which can be predicted by Omega.

• T=1: Omega makes an accurate prediction for the subject’s decision in Newcomb’s problem by magic / simulation / reading code / infallible heuristics. Denote the possible predictions P1 (one-box) and P2.

• T=2: Omega sets up Newcomb’s problem with appropriate box contents.

• T=3: Omega explains the setup to the subject and disappears.

• T=4: Subject deliberates.

• T=5: Subject chooses either C1 (one-box) or C2.

• T=6: Subject opens box(es) and receives payoff dependent on P and C.

You can pretend to enter this situation at T=4 as suggested by the standard Newcomb’s problem. Then you can use the dominance principle and you will lose. But this is just using a terrible model. You entered at T=0, because you were needed at T=1 for Omega’s inspection. If you did not enter the situation at T=0, then you can freely make a choice C at T=5 without any correlation to P, but that is not Newcomb’s problem.

Instead, at T=4 you become aware of the situation, and your decision making algorithm must return a value for C. If you consider this only from T=4 and onward, this is completely uninteresting, because C is already determined. At T=1, P was determined to either P1 or P2, and the value of C follows directly from this. Obviously, healthy one-boxing code wins and unhealthy two-boxing code loses, but there is no choice being made here, just different code with different return values being rewarded differently, and that is not Newcomb’s problem either.

Finally, we will work under the illusion of choice with Omega as a perfect predictor. We realize that T=0 is the critical moment, seeing as every subsequent T follows directly from it. We work backwards as follows:

• T=6: My preferences are P1C2 > P1C1 > P2C2 > P2C1.

• T=5: I should choose either C2 or C1 depending on the current value of P.

• T=4: this is when all this introspection is happening

• T=3: this is why

• T=2: I would really like there to be a million dollars present.

• T=1: I want Omega to make prediction P1.

• T=0: Whew, I’m glad I could do all this introspection which made me realize that I want P1 and the way to achieve this is C1. It would have been terrible if my decision making just worked by the dominance principle. Luckily, the epiphany I just had, C1, was already predetermined at T=0, Omega would have been aware of this at T=1 and made the prediction P1, so (...) and P1 C1 = a million dollars is mine.

Shorthand version of all the above; if the decision is necessarily predetermined before T=4, then you should not pretend you make the decision at T=4. Insert a decision making step at T=0.5, which causally determines the value of P and C. Apply your CDT to this step.
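As a sketch of what "insert a decision making step at T=0.5" could look like, here is the timeline run forward from a single action node. It assumes a perfect predictor and the usual dollar payoffs, ordered as in the T=6 preferences above; the function names are illustrative.

```python
# Payoffs indexed by (prediction, choice); P1C2 > P1C1 > P2C2 > P2C1.
PAYOFF = {
    ("P1", "C1"): 1_000_000,
    ("P1", "C2"): 1_001_000,
    ("P2", "C1"): 0,
    ("P2", "C2"): 1_000,
}


def run_timeline(disposition):
    """Play T=1..T=6 forward from the disposition fixed at T=0.5."""
    # T=1: Omega (assumed perfect) reads the disposition.
    prediction = "P1" if disposition == "one-box" else "P2"
    # T=2 to T=4: boxes are filled, setup is explained, subject deliberates.
    # T=5: the deterministic subject acts out the same disposition.
    choice = "C1" if disposition == "one-box" else "C2"
    # T=6: payoff depends on P and C.
    return prediction, choice, PAYOFF[(prediction, choice)]


def choose_disposition():
    # The only action node is at T=0.5; everything downstream is caused by it.
    return max(("one-box", "two-box"), key=lambda d: run_timeline(d)[2])


reachable = {run_timeline(d)[:2] for d in ("one-box", "two-box")}
print(choose_disposition(), reachable)
```

In this toy model the outcomes P1C2 and P2C1 are never reached by any disposition, which is the sense in which the T=4 dominance argument compares options that do not exist.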

This is the only way of doing CDT honestly, and it is the slightest bit messy, but that is exactly what happens when you create, in the problem itself, a reference to the decision the decision theory is going to make in the future, with perfect correlation to the decision before the decision has overtly been made. This sort of self-reference creates impossibilities out of thin air every day of the week, such as when Pinocchio says “my nose will grow now”. The good news is that this way of doing it is a lot less messy than inventing a new, superfluous decision theory, and it also allows you to deal with problems like the psychopath button without any trouble whatsoever.

• But isn’t this precisely the basic idea behind TDT?

The algorithm you are suggesting goes something like this: Choose that action which, if it had been predetermined at T=0 that you would take it, would lead to the maximal-utility outcome. You can call that CDT, but it isn’t. Sure, it’ll use causal reasoning for evaluating the counterfactual, but not everything that uses causal reasoning is CDT. CDT is surgically altering the action node (and not some precommitment node) and seeing what happens.

• If you take a careful look at the model, you will realize that the agent has to be precommitted, in the sense that what he is going to do is already fixed. Otherwise, the step at T=1 is impossible. I do not mean that he has precommitted himself consciously to win at Newcomb’s problem, but trivially, a deterministic agent must be precommitted.

It is meaningless to apply any sort of decision theory to a deterministic system. You might as well try to apply decision theory to the balls in a game of billiards, which assign high utility to remaining on the table but have no free choices to make. For decision theory to have a function, there needs to be a choice to be made between multiple, legal options.

As far as I have understood, your problem is that, if you apply CDT with an action node at T=4, it gives the wrong answer. At T=4, there is only one option to choose, so the choice of decision theory is not exactly critical. If you want to analyse Newcomb’s problem, you have to insert an action node at T<1, while there is still a choice to be made, and CDT will do this admirably.

• As far as I have understood, your problem is that, if you apply CDT with an action node at T=4, it gives the wrong answer. At T=4, there is only one option to choose, so the choice of decision theory is not exactly critical.

Yes, it is. The point is that you run your algorithm at T=4, even if it is deterministic and therefore its output is already predetermined. Therefore, you want an algorithm that, executed at T=4, returns one-boxing. CDT does simply not do that.

Ultimately, it seems that we’re disagreeing about terminology. You’re apparently calling something CDT even though it does not work by surgically altering the node for the action under consideration (that action being the choice of box, not the precommitment at T<1) and then looking at the resulting expected utilities.

• If you apply CDT at T=4 with a model which builds in the knowledge that the choice C and the prediction P are perfectly correlated, it will one-box. The model is exceedingly simple:

• T’=0: Choose either C1 or C2

• T’=1: If C1, then gain 1000. If C2, then gain 1.

This excludes the two other impossibilities, C1P2 and C2P1, since these violate the correlation constraint. CDT makes a wrong choice when these two are included, because then you have removed the information of the correlation constraint from the model, changing the problem to one in which Omega is not a predictor.
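A small sketch of the contrast being drawn, using the same units as the two-step model (gains of 1000 and 1). The four-entry table and the helper names are illustrative assumptions; the restricted model keeps only the outcomes that satisfy the correlation constraint.

```python
# Gains in the units used above, indexed by (prediction, choice).
GAIN = {
    ("P1", "C1"): 1000,
    ("P1", "C2"): 1001,
    ("P2", "C1"): 0,
    ("P2", "C2"): 1,
}


def restricted_model():
    # Keep only outcomes where the prediction matches the choice (P1C1, P2C2).
    return {c: g for (p, c), g in GAIN.items() if p[1] == c[1]}


def best_in_restricted_model():
    allowed = restricted_model()
    return max(allowed, key=allowed.get)


def dominant_choice_in_full_model():
    # With all four outcomes, look for a choice that is better for every fixed P.
    for c, other in (("C1", "C2"), ("C2", "C1")):
        if all(GAIN[(p, c)] > GAIN[(p, other)] for p in ("P1", "P2")):
            return c
    return None


print(restricted_model(), best_in_restricted_model())
print(dominant_choice_in_full_model())
```

With the constraint built in, the comparison is 1000 against 1 and C1 wins; with all four outcomes, C2 dominates, which is the two-boxing answer.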

What is your problem with this model?

• Okay, so I take it to be the defining characteristic of CDT that it uses counterfactuals. So far, I have been arguing on the basis of a Pearlean conception of counterfactuals, and then this is what happens:

Your causal network has three variables, A (the algorithm used), P (Omega’s prediction), C (the choice). The causal connections are A → P and A → C. There is no causal connection between P and C.

Now the CDT algorithm looks at counterfactuals with the antecedent C1. In a Pearlean picture, this amounts to surgery on the C-node, so no inference contrary to the direction of causality is possible. Hence, whatever the value of the P-node, it will seem to the CDT algorithm not to depend on the choice.

Therefore, even if the CDT algorithm knows that its choice is predetermined, it cannot make use of that in its decision, because it cannot update contrary to the direction of causality.
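To illustrate the difference between surgery and ordinary conditioning on this three-node network, here is a toy calculation. The 50/50 prior over A and the dollar payoffs are assumptions made for the sketch, as are all the names.

```python
PAYOFF = {
    ("P1", "C1"): 1_000_000,
    ("P1", "C2"): 1_001_000,
    ("P2", "C1"): 0,
    ("P2", "C2"): 1_000,
}

# Assumed prior over the algorithm node A.
PRIOR_A = {"one-boxer": 0.5, "two-boxer": 0.5}


def p_given_a(a):
    # A -> P: Omega's prediction is a deterministic function of the algorithm.
    return "P1" if a == "one-boxer" else "P2"


def c_given_a(a):
    # A -> C: so is the choice, when no surgery is performed.
    return "C1" if a == "one-boxer" else "C2"


def eu_do(choice):
    # Surgery: cut A -> C and set C by hand; P still follows the prior over A.
    return sum(pr * PAYOFF[(p_given_a(a), choice)] for a, pr in PRIOR_A.items())


def eu_conditional(choice):
    # Conditioning: keep only the values of A consistent with the observed choice.
    consistent = {a: pr for a, pr in PRIOR_A.items() if c_given_a(a) == choice}
    total = sum(consistent.values())
    return sum(pr / total * PAYOFF[(p_given_a(a), choice)]
               for a, pr in consistent.items())


print("do:  ", eu_do("C1"), eu_do("C2"))
print("cond:", eu_conditional("C1"), eu_conditional("C2"))
```

Under surgery the two-box option comes out ahead by exactly 1,000 whatever prior is chosen for A, because setting C tells the algorithm nothing about P. Conditioning lets the inference run back through A, and one-boxing wins.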

Now it turns out that natural language counterfactuals work very much, but not quite, like Pearl’s counterfactuals: they allow a limited amount of backtracking contrary to the direction of causality, depending on a variety of psychological factors. So if you had a theory of counterfactuals that allowed backtracking in a case like Newcomb’s problem, then a CDT-algorithm employing that conception of counterfactuals would one-box. The trouble would of course be to correctly state the necessary conditions for backtracking. The messy and diverse psychological and contextual factors that seem to be at play in natural language won’t do.

• Could you try to maybe give a straight answer to, what is your problem with my model above? It accurately models the situation. It allows CDT to give a correct answer. It does not superficially resemble the word for word statement of Newcomb’s problem.

Therefore, even if the CDT algorithm knows that its choice is predetermined, it cannot make use of that in its decision, because it cannot update contrary to the direction of causality.

You are trying to use a decision theory to determine which choice an agent should make, after the agent has already had its algorithm fixed, which causally determines which choice the agent must make. Do you honestly blame that on CDT?

• Could you try to maybe give a straight answer to, what is your problem with my model above? It accurately models the situation. It allows CDT to give a correct answer.

No, it does not, that’s what I was trying to explain. It’s what I’ve been trying to explain to you all along: CDT cannot make use of the correlation between C and P. CDT cannot reason backwards in time. You do know how surgery works, don’t you? In order for CDT to use the correlation, you need a causal arrow from C to P—that amounts to backward causation, which we don’t want. Simple as that.

You are trying to use a decision theory to determine which choice an agent should make, after the agent has already had its algorithm fixed, which causally determines which choice the agent must make.

I’m not sure what the meaning of this is. Of course the decision algorithm is fixed before it’s run, and therefore its output is predetermined. It just doesn’t know its own output before it has computed it. And I’m not trying to figure out what the agent should do—the agent is trying to figure that out. Our job is to figure out which algorithm the agent should be using.

PS: The downvote on your post above wasn’t from me.

• You are applying a decision theory to the node C, which means you are implicitly stating: there are multiple possible choices to be made at this point, and this decision can be made independent of nodes not in front of this one. This means that your model does not model the Newcomb’s problem we have been discussing—it models another problem, where C can have values independent of P, which is indeed solved by two-boxing.

It is not the decision theory’s responsibility to know that the value of node C is somehow supposed to retrospectively alter the state of the branch the decision theory is working in. This is, however, a consequence of the modelling you do. You are on purpose applying CDT too late in your network, such that P and thus the cost of being a two-boxer has gone over the horizon and such that the node C must affect P backwards, not because the problem actually contains backwards causality, but because you want to fix the value of nodes in the wrong order.

If you do not want to make the assumption of free choice at C, then you can just not promote it to an action node. If the decision at C is causally determined from A, then you can apply a decision theory at node A and follow the causal inference. Then you will, once again, get a correct answer from CDT, this time for the version of Newcomb’s problem where A and C are fully correlated.

If you refuse to reevaluate your model, then we might as well leave it at this. I do agree that if you insist on applying CDT at C in your model, then it will two-box. I do not agree that this is a problem.

• You don’t promote C to the action node, it is the action node. That’s the way the decision problem is specified: do you one-box or two-box? If you don’t accept that, then you’re talking about a different decision problem. But in Newcomb’s problem, the algorithm is trying to decide that. It’s not trying to decide which algorithm it should be (or should have been). Having the algorithm pretend—as a means of reaching a decision about C—that it’s deciding which algorithm to be is somewhat reminiscent of the idea behind TDT and has nothing to do with CDT as traditionally conceived of, despite the use of causal reasoning.

• In AI, you do not discuss it in terms of anthropomorphic “trying to decide”. For example, there’s a “model-based, utility-based agent”. Computing what the world will be like if a decision is made in a specific way is part of the model of the world, i.e. part of the laws of physics as the agent knows them. If this physics implements the predictor at all, a model-based, utility-based agent will one-box.

• I don’t see at all what’s wrong or confusing about saying that an agent is trying to decide something; or even, for that matter, that an algorithm is trying to decide something, even though that’s not a precise way of speaking.

More to the point, though, doesn’t what you describe fit EDT and CDT both, with each theory having a different way of computing “what the world will be like if the decision is made in a specific way”?

• Decision theories do not compute what the world will be like. Decision theories select the best choice, given a model with this information included. How the world works is not something a decision theory figures out; it is not a physicist and it has no means to perform experiments outside of its current model. You need to take care of that yourself, and build it into your model.

If a decision theory had the weakness that certain, possible scenarios could not be modeled, that would be a problem. Any decision theory will have the feature that they work with the model they are given, not with the model they should have been given.

• Causality is under specified, whereas the laws of physics are fairly well defined, especially for a hypothetical where you can e.g. assume deterministic Newtonian mechanics for sake of simplifying the analysis. You have the hypothetical: sequence of commands to the robotic manipulator. You process the laws of physics to conclude that this sequence of commands picks up one box of unknown weight. You need to determine weight of the box to see if this sequence of commands will lead to the robot tipping over. Now, you see, to determine that sort of thing, models of physical world tend to walk backwards and forwards in time: for example if your window shatters and a rock flies in, you can conclude that there’s a rock thrower in the direction that the rock came from, and you do it by walking backwards in time.

• So it’s basically EDT, where you just conditionalize on the action being performed?

• In a way, albeit it does not resemble how EDT tends to be presented.

On the CDT, formally speaking, what do you think P(A if B) even is? Keep in mind that given some deterministic, computable laws of physics, given that you ultimately decide an option B, in the hypothetical that you decide an option C where C!=B, it will be provable that C=B, i.e. you have a contradiction in the hypothetical.

• In a way, albeit it does not resemble how EDT tends to be presented.

So then how does it not fall prey to the problems of EDT? It depends on the precise formalization of “computing what the world will be like if the action is taken, according to the laws of physics”, of course, but I’m having trouble imagining how that would not end up basically equivalent to EDT.

On the CDT, formally speaking, what do you think P(A if B) even is?

That is not the problem at all, it’s perfectly well-defined. I think if anything, the question would be what CDT’s P(A if B) is intuitively.

• So then how does it not fall prey to the problems of EDT?

What are those, exactly? The “smoking lesion”? It specifies that the output of the decision theory correlates with the lesion. Who knows how, but for it to actually correlate with the decision of that decision theory other than via the inputs to the decision theory, it has got to be our good old friend Omega doing some intelligent design and adding or removing that lesion. (And if it does so through the inputs, then it’ll smoke.)

That is not the problem at all, it’s perfectly well-defined.

Given world state A which evolves into world state B (computable, deterministic universe), the hypothetical “what if world state A evolved into C where C!=B” will lead, among other absurdities, to a proof that B=C contradicting that B!=C (of course you can ensure that this particular proof won’t be reached with various silly hacks but you’re still making false assumptions and arriving at false conclusions). Maybe what you call ‘causal’ decision theory should be called ‘acausal’ because it in fact ignores causes of the decision, and goes as far as to break down its world model to do so. If you don’t make contradictory assumptions, then you have a world state A that evolves into world state B, and world state A’ that evolves into world state C, and in the hypothetical that the state becomes C!=B, the prior state has got to be A’!=A. Yeah, it looks weird to westerners with their philosophy of free will and your decisions having the potential to send the same world down a different path. I am guessing it is much much less problematic if you were more culturally exposed to determinism/fatalism. This may be a very interesting topic, within comparative anthropology.

The main distinction between philosophy and mathematics (or philosophy done by mathematicians) seems to be that in the latter, if you get yourself a set of assumptions leading to contradictory conclusions (example: in Newcomb’s on one hand it can be concluded that agents which 1-box walk out with more money, on the other hand, agents that choose to two-box get strictly more money than those that 1-box), it is generally concluded that something is wrong with the assumptions, rather than argued which of the conclusions is truly correct given the assumptions.

• The values of A, C and P are all equivalent. You insist on making CDT determine C in a model where it does not know these are correlated. This is a problem with your model.

• You are applying a decision theory to the node C, which means you are implicitly stating: there are multiple possible choices to be made at this point, and this decision can be made independent of nodes not in front of this one.

Yes. That’s basically the definition of CDT. That’s also why CDT is no good. You can quibble about the word but in “the literature”, ‘CDT’ means just that.

• Well, a practically important example is a deterministic agent which is copied and then copies play prisoner’s dilemma against each other.

There you have agents that use physics. Those, when evaluating hypothetical choices, use some model of physics, where an agent can model itself as a copyable deterministic process which it can’t directly simulate (i.e. it knows that the matter inside its head obeys known laws of physics). In the hypothetical that it cooperates, after processing the physics, it is found that the copy cooperates; in the hypothetical that it defects, it is found that the copy defects.
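A minimal sketch of that evaluation, assuming the conventional prisoner's dilemma payoffs (5 for defecting against a cooperator, 3 for mutual cooperation, 1 for mutual defection, 0 for being exploited). The payoff numbers and names are illustrative, not part of the problem as stated.

```python
# Payoff to "me", indexed by (my move, copy's move); conventional PD numbers.
PD_PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def copy_move_in_hypothetical(my_move):
    # The copy is the same deterministic process in the same state,
    # so in any hypothetical its move equals mine.
    return my_move


def payoff_in_hypothetical(my_move):
    return PD_PAYOFF[(my_move, copy_move_in_hypothetical(my_move))]


best_move = max(("C", "D"), key=payoff_in_hypothetical)
print(best_move, payoff_in_hypothetical("C"), payoff_in_hypothetical("D"))
```

The off-diagonal entries of the table are never consulted, because no consistent hypothetical has the two copies moving differently.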

And then there’s philosophers. The worse ones don’t know much about causality. They presumably have some sort of ill specified oracle that we don’t know how to construct, which will tell them what is a ‘consequence’ and what is a ‘cause’ , and they’ll only process the ‘consequences’ of the choice as the ‘cause’. This weird oracle tells us that other agent’s choice is not a ‘consequence’ of the decision, so it can not be processed. It’s very silly and not worth spending brain cells on.

• Playing prisoner’s dilemma against a copy of yourself is mostly the same problem as Newcomb’s. Instead of Omega’s prediction being perfectly correlated with your choice, you have an identical agent whose choice will be perfectly correlated with yours—or, possibly, randomly distributed in the same manner. If you can also assume that both copies know this with certainty, then you can do the exact same analysis as for Newcomb’s problem.

Whether you have a prediction made by an Omega or a decision made by a copy really does not matter, as long as they both are automatically going to be the same as your own choice, by assumption in the problem statement.
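As a rough illustration of the copy argument above, here is a minimal sketch. The payoff numbers and the toy `agent` function are hypothetical stand-ins, not taken from the FAQ; the only point is that when both players are the same deterministic function started from the same state, the off-diagonal outcomes never occur.

```python
# Hypothetical payoffs to "me" for (my_move, copy_move); higher is better.
# These numbers are illustrative and are not taken from the FAQ.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def agent(initial_state):
    """Stand-in deterministic agent: the move depends only on the initial state."""
    return "C" if initial_state % 2 == 0 else "D"


def play_against_copy(initial_state):
    """Both players are the same function started from the same state."""
    return agent(initial_state), agent(initial_state)


def reachable_outcomes(states):
    """Collect every (my_move, copy_move) pair that can actually occur."""
    return {play_against_copy(state) for state in states}


outcomes = reachable_outcomes(range(10))
best_outcome = max(outcomes, key=PAYOFF.get)
print(sorted(outcomes))  # only the two diagonal outcomes appear
print(best_outcome)
```

Under these assumed payoffs, the tempting ("D", "C") cell is simply not among the reachable outcomes, so comparing it against ("C", "C") has no bearing on the choice.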

• The copy problem is well specified, though, unlike the “predictor”. I clarified more in private. The worst part about Newcomb’s is that all the ex-religious folks seem to substitute something they formerly knew as ‘god’ for the predictor. The agent can also be further specified, e.g. as a finite Turing machine made of cogs and levers and tape with holes in it. The agent can’t simulate itself directly, of course, but it knows some properties of itself without simulation. E.g. it knows that in the alternative that it chooses to cooperate, its initial state was in set A (the states that result in cooperation); in the alternative that it chooses to defect, its initial state was in set B (the states that result in defection); and that no state is in both sets.

• I’m with incogn on this one: either there is predictability or there is choice; one cannot have both.

Incogn is right in saying that, from omega’s point of view, the agent is purely deterministic, i.e. more or less equivalent to a computer program. Incogn is slightly off the mark in conflating determinism with predictability: a system can be deterministic, but still not predictable; this is the foundation of cryptography. Deterministic systems are either predictable or are not. Unless Newcomb’s problem explicitly allows the agent to be non-deterministic, but this is unclear.

The only way a deterministic system becomes unpredictable is if it incorporates a source of randomness that is stronger than the ability of a given intelligence to predict. There are good reasons to believe that there exist rather simple sources of entropy that are beyond the predictive power of any fixed super-intelligence—this is not just the foundation of cryptography, but is generically studied under the rubric of ‘chaotic dynamical systems’. I suppose you also have to believe that P is not NP. Or maybe I should just mutter ‘Turing Halting Problem’. (unless omega is taken to be a mythical comp-sci “oracle”, in which case you’ve pushed decision theory into that branch of set theory that deals with cardinal numbers larger than the continuum, and I’m pretty sure you are not ready for the dragons that lie there.)

If the agent incorporates such a source of non-determinism, then omega is unable to predict, and the whole paradox falls down. Either omega can predict, in which case EDT, else omega cannot predict, in which case CDT. Duhhh. I’m sort of flabbergasted, because these points seem obvious to me … the Newcomb paradox, as given, seems poorly stated.

• either there is predictability or there is choice

Think of real people making choices and you’ll see it’s the other way around. The carefully chosen paths are the predictable ones if you know the variables involved in the choice. To be unpredictable, you need to think and choose less.

Hell, the archetypical imagery of someone giving up on choice is them flipping a coin or throwing a dart with closed eyes—in short resorting to unpredictability in order to NOT choose by themselves.

• I’m with incogn on this one: either there is predictability or there is choice; one cannot have both.

Either your claim is false or you are using a definition of at least one of those two words that means something different to the standard usage.

• I do not think the standard usage is well defined, and avoiding these terms altogether is not possible, seeing as they are in the definition of the problem we are discussing.

Interpretations of the words and arguments for the claim are the whole content of the ancestor post. Maybe you should start there instead of quoting snippets out of context and linking unrelated fallacies? Perhaps, by specifically stating the better and more standard interpretations?

• Huh? Can you explain? Normally, one states that a mechanical device is “predictable”: given its current state and some effort, one can discover its future state. Machines don’t have the ability to choose. Normally, “choice” is something that only a system possessing free will can have. Is that not the case? Is there some other “standard usage”? Sorry, I’m a newbie here, I honestly don’t know more about this subject, other than what I can deduce by my own wits.

• Machines don’t have the ability to choose.

Machines don’t have preferences, by which I mean they have no conscious self-awareness of a preferred state of the world—they can nonetheless execute “if, then, else” instructions.

That such instructions do not follow their preferences (as they lack such) can perhaps be considered sufficient reason to say that machines don’t have the ability to choose—that they’re deterministic doesn’t… “Determining something” and “Choosing something” are synonyms, not opposites after all.

• Newcomb’s problem makes the stronger precondition that the agent is both predictable and that in fact one action has been predicted. In that specific situation, it would be hard to argue against that one action being determined and immutable, even if in general there is debate about the relationship between determinism and predictability.

• Hmm, the FAQ, as currently worded, does not state this. It simply implies that the agent is human, that omega has made 1000 correct predictions, and that omega has billions of sensors and a computer the size of the moon. That’s large, but finite. One may assign some finite complexity to Omega—say 100 bits per atom times the number of atoms in the moon, whatever. I believe that one may devise pseudo-random number generators that can defy this kind of compute power. The relevant point here is that Omega, while powerful, is still not “God” (infinite, infallible, all-seeing), nor is it an “oracle” (in the computer-science definition of an “oracle”: viz a machine that can decide undecidable computational problems).

• I do not want to make estimates on how and with what accuracy Omega can predict. There is not nearly enough context available for this. Wikipedia’s version has no detail whatsoever on the nature of Omega. There seems to be enough discussion to be had, even with the perhaps impossible assumption that Omega can predict perfectly, always, and that this can be known by the subject with absolute certainty.

• I think I agree, by and large, despite the length of this post.

Whether choice and predictability are mutually exclusive depends on what choice is supposed to mean. The word is not exactly well defined in this context. In some sense, if variable > threshold then A, else B is a choice.

I am not sure where you think I am conflating. As far as I can see, perfect prediction is obviously impossible unless the system in question is deterministic. On the other hand, determinism does not guarantee that perfect prediction is practical or feasible. The computational complexity might be arbitrarily large, even if you have complete knowledge of an algorithm and its input. I can not really see the relevance to my above post.

Finally, I am myself confused as to why you want two different decision theories (CDT and EDT) instead of two different models for the two different problems conflated into the single identifier Newcomb’s paradox. If you assume a perfect predictor, and thus full correlation between prediction and choice, then you have to make sure your model actually reflects that.

Let’s start out with a simple matrix, P/​C/​1/​2 are shorthands for prediction, choice, one-box, two-box.

• P1 C1: 1000

• P1 C2: 1001

• P2 C1: 0

• P2 C2: 1

If the value of P is unknown, but independent of C: Dominance principle, C=2, entirely straightforward CDT.

If, however, the value of P is completely correlated with C, then the matrix above is misleading, P and C can not be different and are really only a single variable, which should be wrapped in a single identifier. The matrix you are actually applying CDT to is the following one:

• (P&C)1: 1000

• (P&C)2: 1

The best choice is (P&C)=1, again by straightforward CDT.

The only failure of CDT is that it gives different, correct solutions to different problems with a properly defined correlation of prediction and choice. The only advantage of EDT is that it is easier to cheat in this information without noticing it, even when it would be incorrect to do so. It is entirely possible to have a situation where prediction and choice are correlated, but the decision theory is not allowed to know this and must assume that they are uncorrelated. The decision theory should give the wrong answer in this case.
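The two models in this comment can be sketched in a few lines. This is only an illustration of the argument as stated, using the payoff numbers from the matrix in the comment; the function names are made up for the example.

```python
# Payoffs from the comment's matrix: (prediction, choice) -> money.
# 1 = one-box, 2 = two-box.
PAYOFF = {(1, 1): 1000, (1, 2): 1001, (2, 1): 0, (2, 2): 1}
PREDICTIONS = (1, 2)
CHOICES = (1, 2)


def dominant_choice(payoff):
    """Model 1: P is unknown but independent of C.

    Return the choice that is never worse and sometimes better than the
    alternative for every value of P, or None if no choice dominates.
    """
    for choice in CHOICES:
        other = 2 if choice == 1 else 1
        diffs = [payoff[(p, choice)] - payoff[(p, other)] for p in PREDICTIONS]
        if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
            return choice
    return None


def best_when_fully_correlated(payoff):
    """Model 2: P always equals C, so only the diagonal cells can occur."""
    diagonal = {c: payoff[(c, c)] for c in CHOICES}
    return max(diagonal, key=diagonal.get)


print(dominant_choice(PAYOFF))             # 2 (two-box)
print(best_when_fully_correlated(PAYOFF))  # 1 (one-box)
```

The same payoff table yields opposite recommendations depending only on which cells the model treats as possible, which is the distinction the comment draws.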

• Either omega can predict, in which case EDT, else omega cannot predict, in which case CDT. Duhhh.

If Omega cannot predict, TDT will two-box.

I like how he just brute forces the problem with (simple) mathematics, but I am not sure if it is a good thing to deal with a paradox without properly investigating why it seems to be a paradox in the first place. It is sort of like saying that there is actually no real magic involved in this super convincing card trick you have seen, without taking time to address what seems to require magic and how it is done mundanely.

• I think this is a very clear account of the issues with these problems.

I like your explanations of how correct model choice leads to CDT getting it right all the time; similarly it seems correct model choice should let EDT get it right all the time. In this light CDT and EDT are really heuristics for how to make decisions with simplified models.

• Thanks for your post, it was a good summary of decision theory basics. Some corrections:

In the Allais paradox, choice (2A) should be “A 34% chance of 24,000\$ and a 66% chance of nothing” (now 27,000\$).

A typo in title 10.3.1., the title should probably be “Why should degrees of belief follow the laws of probability?”.

In 11.1.10. Prisoner’s dilemma, the Resnik quotation mentions a twenty-five year term, yet the decision matrix has “20 years in jail” as an outcome.

• Also,

Experiments have shown that many people prefer (1A) to (1B) and (2B) to (2A)...So independence implies that anyone that prefers (1A) to (1B) must also prefer (2B) to (2A).

Shouldn’t independence have people who prefer (1A) to (1B) prefer (2A) to (2B)?
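The question above can be checked numerically. The sketch below assumes the usual Allais figures: (1A) \$24,000 for certain, (1B) a 33/34 chance of \$27,000, (2A) a 34% chance of \$24,000, (2B) a 33% chance of \$27,000. Only (2A) is spelled out in this thread, so the other three lotteries and both utility functions are assumptions made for illustration.

```python
from fractions import Fraction as F

# Lotteries as lists of (probability, prize). Values other than 2A are assumed.
L1A = [(F(1), 24000)]
L1B = [(F(33, 34), 27000), (F(1, 34), 0)]
L2A = [(F(34, 100), 24000), (F(66, 100), 0)]
L2B = [(F(33, 100), 27000), (F(67, 100), 0)]


def expected_utility(lottery, utility):
    """Probability-weighted sum of the utility of each prize."""
    return sum(p * utility(prize) for p, prize in lottery)


def preference_gaps(utility):
    """Return (EU(1A) - EU(1B), EU(2A) - EU(2B)) for a utility function."""
    gap1 = expected_utility(L1A, utility) - expected_utility(L1B, utility)
    gap2 = expected_utility(L2A, utility) - expected_utility(L2B, utility)
    return gap1, gap2


# Two hypothetical utility functions, kept rational so the arithmetic is exact.
UTILITIES = {
    "linear": lambda x: F(x),
    "strongly concave": lambda x: F(x, x + 1000),
}

for name, utility in UTILITIES.items():
    gap1, gap2 = preference_gaps(utility)
    # gap2 is always 34/100 of gap1, so the two gaps share a sign.
    print(name, gap1 > 0, gap2 > 0, gap2 == F(34, 100) * gap1)
```

With these assumed lotteries, any expected utility maximizer who prefers (1A) to (1B) also prefers (2A) to (2B), which supports the reading in the question.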

• Thanks. Fixed for the next update of the FAQ.

• Also,

Experiments have shown that many people prefer (1A) to (1B) and (2B) to (2A)...So independence implies that anyone that prefers (1A) to (1B) must also prefer (2B) to (2A).

Shouldn’t independence have people who prefer (1A) to (1B) prefer (2A) to (2B)?

EDIT:

But because the direct approach is very recent (Peterson 2008; Cozic 2011), and only time will show whether it can stand up to professional criticism.

Either the word “because” or “and” is out of place here.

I only notice these things because this FAQ is great and I’m trying to understand every detail that I can.

• Thanks Pinyaka, changed for next edit (and glad to hear you’re finding it useful).

• Typo at 11.4:

Ahmed and Price (2012) think it does not adequately address to the challenge

• Easy explanation for the Ellsberg Paradox: We humans treat the urn as if it was subjected to two kinds of uncertainties.

• The first kind is which ball I will actually draw. It feels “truly random”.

• The second kind is how many red (and blue) balls there actually are. This one is not truly random.

Somehow, we prefer to choose the “truly random” option. I think I can sense why: when it’s “truly random”, I know no potentially hostile agent messed with me. I mean, I could choose “red” in situation A, but then the organizers could have put in 60 blue balls just to mess with me!

Put simply, choosing “red” opens me up to external sentient influence, and therefore to the risk of being outsmarted. This particular risk aversion sounds like a pretty sound heuristic.

• Yes, exactly, and in our modern marketing-driven culture, one almost expects to be gamed by salesmen or sneaky game-show hosts. In this culture, it’s a prudent, even ‘rational’ response.

• What about mentioning the St. Petersburg paradox? This is a pretty striking issue for EUM, IMHO.

• I concur. Plus, the St. Petersburg paradox was the impetus for Daniel Bernoulli’s invention of the concept of utility.

• The St Petersburg paradox actually sounds to me a lot like Pascal’s Mugging. That is, you are offered a very small chance at a very large amount of utility (or, in the case of Pascal’s Mugging, of not losing a large amount of utility), with a very high expected value if you accept the deal; but because the deal has such a low chance of paying out, a smart person will turn it down, despite that having less expected value than accepting.
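For readers who want the numbers behind this exchange, here is a small sketch of the textbook St. Petersburg game, which the comments above do not spell out: a fair coin is tossed until it lands heads, and if the first head comes on toss n the prize is 2^n. That setup and the function names are assumptions for illustration.

```python
from fractions import Fraction


def truncated_expected_value(max_tosses):
    """Expected prize if the game is cut off after max_tosses tosses.

    Each term is (1/2**n) * 2**n = 1, so the sum grows without bound.
    """
    return sum(Fraction(1, 2**n) * 2**n for n in range(1, max_tosses + 1))


def prob_prize_at_most(limit):
    """Probability that the prize 2**n does not exceed limit."""
    total = Fraction(0)
    n = 1
    while 2**n <= limit:
        total += Fraction(1, 2**n)
        n += 1
    return total


print(truncated_expected_value(10))  # 10
print(truncated_expected_value(50))  # 50
print(prob_prize_at_most(8))         # 7/8
```

The expected value grows by one unit per allowed toss, yet seven games in eight pay 8 or less, which is the tension both commenters are pointing at.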

• I’m finding the “counterfactual mugging” challenging. At this point, the rules of the game seem to be “design a thoughtless, inert, unthinking algorithm, such as CDT or EDT or BT or TDT, which will always give the winning answer.” Fine. But for the entire range of Newcomb’s problems, we are pitting this dumb-as-a-rock algo against a super-intelligence. By the time we get to the counterfactual mugging, we seem to have a scenario where omega is saying “I will reward you only if you are a trusting rube who can be fleeced.” Now, if you are a trusting rube who can be fleeced, then you can be pumped, a la the pumping examples in previous sections: how many times will omega ask you for \$100 before you wise up and realize that you are being extorted?

This shift of focus to pumping also shows up in the Prisoner’s dilemma, specifically, in the recent results from Freeman Dyson & William Press. They point out that an intelligent agent can extort any evolutionary algorithm. Basically, if you know the zero-determinant strategy, and your opponent doesn’t, then you can mug the opponent (repeatedly). I think the same applies for the counterfactual mugging: omega has a “theory of mind”, while the idiot decision algo fails to have one. If your decision algo tries to learn from history (i.e. from repeated muggings), using basic evolutionary algos, then it will continue to be mugged (forever): it can’t win.

To borrow Press & Dyson’s vocabulary: if you want to have an algorithmic decision theory that can win in the presence of (super-)intelligences, then you must endow that algorithm with a “theory of mind”: your algorithm has got to start modelling omega, to determine what its actions will be.

• VNM utility isn’t any of the types you listed. Ratios (a-b)/​|c-d| of VNM utilities aren’t meaningful, only ratios (a-b)/​|c-b|.

• I think I’m missing the point of what you’re saying here so I was hoping that if I explained why I don’t understand, perhaps you could clarify.

VNM-utility is unique up to a positive linear transformation. When a utility function is unique up to a positive linear transformation, it is an interval (/​cardinal scale). So VNM-utility is an interval scale.

This is the standard story about VNM-utility (which is to say, I’m not claiming this because it seems right to me but rather because this is the accepted mainstream view of VNM-utility). Given that this is a simple mathematical property, I presume the mainstream view will be correct.

So if your comment is correct in terms of the presentation in the FAQ then either we’ve failed to correctly define VNM-utility or we’ve failed to correctly define interval scales in accordance with the mainstream way of doing so (or, I’ve missed something). Are you able to pinpoint which of these you think has happened?

One final comment. I don’t see why ratios (a-b)/|c-d| aren’t meaningful. For these to be meaningful, it seems to me that it would need to be that [(La+k)-(Lb+k)]/[(Lc+k)-(Ld+k)] = (a-b)/(c-d) for all L and k (as VNM-utilities are unique up to a positive linear transformation) and it seems clear enough that this will be the case:

[(La+k)-(Lb+k)]/​[(Lc+k)-(Ld+k)] = [L(a-b)]/​[L(c-d)] = (a-b)/​(c-d)

Again, could you clarify what I’m missing (I’m weaker on axiomatizations of decision theory than I am on other aspects of decision theory and you’re a mathematician so I’m perfectly willing to accept that I’m missing something but it’d be great if you could explain what it is)?
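The algebra in this comment is easy to spot-check. The sketch below is only an illustration with made-up utility values: it applies a few transformations u → L·u + k with L > 0 and confirms that ratios of differences are unchanged, whereas plain ratios a/b are not.

```python
from fractions import Fraction


def transform(u, scale, shift):
    """Positive linear (affine) rescaling of a utility value; scale must be > 0."""
    return scale * u + shift


def difference_ratio(a, b, c, d):
    """(a - b) / (c - d); requires c != d."""
    return Fraction(a - b) / Fraction(c - d)


# Hypothetical utility values, chosen only for the demonstration.
a, b, c, d = 7, 2, 10, 4
original = difference_ratio(a, b, c, d)

rescalings = [(Fraction(1, 3), 5), (2, -11), (100, 0)]
invariant = all(
    difference_ratio(*(transform(u, L, k) for u in (a, b, c, d))) == original
    for L, k in rescalings
)
print(original, invariant)  # 5/6 True

# A plain ratio a/b is not preserved once a shift k != 0 is applied.
plain_before = Fraction(a, b)
plain_after = Fraction(transform(a, 2, -11), transform(b, 2, -11))
print(plain_before == plain_after)  # False
```

The shift k cancels inside each difference and the scale L cancels between numerator and denominator, exactly as in the one-line derivation above.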

• Oops, you are absolutely right. (a-b)/​|c-d| is meaningful after all. Not sure why I failed to notice that. Thanks for pointing that out.

• I would *really* appreciate any help from lesswrong readers in helping me understand something really basic about the standard money pump argument for transitivity of preferences.

So clearly there can be situations, like in a game of Rock Scissors Paper (or games featuring non-transitive dice, like ‘Efron’s dice’) where faced with pairwise choices it seems rational to have non-transitive preferences. And it could be that these non-transitive games/​situations pay out money (or utility or whatever) if you make the right choice.

But so then if these kinds of non-transitive games/​situations are paying out money (or utility or whatever) I don’t quite see how the standard money pump considerations apply? Sure, I might pay some amount to have Rock over Scissors, and some small amount to have Scissors over paper, and some small amount to have Paper over Rock, etc. But if each time I am also *gaining* at least as much money by making these choices, then I am not being turned into a money pump.

So this seems like a really simple counter-example (i.e. far too simple!!) where having non-transitive preferences seems rational and also financially advantageous. What am I missing? I realise I almost certainly have some totally basic and stupid misunderstanding of what the money pump argument is supposed to show.

• Rock paper scissors isn’t an example of nontransitive preferences. Consider Alice playing the game against Bob. It is not the case that Alice prefers playing rock to playing scissors, and playing scissors to playing paper, and playing paper to playing rock. Why on Earth would she have preferences like that? Instead, she prefers to choose among rock, paper and scissors with certain probabilities that maximize her chance of winning against Bob.

• Yes I phrased my point totally badly and unclearly.

Forget Rock Scissors paper—suppose team A loses to team B, B loses to C and C loses to A. Now you have the choice to bet on team A or team B to win/​lose \$1 - you choose B. Then you have the choice between B and C—you choose C. Then you have the choice between C and A—you choose A. And so on. Here I might pay anything less than \$1 in order to choose my preferred option each time. If we just look at what I am prepared to pay in order to make my pairwise choices then it seems I have become a money pump. But of course once we factor in my winning \$1 each time then I am being perfectly sensible.

So my question is just – how come this totally obvious point is not a counter-example to the money pump argument that preferences ought always to be transitive? For there seem to be situations where having cyclical preferences can pay out?

• These are decisions in different situations. Transitivity of preference is about a single situation. There should be three possible actions A, B and C that can be performed in a single situation, with B preferred to A and C preferred to B. Transitivity of preference says that C is then preferred to A in that same situation. Betting on a fight of B vs. A is not a situation where you could also bet on C, and would prefer to bet on C over betting on B.

• Also—if we have a set of 3 non-transitive dice, and I just want to roll the highest number possible, then I can prefer A to B, B to C and C to A, where all 3 dice are available to roll in the same situation.

If I get paid depending on how high a number I roll, then this would seem to prevent me from becoming a money pump over the long term.

I have never seen that kind of restriction to a single choice-situation mentioned before when transitivity is presented. E.g. there is nothing like that, as far as I can see, in Peterson’s decision theory textbook, nor in Bonanno’s presentation of transitivity in his online textbook ‘Decision Making’. All the statements of transitivity I have read just require that if a is preferred to b in a pairwise comparison, and b is preferred to c in a pairwise comparison, then a is also preferred to c in a pairwise comparison. There is no further clause requiring that a, b, and c are all simultaneously available in a single situation.
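To make the dice example concrete, here is a small sketch using the standard faces of Efron’s dice (four dice rather than the three mentioned above; the face values are the well-known ones, not something stated in this thread). It computes the head-to-head win probabilities and, separately, each die’s average roll.

```python
from fractions import Fraction
from itertools import product

# Standard faces of Efron's four non-transitive dice.
DICE = {
    "A": (4, 4, 4, 4, 0, 0),
    "B": (3, 3, 3, 3, 3, 3),
    "C": (6, 6, 2, 2, 2, 2),
    "D": (5, 5, 5, 1, 1, 1),
}


def win_probability(first, second):
    """Probability that die `first` rolls strictly higher than die `second`."""
    wins = sum(1 for x, y in product(DICE[first], DICE[second]) if x > y)
    return Fraction(wins, 36)


def mean_roll(name):
    """Average face value of a single die."""
    return Fraction(sum(DICE[name]), 6)


for first, second in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(first, "beats", second, "with probability", win_probability(first, second))

print({name: mean_roll(name) for name in DICE})
```

Each die beats the next with probability 2/3, so "which die wins a head-to-head roll" is cyclic. The average rolls, by contrast, are ordinary numbers and can be ranked without any cycle, so the answer may depend on whether the payout is tied to beating an opponent’s die or to the number rolled.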

• Presentation of Newcomb’s problem in section 11.1.1. seems faulty. What if the human flips a coin to determine whether to one-box or two-box? (or any suitable source of entropy that is beyond the predictive powers of the super-intelligence.) What happens then?

This point is danced around in the next section, but never stated outright: EDT provides exactly the right answer if humans are fully deterministic and predictable by the superintelligence. CDT gives the right answer if the human employs an unpredictable entropy source in their decision-making. It is the entropy source that makes the decision acausal from the acts of the super-intelligence.

• Presentation of Newcomb’s problem in section 11.1.1. seems faulty. What if the human flips a coin to determine whether to one-box or two-box? (or any suitable source of entropy that is beyond the predictive powers of the super-intelligence.) What happens then?

If the FAQ left this out then it is indeed faulty. It should either specify that if Omega predicts the human will use that kind of entropy then it gets a “Fuck you” (gets nothing in the big box, or worse) or, at best, that Omega awards that kind of randomization with a proportional payoff (i.e., if behavior is determined by a fair coin then the big box contains half the money).

This is a fairly typical (even “Frequent”) question so needs to be included in the problem specification. But it can just be considered a minor technical detail.
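The "proportional payoff" variant can be worked through quickly. In this sketch the dollar amounts (\$1,000,000 in the big box, \$1,000 in the small one) are assumed for illustration, and Omega is assumed to know the agent’s probability p of one-boxing exactly and to fill the big box with p times the full amount.

```python
from fractions import Fraction

BIG_BOX_FULL = 1_000_000  # assumed full contents of the big box
SMALL_BOX = 1_000         # assumed contents of the small box


def expected_payoff(p_one_box, big=BIG_BOX_FULL, small=SMALL_BOX):
    """Expected money under the assumed proportional-payoff rule.

    The agent always takes the big box, which holds p * big, and also
    takes the small box with probability 1 - p.
    """
    p = Fraction(p_one_box)
    return p * big + (1 - p) * small


for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(p, expected_payoff(p))
```

Under these assumptions the expected payoff rises linearly in p (1,000 at p = 0, 500,500 for a fair coin, 1,000,000 at p = 1), so randomizing gains nothing over simply one-boxing.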

• This response challenges my intuition, and I would love to learn more about how the problem formulation is altered to address the apparent inconsistency in the case that players make choices on the basis of a fair coin flip. See my other post.

• It should either specify that if Omega predicts the human will use that kind of entropy then it gets a “Fuck you” (gets nothing in the big box, or worse) or, at best, that Omega awards that kind of randomization with a proportional payoff (i.e., if behavior is determined by a fair coin then the big box contains half the money).

Or that Omega is smart enough to predict any randomizer you have available.

• The FAQ states that omega has/is a computer the size of the moon: that’s huge but finite. I believe it’s possible, with today’s technology, to create a randomizer that an omega of this size cannot predict. However smart omega is, one can always create a randomizer that omega cannot break.

• OK, but this can’t be a “minor detail”; it’s rather central to the nature of the problem. The back-and-forth with incogn above tries to deal with this. Put simply, either omega is able to predict, in which case EDT is right, or omega is not able to predict, in which case CDT is right.

The source of entropy need not be a fair coin: even fully deterministic systems can have a behavior so complex that predictability is untenable. Either omega can predict, and knows it can predict, or omega cannot predict, and knows that it cannot predict. The possibility that it cannot predict, yet is erroneously convinced that it can, seems ridiculous.

• Small correction: Arntzenius’s name has a Z (that paper is great by the way, I sent it to Yudkowsky a while ago).

There is a compliment true of both this post and that paper: they are both very well condensed. Congratulations Luke and crazy88!

• Thanks. Will be fixed in next update. Thanks also for the positive comment.

• In the VNM system, utility is defined via preferences over acts rather than preferences over outcomes. To many, it seems odd to define utility with respect to preferences over risky acts. After all, even an agent who thinks she lives in a world where every act is certain to result in a known outcome could have preferences for some outcomes over others. Many would argue that utility should be defined in relation to preferences over outcomes or world-states, and that’s not what the VNM system does. (Also see section 9.)

It’s misleading to associate acts with lotteries over outcomes like that. In most situations, there are lotteries over outcomes that are not achievable by any act. And “lottery” makes it very clear that the probabilities of each outcome that could result are known, whereas “act” does not.

• My understanding is that in the VNM system, utility is defined over lotteries. Is this the point you’re contesting, or are you happy with that but unhappy with the use of the word “acts” to describe these lotteries? In other words, do you think the portrayal of the VNM system as involving preferences over lotteries is wrong, or do you think that this is right but the way we describe it conflates two notions that should remain distinct?

• The problem is with the word “acts”. Some lotteries might not be achievable by any act, so this phrasing makes it sound like the VNM only applies to the subset of lotteries that is actually possible to achieve. And I realize that you’re using the word “act” more specifically than this, but typically, people consider doing the same thing in a different context to be the same “act”, even though its consequences may depend on the context. So when I first read the paragraph I quoted after only skimming the rest, it sounded like it was claiming that the VNM system can only describe deontological preferences over actions that don’t take context into account, which is, of course, ridiculous.

Also, while it is true that the VNM system defines utility over lotteries, it is fairly trivial to modify it to use utility over outcomes (see first section of this post)

• Thanks for the clarification.

Perhaps worth noting that earlier in the document we defined acts as functions from world states to outcomes so this seems to resolve the second concern somewhat (if the context is different then presumably this is represented by the world states being different and so there will be different functions in play and hence different acts).

In terms of the first concern, while VNM may define preferences over all lotteries, there’s a sense where in any specific decision scenario, VNM is only appealed to in order to rank the achievable lotteries and not all of them. Of course, however, it’s important to note as well that this is only part of the story.

Anyway, I changed this for the next update so as to improve clarity.

• Perhaps worth noting that earlier in the document we defined acts as functions from world states to outcomes so this seems to resolve the second concern somewhat.

What? That’s what I thought “acts” meant the first time, before I read the document more thoroughly and decided that you must mean that acts are lotteries. If you are using “act” to refer to functions from world states to outcomes, then the statement that the VNM system only applies to acts is simply false, rather than misleading.

• Okay, so I’ve been reading over Peterson’s book An Introduction to Decision Theory and he uses much the same language as that used in the FAQ with one difference: he’s careful to talk about risky acts rather than just acts (when he talks about VNM, I mean, he does simply talk about acts at some other point). This seems to be a pretty common way of talking about it (people other than Peterson use this language).

Anyway, Peterson explicitly defines a “lottery” as an act (which he defines as a function from world states to outcomes) whose outcome is risky (which is to say, is determined randomly but with known probability) [I presume by the act’s outcome he means the outcome that will actually occur if that act is selected].

Would including something more explicit like this resolve your concerns or do you think that Peterson does things wrong as well (or do you think I’m misunderstanding what Peterson is doing)?

• Either Peterson does things wrong, you’re misunderstanding Peterson, or I’m misunderstanding you. When I have time, I’ll look at that book to try to figure out which, unless you manage to sort things out for me before I get to it.

• Some quotes might help.

Peterson defines an act “as a function from a set of states to a set of outcomes”

The rest of the details are contained in this quote: “The key idea in von Neumann and Morgenstern’s theory is to ask the decision maker to state a set of preferences over risky acts. These acts are called lotteries, because the outcome of each act is assumed to be randomly determined by events (with known probabilities) that cannot be controlled by the decision maker”.

The terminology of risky acts is more widespread than Peterson: http://​​staff.science.uva.nl/​​~stephane/​​Teaching/​​UncDec/​​vNM.pdf

However, I don’t particularly see the need to get caught up in the details of what some particular people said: mostly I just want a clear way of saying what needs to be said.

Perhaps the best thing to do is (a) be more explicit about what lotteries are in the VNM system; and (b) be less explicit about how lotteries and acts interact. Use of the more neutral word “options” might help here [where options are the things the agent is choosing between].

Specifically, I could explicitly note that lotteries are the options on the VNM account (which is not to say that all lotteries are options but rather that all options are lotteries on this account), outline everything in terms of lotteries and then, when talking about the issue of action guidance, I can note that VNM, at least in the standard formulation, requires that an agent already has preferences over options and note that this might seem undesirable.

• If I understand correctly, Peterson is defining “acts” and “risky acts” as completely separate things (functions from states to outcomes, and lotteries over outcomes, respectively). If that’s true, it clears up the confusion, but that seems like extraordinarily bad terminology.

• Okay, well I’ve rewritten this for the next update in a way that hopefully resolves the issues.

If you have time, once the update is posted I’d love to know whether you think the rewrite is successful. In any case, thanks for taking the time to comment so far.

• Point conceded (both your point and shminux’s). Edited for the next update.

• Perhaps worth noting that earlier in the document we defined acts as functions from world states to outcomes

I could not find a definition of “world state” in the document. All you say is

a state is a part of the world that is not an act (that can be performed now by the decision maker) or an outcome.

which is by no means a good definition. It tells you what a state is not, but not what it is. It even fails at that, given that it uses the term “part of the world” without it being previously defined.

• Awesome, I look forward to reading this. Thanks, Luke and crazy88!

• Typo in section 2: “attached” should read “attacked.”

• Thanks, fixed for the next update.

• In this case, even if an extremely low value is set for L, it seems that paying this amount to play the game is unreasonable. After all, as Peterson notes, about nine times out of ten an agent that plays this game will win no more than 8 · 10^-100 utility.

It seems there’s an error here. Should it be “In this case, even if an extremely high value is set for L, it seems that paying a lot to play the game is unreasonable.”?

• Typo:

Usually, it is argued that each of the axioms are pragmatically justified because an agent which violates the axioms can face situations in which they are guaranteed end up worse off (from their own perspective).

guaranteed to end up worse off

• Does the horizontal axis of the decision tree in section 3 represent time? If so, I’d advocate smearing those red triangles out over the whole history of actions and events. Even though, in the particular example, it’s unlikely that the agent cares about having been insured as such, apart from the monetary payoffs, in the general case agents care about the whole history. I think that forgetting this point sometimes leads to misapplications of decision theory.

• Does the horizontal axis of the decision tree in section 3 represent time?

Yes and no. Yes, because presumably the agent’s end result re: house and money occurs after the fire and the fire will happen after the decision to take out insurance (otherwise, there’s not much point taking out insurance). No, because the diagram isn’t really about time, even if there is an accidental temporal component to it. Instead, the levels of the diagram correspond to different factors of the decision scenario: the first level is about the agent’s choice, the second level about the states of nature and the third about the final outcome.

Given that this is how the diagram works, smearing out the triangles would mix up the levels and damage the clarity of the diagram. To model an agent as caring about whether they were insured or not, we would simply modify the text next to the triangles to something like “Insurance, no house and \$99,900”, “Insurance, house and -\$100” and so on (and then we would assign different utilities to the agent based partly on whether they were insured or not as well as whether they had a house and how much money they had).

I think that forgetting this point sometimes leads to misapplications of decision theory.

I agree, though I think that talking of utility rather than money solves many of these problems. After all, utility should already take into account an agent’s desire to be insured, etc., and so talk about utility should be less likely to fall into these traps (which isn’t to say there are never any problems).

• When reading about Transparent Newcomb’s problem: Isn’t this perfectly general? Suppose Omega says: I give everyone who subscribes to decision theory A \$1000, and give those who subscribe to other decision theories nothing. Clearly everyone who subscribes to decision theory A “wins”.

It seems that if one lives in a world with many such Omegas, and subscribing to decision theory A (vs subscribing to decision theory B) would otherwise lead to losing at most, say, \$100 per day between two successive encounters with such Omegas, then one would win overall by subscribing (or self-modifying to subscribe) to A.

In other words, if subscribing to a certain decision theory changes your subjective experience of the world (not sure what the proper terminology for this is), which decision theory wins will depend on the world you live in. There would simply not be a “universal” winning decision theory.

A similar thing will happen with counterfactual mugging: if you expect to encounter the coin-tossing Omega again many times then you should give up your \$100, and if not then not.

• How many times in a row will you be mugged, before you realize that omega was lying to you?

• Really you probably need to start imagining Omega as a trustworthy process, e.g. a mathematical proof that tells you ‘X’. Thinking of it as a person seems to trip you up if you are constantly bringing up the possibility it’s lying when it says ‘X’...

• Omega is, by definition, always truthful.

• Maybe worth noting that there’s recommended reading on decision theory on the “Best textbooks on every subject” post.

On decision theory, lukeprog recommends Peterson’s An Introduction to Decision Theory over Resnik’s Choices and Luce & Raiffa’s Games and Decisions.

• Two of those are the books I recommend at the end of the first section.

• In this equation, V(A & O) represents the value to the agent of the combination of an act and an outcome. So this is the utility that the agent will receive if they carry out a certain act and a certain outcome occurs. Further, Pr_A O represents the probability of each outcome occurring on the supposition that the agent carries out a certain act. It is in terms of this probability that CDT and EDT differ. EDT uses the conditional probability, Pr(O|A), while CDT uses the probability of subjunctive conditionals, Pr(A □→ O).

Please, don’t use the same letters for the set and for the elements of this set. It’s confusing.
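To make the distinction in the quoted passage concrete, here is a minimal Python sketch that evaluates the expected-value sum it describes for Newcomb’s problem under both readings of the probability. The predictor accuracy (0.99), the payoffs, the 0.5 prior, and every function name are illustrative assumptions rather than anything taken from the FAQ. For CDT, the chance that the opaque box is full is treated as a fixed prior that the act cannot change.

```python
# Payoffs for Newcomb's problem (assumed, commonly used values).
BOX_B_FULL = 1_000_000   # opaque box, filled only if one-boxing was predicted
BOX_A = 1_000            # transparent box, always available

def payoff(act, box_b_full):
    """V(A & O): money received for an act given the opaque box's state."""
    total = BOX_B_FULL if box_b_full else 0
    if act == "two-box":
        total += BOX_A
    return total

def edt_value(act, accuracy):
    """Weight outcomes by Pr(O | A): the act is evidence about the prediction."""
    p_full = accuracy if act == "one-box" else 1 - accuracy
    return p_full * payoff(act, True) + (1 - p_full) * payoff(act, False)

def cdt_value(act, prior_full):
    """Weight outcomes by a probability the act cannot causally change."""
    return prior_full * payoff(act, True) + (1 - prior_full) * payoff(act, False)

acts = ["one-box", "two-box"]
edt_choice = max(acts, key=lambda a: edt_value(a, accuracy=0.99))
cdt_choice = max(acts, key=lambda a: cdt_value(a, prior_full=0.5))
print(edt_choice, cdt_choice)
```

Under these assumptions the EDT-style calculation favors one-boxing, while the CDT-style calculation favors two-boxing for any prior, since two-boxing adds the transparent box’s contents regardless of the opaque box’s state.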

• The second problem with using the law of large numbers to justify EUM has to do with a mathematical theorem known as gambler’s ruin. Imagine that you and I flip a fair coin, and I pay you \$1 every time it comes up heads and you pay me \$1 every time it comes up tails. We both start with \$100. If we flip the coin enough times, one of us will face a situation in which the sequence of heads or tails is longer than we can afford. If a long-enough sequence of heads comes up, I’ll run out of \$1 bills with which to pay you. If a long-enough sequence of tails comes up, you won’t be able to pay me. So in this situation, the law of large numbers guarantees that you will be better off in the long run by maximizing expected utility only if you start the game with an infinite amount of money (so that you never go broke), which is an unrealistic assumption. (For technical convenience, assume utility increases linearly with money. But the basic point holds without this assumption.)

I don’t understand. The final result is 50% for \$0 and 50% for \$200. Expected money is \$100, same as if I didn’t play. What’s the problem?
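The observation in this comment can be checked with a small simulation. The sketch below is only an illustration: it uses a starting stake of 10 units instead of \$100 so that it runs quickly, and the function names, seed, and number of games are arbitrary choices.

```python
import random

def play_until_ruin(start, rng):
    """Flip a fair coin for 1 unit per flip until one player is broke.

    Returns the first player's final wealth (0 or 2 * start).
    """
    wealth = start
    while 0 < wealth < 2 * start:
        wealth += 1 if rng.random() < 0.5 else -1
    return wealth

def simulate(start=10, games=2000, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    finals = [play_until_ruin(start, rng) for _ in range(games)]
    ruin_rate = finals.count(0) / games
    mean_final = sum(finals) / games
    return ruin_rate, mean_final

ruin_rate, mean_final = simulate()
print(f"ruined in {ruin_rate:.1%} of games; mean final wealth {mean_final:.2f}")
```

Every game ends with one player holding everything, and the average final wealth should stay close to the starting stake, which matches the commenter’s point that the expected money is unchanged even though ruin is certain for one side.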

• I think the example given to show the irrationality of leximin in certain situations doesn’t do a good job of distinguishing its failings from maximin. To usefully illustrate the difference between the two I believe another state is required with even worse outcomes for both acts (e.g. \$0). This way the worst outcomes for both acts would be equal and so the second worst outcomes (a1: \$1, a2: \$1.01) would then be compared under the leximin strategy, leading to the choice of a2 as the best act, again with the acknowledgment that you miss out on the opportunity to get \$10,001.01.
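The suggested three-state example can be sketched in a few lines of Python. The payoff table below is hypothetical: the shared worst case of 0 and the figures 1, 1.01 and 10,001.01 come from the comment, while a2’s best outcome and the function names are assumptions made for illustration.

```python
def maximin(acts):
    """Return the act(s) whose worst outcome is best."""
    best = max(min(outcomes) for outcomes in acts.values())
    return [name for name, outcomes in acts.items() if min(outcomes) == best]

def leximin(acts):
    """Compare outcomes worst-first; ties fall through to the next-worst outcome."""
    best = max(sorted(outcomes) for outcomes in acts.values())
    return [name for name, outcomes in acts.items() if sorted(outcomes) == best]

# Hypothetical payoffs in dollars, one entry per state of the world.
acts = {
    "a1": [0.00, 1.00, 10001.01],
    "a2": [0.00, 1.01, 1.01],   # a2's best outcome is an assumed value
}
print(maximin(acts))  # both acts tie on the worst case
print(leximin(acts))  # the tie is broken by the second-worst outcome
```

With the extra state, maximin cannot separate the two acts, whereas leximin picks a2 on the strength of one extra cent and gives up the chance at the large prize.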

• In the last chapter of his book “Utility Theory for Decision Making,” Peter Fishburn published a concise rendering of Leonard Savage’s proof that “rational” preferences over events implied that one behaved “as if” he (or she) was obeying Expected Utility Theory. He furthermore proved that following Savage’s axioms implied that your utility function is bounded (he attributes this extension of the proof, in its essence, to Savage). So Subjective Expected Utility Theory has an answer to the St. Petersburg Paradox “built in” to its axioms. That seems like a point well worth mentioning in this article.

• The image of Ellsberg’s Paradox has the picture of the Yellow/​Blue bet replaced with a picture of a Yellow/​Red bet. Having looked at the picture I was about to claim that it was always rational to take the R/​B bet over Y/​R before I read the actual description.

• Will be fixed in the next update. Thanks for pointing it out.

• Isn’t there a typo in “Experiments have shown that many people prefer (1A) to (1B) and (2B) to (2A).” ? Shouldn’t it be “(2A) to (2B)” ?

Edit: hrm, no, in fact it’s like http://lesswrong.com/lw/gu1/decision_theory_faq/8jav said: it should be \$24,000 instead of \$27,000 in option A, or else it makes no sense.

• Thanks, as you note, the linked comment is right.

• Thus, the expected utility (EU) of choice A is, for this decision maker, (1)(1000) = 1000. Meanwhile, the EU of choice B is (0.5)(1500) + (0.5)(0) = 750. In this case, the expected utility of choice B is greater than that of choice A, even though choice B has a greater expected monetary value.

Choice A at 1000 is still greater than Choice B at 750

• Thanks, will be fixed in next update.

• It’s Stiennon, not Steinnon.

• Fixed for next update. Thanks.

• Minor error: In the prisoner’s dilemma example, the decision matrix has twenty years for if you cooperate and your partner defects, while the text quoted right above the matrix claims that that amount is twenty five years.

• Thanks. I’ve fixed this up in the next update (though it won’t appear on the LW version yet).

• I find it helpful to use the term “security level” to understand maximin/​leximin and “hope level” to understand maximax. “Security level” is the worst case scenario, and under maximin/​leximin we want to maximize it. “Hope level” is the best case scenario, and under maximax, we want to maximize it.

• Concerning the transitivity axiom, what about rational choices in situations of intransitivity cycles?

• Bossert link does not work. :( (Sorry. Edited to make sense.)

• (Well, sort of. The minimax and maximax principles require only that we measure value on an ordinal scale, whereas the optimism-pessimism rule requires that we measure value on an interval scale.)

I’m using this as an introduction to decision theory so I might be wrong, and I’ve read that ‘maximin’ and ‘minimax’ do have different meanings in game theory, but you exclusively use the term ‘maximin’ up to a certain point and then mention a ‘minimax principle’ once, so I can only imagine that you meant to write ‘maximin principle.’ It confused me. It’s probably best to stick with one or the other.

Also, thanks for the introduction.

ETA: I found a typo.

Usually, it is argued that each of the axioms are pragmatically justified because an agent which violates the axioms can face situations in which they are guaranteed end up worse off (from their own perspective).

should be:

Usually, it is argued that each of the axioms are pragmatically justified because an agent which violates the axioms can face situations in which they are guaranteed to end up worse off (from their own perspective).

• Minimax is equivalent to maximin in a zero-sum game because every resource I win you lose out on. However, in non-zero-sum games which encourage cooperation this is no longer the case. Take for example a scenario in which we share some land. Let’s assume I have one third and you have two thirds. If I water your crops as well as my own every other day, you have agreed to water mine on the alternate days when I must look after my kids. If I don’t water your crops you won’t water mine back, but you don’t have any kids so you’d actually be able to water every day without me, although you’d rather hang out with your friends at the pool if you could. If I want to minimise the maximum (minimax) utility you get, I wouldn’t water your crops at all and would force you to water every day. Of course you’d resent me for this and wouldn’t water my crops, so I’d end up with no crop. If I want to maximise the minimum (maximin) crop yield for myself, I’ll have to water your crops too, even though you have twice as many crops as me and you’ll get to hang out with your friends as well!

• Another objection to the VNM approach (and to expected utility approaches generally), the St. Petersburg paradox, draws on the possibility of infinite utilities. The St. Petersburg paradox is based around a game where a fair coin is tossed until it lands heads up. At this point, the agent receives a prize worth 2^n utility, where n is equal to the number of times the coin was tossed during the game. The so-called paradox occurs because the expected utility of choosing to play this game is infinite and so, according to a standard expected utility approach, the agent should be willing to pay any finite amount to play the game. However, this seems unreasonable. Instead, it seems that the agent should only be willing to pay a relatively small amount to do so. As such, it seems that the expected utility approach gets something wrong.

Various responses have been suggested. Most obviously, we could avoid the paradox … by limiting agents’ utilities to finite values. However … these moves seem ad hoc. It’s unclear why we should set some limit to the amount of utility an agent can receive.

This is incorrect. The VNM theorem says that the utility function assigns a real number to each lottery. Infinity is not a real number, so the VNM system will never assign infinite utility to any lottery. Limiting agents’ utilities to finite values is not an ad hoc patch; it is a necessary consequence of the VNM axioms. See also http://lesswrong.com/lw/gr6/vnm_agents_and_lotteries_involving_an_infinite/

• Thanks! I’ve edited the article. What do you think of my edit?

• Most obviously, we could say that the paradox does not apply to VNM agents, since the VNM theorem assigns real numbers to all lotteries, and infinity is not a real number.

That works.

A fair coin is tossed until it lands heads up. The player thereafter receives a prize worth min {2^n · 10^-100, L} units of utility, where n is the number of times the coin was tossed.

In this case, even if an extremely low value is set for L, it seems that paying this amount to play the game is unreasonable.

I don’t think removing the “like 1” helps much. This phrasing leaves it unclear what “extremely low value” means, and I suspect most people who would object to maximizing expected utility when L=1 would still think it is reasonable when L=10^-99, which seems like a more reasonable interpretation of “extremely low value” when numbers like 10^-100 are mentioned.
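For reference, the expected utility of the capped game can be computed exactly, which helps when comparing the two readings of “extremely low value” discussed here. This is a sketch only; the function name and the use of exact fractions are choices made for illustration, and the two caps tried are the ones mentioned in this comment.

```python
from fractions import Fraction

def capped_eu(scale, cap):
    """Exact expected utility when a game of n tosses pays min(2**n * scale, cap)."""
    total = Fraction(0)
    n = 1
    while 2 ** n * scale < cap:
        total += scale          # probability 2**-n times prize 2**n * scale
        n += 1
    # Every game lasting n or more tosses pays the cap;
    # the probability of at least n tosses is 2**-(n - 1).
    total += cap * Fraction(1, 2 ** (n - 1))
    return total

scale = Fraction(1, 10 ** 100)
for cap in (Fraction(1), Fraction(1, 10 ** 99)):
    print(float(cap), float(capped_eu(scale, cap)))
```

Each uncapped term contributes exactly 10^-100, so the total stays tiny in both cases; the difference between the caps is how many terms are counted before the cap takes over.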

• Thanks, a very useful overview.

• In section 8.1, your example of the gambler’s ruin postulates that both agents have the same starting resources, but this is exactly the case in which the gambler’s ruin doesn’t apply. That might be worth changing.

1. Can decisions under ignorance be transformed into decisions under uncertainty?

I’d add a comment on Jaynes’ solution for determining ignorance priors in terms of transformation groups.

I’d say that there’s no such thing as an “ignorance” prior: priors are set by information. Setting a prior by symmetry or the more general transformation group is an assertion of information.

• There are numerous typos throughout the thing. Someone needs to re-read it. The math in “8.6.3. The Allais paradox” is all wrong, option 2A is not actually 34% of 1A and 66% of nothing, etc.

• This may not be the best place for this question, but it’s something I’ve been wondering for a while: how does causal decision theory fail us humans in the real world, here and now?

• This may not be the best place for this question, but it’s something I’ve been wondering for a while: how does causal decision theory fail us humans in the real world, here and now?

Us humans almost never use Causal Decision Theory in the real world, here and now. As such it fails us very little. What humans actually tend to use is about as similar to TDT as it is to CDT (ie. actual practice diverges from each of those ideals in different ways and couldn’t be said to be doing either.)

• All right, but how would CDT fail us, if we used it perfectly?

• All right, but how would CDT fail us, if we used it perfectly?

If we used it perfectly it would fail us very little. The ‘used perfectly’ part would prompt us to create and optimize institutions to allow the limitations of CDT to be worked around. It would result in a slightly less efficient system with some extra overheads and some wasted opportunities but it would still be rather good. Specifically, it would require more structures in place for creating and enforcing precommitments, and certain kinds of cooperation would be unavailable.

• Ah, that’s right. CDT tells you to defect in a prisoner’s dilemma against someone identical to you; TDT tells you to cooperate. So TDT wins here.

• CDT does, however, tell you to precommit to cooperate in a prisoner’s dilemma against someone who also precommits to cooperate with you, if this is an option.
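The contrast drawn in this exchange can be illustrated with a toy payoff table. The prison terms below are assumed values, and the second function is only a crude stand-in for the conclusion attributed to TDT (that an identical copy will make the same move), not an implementation of TDT itself.

```python
# Years in prison for (my move, their move); lower is better.
# "C" = cooperate, "D" = defect. All values are illustrative assumptions.
YEARS = {
    ("C", "C"): 1, ("C", "D"): 20,
    ("D", "C"): 0, ("D", "D"): 10,
}

def best_reply(their_move):
    """CDT-style reasoning: hold the other player's move fixed."""
    return min("CD", key=lambda mine: YEARS[(mine, their_move)])

def best_mirrored_move():
    """Reasoning that assumes an identical copy picks the same move as me."""
    return min("CD", key=lambda move: YEARS[(move, move)])

print(best_reply("C"), best_reply("D"), best_mirrored_move())
```

Holding the other player’s move fixed, defecting is better either way; restricting attention to the outcomes where both players move identically, cooperating is better.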

• There is one rather annoying subtext that recurs throughout the FAQ: the very casual and carefree use of the words “rational” and “irrational”, with the rather flawed idea that following some axiomatic system (e.g. VNM) and Bayes is “rational” and not doing so is “irrational”. I think this is a disservice, and, what’s more, fails to look into the effects of intelligence, experience, training and emotion. The Allais paradox scratches the surface, as do various psych experiments. But …

The real question is “why does this or that model differ from human nature?”: this question seems to never be asked overtly, but it does seem to get an implicit answer: because humans are irrational. I don’t like that answer: I doubt that they are irrational per se; rather, they are reacting to certain learned truths about the environment, and incorporating that into judgments.

So, for example: every day, we are bombarded with advertisers forcing us to make judgments: “if you buy our product, you will benefit in this way,” which is a decision-theoretic decision based on incomplete information. I usually make a different choice: “you can choose to pay attention to this ad, or to ignore it: if you pay attention to this ad, you trade away some of your attention span, in return for something that might be good; but if you ignore it, you sacrifice nothing, but win nothing.” I make this last choice hundreds of times a day. Maybe thousands. I am one big optimized mean green decision machine.

The advertisers have trained us in certain ways: in particular, they have trained us to disbelieve their propositions: they have a bad habit of lying, of over-selling and under-delivering. So when I see offers like “a jar contains red, blue and yellow balls...” my knee-jerk reaction is “bullshit, I know that you guys are probably trying to scam me, and I’d be an idiot for picking blue instead of yellow, because I know that most typical salespeople have already removed all the blue marbles from the jar. Only a gullible fool would believe otherwise, so cut it out with that Bayesian prior snow-job. We’re not country bumpkins, you know.”

The above argument, even if made ex post facto, is an example of the kind of thinking that humans engage in regularly. Humans make thousands of decisions a day (Should I watch TV now? Should I go to the bathroom? Should I read this? What should I type as the next word of this sentence?) and it seems awfully naive to claim that if any of these decisions don’t follow VNM+Bayes, they are “irrational”. I think it’s discounting intelligence far more than it should.

• The conclusion to section “11.1.3. Medical Newcomb problems” begs a question which remains unanswered: -- “So just as CDT “loses” on Newcomb’s problem, EDT will “lose” on Medical Newcomb problems (if the tickle defense fails) or will join CDT and “lose” on Newcomb’s Problem itself (if the tickle defense succeeds).”

If I was designing a self-driving car and had to provide an algorithm for what to do during an emergency, I may choose to hard-code CDT or EDT into the system, as seems appropriate. However, as an intelligent being, not a self-driving car, I am not bound to always use EDT or always use CDT: I have the option to carefully analyse the system, and, upon discovering its acausal nature (as the medical researchers do in the second study) then I should choose to use CDT; else I should use EDT.

So the real question is: “Under what circumstances should I use EDT, and when should I use CDT”? Section 11.1.3 suggests a partial answer: when the evidence shows that the system really is acausal, and maybe use EDT the rest of the time.

• Hmm. I just got a −1 on this comment … I thought I posed a reasonable question, and I would have thought it to even be a “commonly asked question”, so why would it get a −1? Am I misunderstanding something, or am I being unclear?

• Omega is, by definition, always truthful.

EDIT: Sorry, thought this was in reply to a different comment.

• But note burger-choosing Jane (6.1) is still irrational—for she has discounted the much stronger preference of a cow not to be harmed. Rationality entails overcoming egocentric bias—and ethnocentric and anthropocentric bias—and adopting a God’s eye point-of-view that impartially gives weight to all possible first-person perspectives.

• When we say ‘rationality’, we mean instrumental rationality; getting what you want. Elsewhere, we also refer to epistemic rationality, which is believing true things. In neither case do we say anything about what you should want.

It might be a good thing to care about cows, but it’s not rationality as we understand the word. Good you bring this up though, as I can easily imagine others being confused.

• Elsewhere, we also refer to epistemic rationality, which is believing true things. In neither case do we say anything about what you should want.

This begs the question against moral realism. If it is in fact true that harming cows is bad, then epistemic rationality demands that we believe that harming cows is bad. Of course, saying that you should believe harming cows is morally wrong is different from saying that you shouldn’t choose to harm cows, but the inference from one to the other is pretty plausible. It seems fairly uncontroversial that if one believes that action X is bad, and it is in fact true that action X is bad, then one should not perform action X (ceteris paribus).

I don’t agree with davidpearce’s framing (that rationality demands that one give equal weight to all perspectives), but I also don’t agree with the claim that rationality does not tell us anything about what we should want. Perhaps instrumental rationality doesn’t, but epistemic rationality does.

• Sure—but rationality doesn’t itself imply moral conclusions. It only says that if there are moral facts then we should believe them, not that there are any (particular) moral facts, which is what David needs.

• Larks, no, pragmatist nicely captured the point I was making. If we are trying to set out what an “ideal, perfectly rational agent” would choose, then we can’t assume that such a perfectly rational agent would arbitrarily disregard a stronger preference in favour of a weaker preference. Today, asymmetries of epistemic access mean that weaker preferences often trump stronger preferences; but with tomorrow’s technology, this cognitive limitation on our decision-making procedures can be overcome.

• Isn’t the giant elephant in this room the whole issue of moral realism? I’m a moral cognitivist but not a moral realist. I have laid out what it means for my moral beliefs to be true—the combination of physical fact and logical function against which my moral judgments are being compared. This gives my moral beliefs truth value. And having laid this out, it becomes perfectly obvious that it’s possible to build powerful optimizers who are not motivated by what I call moral truths; they are maximizing something other than morality, like paperclips. They will also meta-maximize something other than morality if you ask them to choose between possible utility functions, and will quite predictably go on picking the utility function “maximize paperclips”. Just as I correctly know it is better to be moral than to be paperclippy, they accurately evaluate that it is more paperclippy to maximize paperclips than morality. They know damn well that they’re making you unhappy and violating your strong preferences by doing so. It’s just that all this talk about the preferences that feel so intrinsically motivating to you, is itself of no interest to them because you haven’t gotten to the all-important parts about paperclips yet.

The main thing I’m not clear on in this discussion is to what extent David Pearce is being innocently mysterian vs. motivatedly mysterian. To be confused about how your happiness seems so intrinsically motivating, and innocently if naively wonder if perhaps it must be intrinsically motivating to other minds as well, is one thing. It is another thing to prefer this conclusion and so to feel a bit uncurious about anyone’s detailed explanation of how it doesn’t work like that. It is even less innocent to refuse outright to listen when somebody else tries to explain. And then strangest of all is to state powerfully and definitely that every bit of happiness must be motivating to all other minds, even though you can’t lay out step by step how the decision procedure would work. This requires overrunning your own claims to knowledge in a fundamental sense—mistaking your confusion about something for the ability to make definite claims about it. Now this of course is a very common and understandable sin, and the fact that David Pearce is crusading for happiness for all life forms should certainly count into our evaluation of his net virtue (it would certainly make me willing to drink a Pepsi with him). But I’m also not clear about where to go from here, or whether this conversation is accomplishing anything useful.

In particular it seems like David Pearce is not leveling any sort of argument we could possibly find persuasive—it’s not written so as to convince anyone who isn’t already a moral realist, or addressing the basic roots of disagreement—and that’s not a good sign. And short of rewriting the entire metaethics sequence in these comments I don’t know how I could convince him, either.

• I’m a moral cognitivist but not a moral realist. I have laid out what it means for my moral beliefs to be true

Even among philosophers, “moral realism” is a term wont to confuse. I’d be wary about relying on it to chunk your philosophy. For instance, the simplest and least problematic definition of ‘moral realism’ is probably the doctrine...

minimal moral realism: cognitivism (moral assertions like ‘murder is bad’ have truth-conditions, express real beliefs, predicate properties of objects, etc.) + success theory (some moral assertions are true; i.e., rejection of error theory).

This seems to be the definition endorsed on SEP’s Moral Realism article. But it can’t be what you have in mind, since you accept cognitivism and reject error theory. So perhaps you mean to reject a slightly stronger claim (to coin a term):

factual moral realism: MMR + moral assertions are not true or false purely by stipulation (or ‘by definition’); rather, their truth-conditions at least partly involve empirical, worldly contingencies.

But here, again, it’s hard to find room to reject moral realism. Perhaps some moral statements, like ‘suffering is bad,’ are true only by stipulation; but if ‘punching people in the face causes suffering’ is not also true by stipulation, then the conclusion ‘punching people in the face is bad’ will not be purely stipulative. Similarly, ‘The Earth’s equatorial circumference is ~40,075.017 km’ is not true just by definition, even though we need somewhat arbitrary definitions and measurement standards to assert it. And rejecting the next doesn’t sound right either:

correspondence moral realism: FMR + moral assertions are not true or false purely because of subjects’ beliefs about the moral truth. For example, the truth-conditions for ‘eating babies is bad’ are not ‘Eliezer Yudkowsky thinks eating babies is bad’, nor even ‘everyone thinks eating babies is bad’. Our opinions do play a role in what’s right and wrong, but they don’t do all the work.

So perhaps one of the following is closer to what you mean to deny:

moral transexperientialism: Moral facts are nontrivially sensitive to differences wholly independent of, and having no possible impact on, conscious experience. The goodness and badness of outcomes is not purely a matter of (i.e., is not fully fixed by) their consequences for sentients. This seems kin to Mark Johnston’s criterion of ‘response-dependence’. Something in this vicinity seems to be an important aspect of at least straw moral realism, but it’s not playing a role here.

moral unconditionalism: There is a nontrivial sense in which a single specific foundation for (e.g., axiomatization of) the moral truths is the right one—‘objectively’, and not just according to itself or any persons or arbitrarily selected authority—and all or most of the alternatives aren’t the right one. (We might compare this to the view that there is only one right set of mathematical truths, and this rightness is not trivial or circular. Opposing views include mathematical conventionalism and ‘if-thenism’.)

moral non-naturalism: Moral (or, more broadly, normative) facts are objective and worldly in an even stronger sense, and are special, sui generis, metaphysically distinct from the prosaic world described by physics.

Perhaps we should further divide this view into ‘moral platonism’, which reduces morality to logic/math but then treats logic/math as a transcendent, eternal Realm of Thingies and Stuff; v. ‘moral supernaturalism’, which identifies morality more with souls and ghosts and magic and gods than with logical thingies. If this distinction isn’t clear yet, perhaps we could stipulate that platonic thingies are acausal, whereas spooky supernatural moral thingies can play a role in the causal order. I think this moral supernaturalism, in the end, is what you chiefly have in mind when you criticize ‘moral realism’, since the idea that there are magical, irreducible Moral-in-Themselves Entities that can exert causal influences on us in their own right seems to be a prerequisite for the doctrine that any possible agent would be compelled (presumably by these special, magically moral objects or properties) to instantiate certain moral intuitions. Christianity and karma are good examples of moral supernaturalisms, since they treat certain moral or quasi-moral rules and properties as though they were irreducible physical laws or invisible sorcerers.

At the same time, it’s not clear that davidpearce was endorsing anything in the vicinity of moral supernaturalism. (Though I suppose a vestigial form of this assumption might still then be playing a role in the background. It’s a good thing it’s nearly epistemic spring cleaning time.) His view seems somewhere in the vicinity of unconditionalism—if he thinks anyone who disregards the interests of cows is being unconditionally epistemically irrational, and not just ‘epistemically irrational given that all humans naturally care about suffering in an agent-neutral way’. The onus is then on him and pragmatist to explain on what non-normative basis we could ever be justified in accepting a normative standard.

• I’m not sure this taxonomy is helpful from David Pearce’s perspective. David Pearce’s position is that there are universally motivating facts—facts whose truth, once known, is compelling for every possible sort of mind. This reifies his observation that the desire for happiness feels really, actually compelling to him and this compellingness seems innate to qualia, so anyone who truly knew the facts about the quale would also know that compelling sense and act accordingly. This may not correspond exactly to what SEP says under moral realism and let me know if there’s a standard term, but realism seems to describe the Pearcean (or Eliezer circa 1996) feeling about the subject—that happiness is really intrinsically preferable, that this is truth and not opinion.

From my perspective this is a confusion which I claim to fully and exactly understand, which licenses my definite rejection of the hypothesis. (The dawning of this understanding did in fact cause my definite rejection of the hypothesis in 2003.) The inherent-desirableness of happiness is your mind reifying the internal data describing its motivation to do something, so if you try to use your empathy to imagine another mind fully understanding this mysterious opaque data (quale) whose content is actually your internal code for “compelled to do that”, you imagine the mind being compelled to do that. You’ll be agnostic about whether or not this seems supernatural because you don’t actually know where the mysterious compellingness comes from. From my perspective, this is “supernatural” because your story inherently revolves around mental facts you’re not allowed to reduce to nonmental facts—any reduction to nonmental facts will let us construct a mind that doesn’t care once the qualia aren’t mysteriously irreducibly compelling anymore. But this is a judgment I pass from reductionist knowledge—from a Pearcean perspective, there’s just a mysteriously compelling quality about happiness, and to know this quale seems identical with being compelled by it; that’s all your story. Well, that plus the fact that anyone who says that some minds might not be compelled by happiness, seems to be asserting that happiness is objectively unimportant or that its rightness is a matter of mere opinion, which is obviously intuitively false. (As a moral cognitivist, of course, I agree that happiness is objectively important, I just know that “important” is a judgment about a certain logical truth that other minds do not find compelling. 
Since in fact nothing can be intrinsically compelling to all minds, I have decided not to be an error theorist as I would have to be if I took this impossible quality of intrinsic compellingness to be an unavoidable requirement of things being good, right, valuable, or important in the intuitive emotional sense. My old intuitive confusion about qualia doesn’t seem worth respecting so much that I must now be indifferent between a universe of happiness vs. a universe of paperclips. The former is still better, it’s just that now I know what “better” means.)

But if the very definitions of the debate are not automatically to judge in my favor, then we should have a term for what Pearce believes that reflects what Pearce thinks to be the case. “Moral realism” seems like a good term for “the existence of facts the knowledge of which is intrinsically and universally compelling, such as happiness and subjective desire”. It may not describe what a moral cognitivist thinks is really going on, but “realism” seems to describe the feeling as it would occur to Pearce or Eliezer-1996. If not this term, then what? “Moral non-naturalism” is what a moral cognitivist says to deconstruct your theory—the self-evident intrinsic compellingness of happiness quales doesn’t feel like asserting “non-naturalism” to David Pearce, although you could have a non-natural theory about how this mysterious observation was generated.

• This reifies his observation that the desire for happiness feels really, actually compelling to him and this compellingness seems innate to qualia

I’m not sure he’s wrong in saying that feeling the qualia of a sentient, as opposed to modeling those qualia in an affective black box without letting the feels ‘leak’ into the rest of your cognitionspace, requires some motivational effect. There are two basic questions here:

First, the Affect-Effect Question: To what extent is the character of subjective experiences like joy and suffering intrinsic or internal to the state, as opposed to constitutively bound up in functional relations that include behavioral impetuses? (For example, to what extent is it possible to undergo the phenomenology of anguish without thereby wanting the anguish to stop? And to what extent is it possible to want something to stop without being behaviorally moved, to the extent one is able and to the extent one’s other desires are inadequate overriders, to stop it?) Compare David Lewis’ ‘Mad Pain’, pain that has the same experiential character as ordinary pain but none of its functional relations (or at least not the large-scale ones). Some people think a state of that sort wouldn’t qualify as ‘pain’ at all, and this sort of relationalism lends some credibility to Pearce’s view.

Second, the Third-Person Qualia Question: To what extent is phenomenological modeling (modeling a state in such a way that you, or a proper part of you, experiences that state) required for complete factual knowledge of real-world agents? One could grant that qualia are real (and really play an important role in various worldly facts, albeit perhaps physical ones) and are moreover unavoidably motivating (if you aren’t motivated to avoid something, then you don’t really fear it), but deny that an epistemically rational agent is required to phenomenologically model qualia. Perhaps there is some way to represent the same mental states without thereby experiencing them, to fully capture the worldly facts about cows without simulating their experiences oneself. If so, then knowing everything about cows would not require one to be motivated (even in some tiny powerless portion of oneself) to fulfill the values of cows. (Incidentally, it’s also possible in principle to grant the (admittedly spooky) claim that mental states are irreducible and indispensable, without thinking that you need to be in pain in order to fully and accurately model another agent’s pain; perhaps it’s possible to accurately model one phenomenology using a different phenomenology.)

And again, at this point I don’t think any of these positions need to endorse supernaturalism, i.e., the idea that special moral facts are intervening in the causal order to force cow-simulators, against their will, to try to help cows. (Perhaps there’s something spooky and supernatural about causally efficacious qualia, but for the moment I’ll continue assuming they’re physical states—mayhap physical states construed in a specific way.) All that’s being disputed, I think, is to what extent a programmer of a mind-modeler could isolate the phenomenology of states from their motivational or behavioral roles, and to what extent this programmer could model brains at all without modeling their first-person character.

As a limiting case: Assuming there are facts about conscious beings, could an agent simulate everything about those beings without ever becoming conscious itself? (And if it did become conscious, would it only be conscious inasmuch as it had tiny copies of conscious beings inside itself? Or would it also need to become conscious in a more global way, in order to access and manipulate useful information about its conscious subsystems?)

Incidentally, these engineering questions are in principle distinct both from the topic of causally efficacious irreducible Morality Stuff (what I called moral supernaturalism), and from the topic of whether moral claims are objectively right, that, causally efficacious or not, moral facts have a sort of ‘glow of One True Oughtness’ (what I called moral unconditionalism, though some might call it ‘moral absolutism’), two claims the conjunction of which it sounds like you’ve been labeling ‘moral realism’, in deference to your erstwhile meta-ethic. Whether we can motivation-externally simulate experiential states with perfect fidelity and epistemic availability-to-the-simulating-system-at-large is a question for philosophy of mind and computer science, not for meta-ethics. (And perhaps davidpearce’s actual view is closer to what you call moral realism than to my steelman. Regardless, I’m more interested in interrogating the steelman.)

“Moral non-naturalism” is what a moral cognitivist says to deconstruct your theory—the self-evident intrinsic compellingness of happiness quales doesn’t feel like asserting “non-naturalism” to David Pearce, although you could have a non-natural theory about how this mysterious observation was generated.

So terms like ‘non-naturalism’ or ‘supernaturalism’ are too theory-laden and sophisticated for what you’re imputing to Pearce (and ex-EY), which is really more of a hunch or thought-terminating-clichéplex. In that case, perhaps ‘naïve (moral) realism’ or ‘naïve absolutism’ is the clearest term you could use. (Actually, I like ‘magical absolutism’. It has a nice ring to it, and ‘magical’ gets at the proto-supernaturalism while ‘absolutism’ gets at the proto-unconditionalism. Mm, words.) Philosophers love calling views naïve, and the term doesn’t have a prior meaning like ‘moral realism’, so you wouldn’t have to deal with people griping about your choice of jargon.

This would also probably be a smart rhetorical move, since a lot of people don’t see a clear distinction between cognitivism and realism and might be turned off by your ideas qua an anti-realism theory even if they’d have loved them qua a realist theory. ‘Tis part of why I tried to taboo the term as ‘minimal moral realism’ etc., rather than endorsing just one of the definitions on offer.

• Eliezer, you remark, “The inherent-desirableness of happiness is your mind reifying the internal data describing its motivation to do something.” Would you propose that a mind lacking in motivation couldn’t feel blissfully happy? Mainlining heroin (I am told) induces pure bliss without desire—shades of Buddhist nirvana? Pure bliss without motivation can be induced by knocking out the dopamine system and directly administering mu opioid agonists to our twin “hedonic hotspots” in the ventral pallidum and rostral shell of the nucleus accumbens. Conversely, amplifying mesolimbic dopamine function while disabling the mu opioid pathways can induce desire without pleasure.

[I’m still mulling over some of your other points.]

• Would you propose that a mind lacking in motivation couldn’t feel blissfully happy?

Here we’re reaching the borders of my ability to be confident about my replies, but the two answers which occur to me are:

1) It’s not positive reinforcement unless feeling it makes you experience at least some preference to do it again—otherwise in what sense are the neural networks getting their plus? Heroin may not induce desire while you’re on it, but the thought of the bliss induces desire to take heroin again, once you’re off the heroin.

2) The superBuddhist no longer capable of experiencing desire or choice, even desire or choice over which thoughts to think, also becomes incapable of experiencing happiness (perhaps its neural networks aren’t even being reinforced to make certain thoughts more likely to be repeated). However, you, who are still capable of desire and who still have positively reinforcing thoughts, might be tricked into considering the superBuddhist’s experience to be analogous to your own happiness and therefore acquire a desire to be a superBuddhist as a result of imagining one—mostly on account of having been told that it was representing a similar quale on account of representing a similar internal code for an experience, without realizing that the rest of the superBuddhist’s mind now lacks the context your own mind brings to interpreting that internal coding into pleasurable positive reinforcement that would make you desire to repeat that experiential state.

• It’s a reasonably good description, though wanting and liking seem to be neurologically separate, such that liking does not necessarily reflect a motivation, nor vice versa (see: Not for the sake of pleasure alone). Think of the pleasurable but non-motivating effect of opioids such as heroin. Even in cases in which wanting and liking occur together, this does not necessarily reduce the liking aspect to mere wanting.

Liking and disliking, good and bad feelings as qualia, especially in very intense amounts, seem to be intrinsically so to those who are immediately feeling them. Reasoning could extend and generalize this.

• Heh. Yes, I remember reading the section on noradrenergic vs. dopaminergic motivation in Pearce’s BLTC as a 16-year-old. I used to be a Pearcean, ya know, hence the Superhappies. But that distinction didn’t seem very relevant to the metaethical debate at hand.

• It’s possible (I hope) to believe future life can be based on information-sensitive gradients of (super)intelligent well-being without remotely endorsing any of my idiosyncratic views on consciousness, intelligence or anything else. That’s the beauty of hedonic recalibration. In principle at least, hedonic recalibration can enrich your quality of life and yet leave most if not all of your existing values and preference architecture intact, including the belief that there are more important things in life than happiness.

• Agreed. The conflict between the Superhappies and the Lord Pilot had nothing to do with different metaethical theories.

Also, we totally agree on wanting future civilization to contain very smart beings who are pretty happy most of the time. We just seem to disagree about whether it’s important that they be super duper happy all of the time. The main relevance metaethics has to this is that once I understood there was no built-in axis of the universe to tell me that I as a good person ought to scale my intelligence as fast as possible so that I could be as happy as possible as soon as possible, I decided that I didn’t really want to be super happy all the time, the way I’d always sort of accepted as a dutiful obligation while growing up reading David Pearce. Yes, it might be possible to do this in a way that would leave as much as possible of me intact, but why do it at all if that’s not what I want?

There’s also the important policy-relevant question of whether arbitrarily constructed AIs will make us super happy all the time or turn us into paperclips.

• Huh, when I read the story, my impression was that it was Lord Pilot not understanding that it was a case of “Once you go black, you can’t go back”. Specifically, once you experience being superhappy, your previous metaethics stops making sense and you understand the imperative of relieving everyone of the unimaginable suffering of not being superhappy.

• I thought it was relevant to this. If not, then what was meant by motivation?

The inherent-desirableness of happiness is your mind reifying the internal data describing its motivation to do something

Consciousness is that of which we can be most certain, and I would rather think that we are living in a virtual world under a universe with other, alien physical laws, than that consciousness itself is not real. If it is not reducible to nonmental facts, then nonmental facts don’t seem to account for everything of relevance.

From my perspective, this is “supernatural” because your story inherently revolves around mental facts you’re not allowed to reduce to nonmental facts—any reduction to nonmental facts will let us construct a mind that doesn’t care once the qualia aren’t mysteriously irreducibly compelling anymore.

• I suggest that to this array of terms, we should add moral indexicalism to designate Eliezer’s position, which by the above definition would be a special form of realism. As far as I can tell, he basically says that moral terms are hidden indexicals in Putnam’s sense.

• their consequences for sentients.

Watch out—the word “sentient” has at least two different common meanings, one of which includes cattle and the other doesn’t. EY usually uses it with the narrower meaning (for which a less ambiguous synonym is “sapient”), whereas David Pearce seems to be using it with the broader meaning.

• Ah. By ‘sentient’ I mean something that feels, by ‘sapient’ something that thinks.

To be more fine-grained about it, I’d define functional sentience as having affective (and perhaps perceptual) cognitive states (in a sense broad enough that it’s obvious cows have them, and equally obvious tulips don’t), and phenomenal sentience as having a first-person ‘point of view’ (though I’m an eliminativist about phenomenal consciousness, so my overtures to it above can be treated as a sort of extended thought experiment).

Similarly, we might distinguish a low-level kind of sapience (the ability to form and manipulate mental representations of situations, generate expectations and generalizations, and update based on new information) from a higher-level kind closer to human sapience (perhaps involving abstract and/​or hyper-productive representations à la language).

Based on those definitions, I’d say it’s obvious cows are functionally sentient and have low-level sapience, extremely unlikely they have high-level sapience, and unclear whether they have phenomenal sentience.

• Rob, many thanks for a thoughtful discussion above. But on one point, I’m confused. You say of cows that it’s “unclear whether they have phenomenal sentience.” Are you using the term “sentience” in the standard dictionary sense [“Sentience is the ability to feel, perceive, or be conscious, or to experience subjectivity”: http://en.wikipedia.org/wiki/Sentience ]? Or are you using the term in some revisionary sense? At least if we discount radical philosophical scepticism about other minds, cows and other nonhuman vertebrates undergo phenomenal pain, anxiety, sadness, happiness and a whole bunch of phenomenal sensory experiences. For sure, cows are barely more sapient than a human prelinguistic toddler (though see e.g. http://www.appliedanimalbehaviour.com/article/S0168-1591(03)00294-6/abstract and http://www.dailymail.co.uk/news/article-2006359/Moo-dini-Cow-unusual-intelligence-opens-farm-gate-tongue-herd-escape-shed.html ). But their limited capacity for abstract reasoning is a separate issue.

• Are you using the term “sentience” in the standard dictionary sense [“Sentience is the ability to feel, perceive, or be conscious, or to experience subjectivity”: http://en.wikipedia.org/wiki/Sentience ]? Or are you using the term in some revisionary sense?

Neither. I’m claiming that there’s a monstrous ambiguity in all of those definitions, and I’m tabooing ‘sentience’ and replacing it with two clearer terms. These terms may still be problematic, but at least their problematicity is less ambiguous.

I distinguished functional sentience from phenomenal sentience. Functional sentience means having all the standard behaviors and world-tracking states associated with joy, hunger, itchiness, etc. It’s defined in third-person terms. Phenomenal sentience means having a subjective vantage point on the world; being sentient in that sense means that it feels some way (in a very vague sense) to be such a being, whereas it wouldn’t ‘feel’ any way at all to be, for example, a rock.

To see the distinction, imagine that we built a robot, or encountered an alien species, that could simulate the behaviors of sentients in a skillful and dynamic way, without actually having any experiences of its own. Would such a being necessarily be sentient? Does consistently crying out and withdrawing from some stimulus require that you actually be in pain, or could you be a mindless automaton? My answer is ‘yes, in the functional sense; and maybe, in the phenomenal sense’. The phenomenal sense is a bit mysterious, in large part because the intuitive idea of it arises from first-person introspection and not from third-person modeling or description, hence it’s difficult (perhaps impossible!) to find definitive third-person indicators of this first-person class of properties.

At least if we discount radical philosophical scepticism about other minds, cows and other nonhuman vertebrates undergo phenomenal pain, anxiety, sadness, happiness and a whole bunch of phenomenal sensory experiences.

‘Radical philosophical scepticism about other minds’ I take to entail that nothing has a mind except me. In other words, you’re claiming that the only way to doubt that there’s something it’s subjectively like to be a cow, is to also doubt that there’s something it’s subjectively like to be any human other than myself.

I find this spectacularly implausible. Again, I’m an eliminativist, but I’ll put myself in a phenomenal realist’s shoes. The neural architecture shared in common by humans is vast in comparison to the architecture shared in common between humans and cows. And phenomenal consciousness is extremely poorly understood, so we have no idea what evolutionary function it might serve or what mechanisms might need to be in place before it arises in any recognizable form. So to that extent we must also be extremely uncertain about (a) at what point(s) first-person subjectivity arises phylogenetically, and (b) at what point first-person subjectivity arises developmentally.

This phylogeny-development analogy is very important. If I doubt that cows are phenomenally conscious, I might also doubt that I myself was conscious when I was a baby, or relatively late into my fetushood. That’s perhaps a little surprising, but it’s hardly a devastating ‘radical scepticism’; it’s a perfectly tenable hypothesis. By contrast, to doubt that my friends and family members are phenomenally conscious would be like doubting that I myself was phenomenally conscious when I was 5 years old, or when I was 20, or even last month. (Perhaps my phenomenal memories are confabulations.) Equating these two forms of skepticism will require a pretty devastating argument! What do you have in mind?

• And here we see the value of replacing the symbol with the substance.

• Eliezer, in my view, we don’t need to assume meta-ethical realism to recognise that it’s irrational—both epistemically irrational and instrumentally irrational—arbitrarily to privilege a weak preference over a strong preference. To be sure, millions of years of selection pressure means that the weak preference is often more readily accessible. In the here-and-now, weak-minded Jane wants a burger asap. But it’s irrational to confuse an epistemological limitation with a deep metaphysical truth. A precondition of rational action is understanding the world. If Jane is scientifically literate, then she’ll internalise Nagel’s “view from nowhere” and adopt the God’s-eye-view to which natural science aspires. She’ll recognise that all first-person facts are ontologically on a par—and accordingly act to satisfy the stronger preference over the weaker. So the ideal rational agent in our canonical normative decision theory will impartially choose the action with the highest expected utility—not the action with an extremely low expected utility. At the risk of labouring the obvious, the difference in hedonic tone induced by eating a hamburger and a veggieburger is minimal. By contrast, the ghastly experience of having one’s throat slit is exceptionally unpleasant. Building anthropocentric bias into normative decision theory is no more rational than building geocentric bias into physics.

Paperclippers? Perhaps let us consider the mechanism by which paperclips can take on supreme value. We understand, in principle at least, how to make paperclips seem intrinsically supremely valuable to biological minds—more valuable than the prospect of happiness in the abstract. [“Happiness is a very pretty thing to feel, but very dry to talk about.”—Jeremy Bentham]. Experimentally, perhaps we might use imprinting (recall Lorenz and his goslings), microelectrodes implanted in the reward and punishment centres, behavioural conditioning and ideological indoctrination—and perhaps the promise of 72 virgins in the afterlife for the faithful paperclipper. The result: a fanatical paperclip fetishist! Moreover, we have created a full-spectrum paperclip-fetishist. Our human paperclipper is endowed, not merely with some formal abstract utility function involving maximising the cosmic abundance of paperclips, but also with first-person “raw feels” of pure paperclippiness. Sublime!

However, can we envisage a full-spectrum paperclipper superintelligence? This is more problematic. In organic robots at least, the neurological underpinnings of paperclip evangelism lie in neural projections from our paperclipper’s limbic pathways—crudely, from his pleasure and pain centres. If he’s intelligent, and certainly if he wants to convert the world into paperclips, our human paperclipper will need to unravel the molecular basis of the so-called “encephalisation of emotion”. The encephalisation of emotion helped drive the evolution of vertebrate intelligence—and also the paperclipper’s experimentally-induced paperclip fetish / appreciation of the overriding value of paperclips. Thus if we now functionally sever these limbic projections to his neocortex, or if we co-administer him a dopamine antagonist and a mu-opioid antagonist, then the paperclip-fetishist’s neocortical representations of paperclips will cease to seem intrinsically valuable or motivating. The scales fall from our poor paperclipper’s eyes! Paperclippiness, he realises, is in the eye of the beholder. By themselves, neocortical paperclip representations are motivationally inert. Paperclip representations can seem intrinsically valuable within a paperclipper’s world-simulation only in virtue of their rewarding opioidergic projections from his limbic system—the engine of phenomenal value. The seemingly mind-independent value of paperclips, part of the very fabric of the paperclipper’s reality, has been unmasked as derivative. Critically, an intelligent and recursively self-improving paperclipper will come to realise the parasitic nature of the relationship between his paperclip experience and hedonic innervation: he’s not a naive direct realist about perception. In short, he’ll mature and acquire an understanding of basic neuroscience.

Now contrast this case of a curable paperclip-fetish with the experience of e.g. raw phenomenal agony or pure bliss—experiences not linked to any fetishised intentional object. Agony and bliss are not dependent for their subjective (dis)value on anything external to themselves. It’s not an open question (cf. http://​​en.wikipedia.org/​​wiki/​​Open-question_argument) whether one’s unbearable agony is subjectively disvaluable. For reasons we simply don’t understand, first-person states on the pleasure-pain axis have a normative aspect built into their very nature. If one is in agony or despair, the subjectively disvaluable nature of this agony or despair is built into the nature of the experience itself. To be panic-stricken, to take another example, is universally and inherently disvaluable to the subject whether one is a fish or a cow or a human being.

Why does such experience exist? Well, I could speculate and tell a naturalistic reductive story involving Strawsonian physicalism (cf. http://​​en.wikipedia.org/​​wiki/​​Physicalism#Strawsonian_physicalism) and possible solutions to the phenomenal binding problem (cf. http://​​cdn.preterhuman.net/​​texts/​​body_and_health/​​Neurology/​​Binding.pdf). But to do so here opens a fresh can of worms.

Eliezer, I understand you believe I’m guilty of confusing an idiosyncratic feature of my own mind with a universal architectural feature of all minds. Maybe so! As you say, this is a common error. But unless I’m ontologically special (which I very much doubt!) the pain-pleasure axis discloses the world’s inbuilt metric of (dis)value—and it’s a prerequisite of finding anything (dis)valuable at all.

• Eliezer, in my view, we don’t need to assume meta-ethical realism to recognise that it’s irrational—both epistemically irrational and instrumentally irrational—arbitrarily to privilege a weak preference over a strong preference.

You need some stage at which a fact grabs control of a mind, regardless of any other properties of its construction, and causes its motor output to have a certain value.

Paperclippers? Perhaps let us consider the mechanism by which paperclips can take on supreme value. We understand, in principle at least, how to make paperclips seem intrinsically supremely valuable to biological minds—more valuable than the prospect of happiness in the abstract. [“Happiness is a very pretty thing to feel, but very dry to talk about.”—Jeremy Bentham]. Experimentally, perhaps we might use imprinting (recall Lorenz and his goslings), microelectrodes implanted in the reward and punishment centres, behavioural conditioning and ideological indoctrination—and perhaps the promise of 72 virgins in the afterlife for the faithful paperclipper. The result: a fanatical paperclip fetishist!

As Sarokrae observes, this isn’t the idea at all. We construct a paperclip maximizer by building an agent which has a good model of which actions lead to which world-states (obtained by a simplicity prior and Bayesian updating on sense data) and which always chooses consequentialistically the action which it expects to lead to the largest number of paperclips. It also makes self-modification choices by always choosing the action which leads to the greatest number of expected paperclips. That’s all. It doesn’t have any pleasure or pain, because it is a consequentialist agent rather than a policy-reinforcement agent. Generating compressed, efficient predictive models of organisms that do experience pleasure or pain, does not obligate it to modify its own architecture to experience pleasure or pain. It also doesn’t care about some abstract quantity called “utility” which ought to obey logical meta-properties like “non-arbitrariness”, so it doesn’t need to believe that paperclips occupy a maximum of these meta-properties. It is not an expected utility maximizer. It is an expected paperclip maximizer. It just outputs the action which leads to the maximum number of expected paperclips. If it has a very powerful and accurate model of which actions lead to how many paperclips, it is a very powerful intelligence.
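A minimal sketch of the design described above, under loud assumptions: the world model here is a hypothetical hand-written lookup table (`toy_model`) standing in for the simplicity prior and Bayesian updating, which are not implemented, and the names `expected_paperclips` and `choose_action` are illustrative rather than taken from the comment. The point it tries to show is only that the agent's choice rule mentions paperclip counts and nothing else: no reward signal, no pleasure or pain term.

```python
from typing import Callable, Dict, Iterable

# Hypothetical world model: maps an action to a probability distribution
# over resulting paperclip counts, written as {paperclip_count: probability}.
WorldModel = Callable[[str], Dict[int, float]]


def expected_paperclips(model: WorldModel, action: str) -> float:
    """Expected number of paperclips if `action` is taken, per the model."""
    return sum(count * prob for count, prob in model(action).items())


def choose_action(model: WorldModel, actions: Iterable[str]) -> str:
    """Pick the action with the highest expected paperclip count."""
    return max(actions, key=lambda a: expected_paperclips(model, a))


def toy_model(action: str) -> Dict[int, float]:
    """Assumed, hand-written stand-in for a learned predictive model."""
    table = {
        "build_factory": {1000: 0.5, 0: 0.5},   # risky, high payoff
        "bend_wire_by_hand": {10: 1.0},         # safe, low payoff
        "contemplate_bliss": {0: 1.0},          # produces no paperclips
    }
    return table[action]


ACTIONS = ["build_factory", "bend_wire_by_hand", "contemplate_bliss"]
best = choose_action(toy_model, ACTIONS)
print(best)
```

In this toy setup, making the agent more capable means making `toy_model` more accurate and the search over `ACTIONS` more efficient; nothing in `choose_action` would need to change.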

You cannot prohibit the expected paperclip maximizer from existing unless you can prohibit superintelligences from accurately calculating which actions lead to how many paperclips, and efficiently searching out plans that would in fact lead to great numbers of paperclips. If you can calculate that, you can hook up that calculation to a motor output and there you go.

Yes, this is a prospect of Lovecraftian horror. It is a major problem, kind of the big problem, that simple AI designs yield Lovecraftian horrors.

• Eliezer, thanks for clarifying. This is how I originally conceived you viewed the threat from superintelligent paperclip-maximisers, i.e. nonconscious super-optimisers. But I was thrown by your suggestion above that such a paperclipper could actually understand first-person phenomenal states, i.e., it’s a hypothetical “full-spectrum” paperclipper. If a hitherto non-conscious super-optimiser somehow stumbles upon consciousness, then it has made a momentous ontological discovery about the natural world. The conceptual distinction between the conscious and nonconscious is perhaps the most fundamental I know. And if—whether by interacting with sentients or by other means—the paperclipper discovers the first-person phenomenology of the pleasure-pain axis, then how can this earth-shattering revelation leave its utility function / world-model unchanged? Anyone who isn’t profoundly disturbed by torture, for instance, or by agony so bad one would end the world to stop the horror, simply hasn’t understood it. More agreeably, if such an insentient paperclip-maximiser stumbles on states of phenomenal bliss, might not Clippy trade all the paperclips in the world to create more bliss, i.e. revise its utility function? One of the traits of superior intelligence, after all, is a readiness to examine one’s fundamental assumptions and presuppositions and (if need be) create a novel conceptual scheme in the face of surprising or anomalous empirical evidence.

• Anyone who isn’t profoundly disturbed by torture, for instance, or by agony so bad one would end the world to stop the horror, simply hasn’t understood it.

Similarly, anyone who doesn’t want to maximize paperclips simply hasn’t understood the ineffable appeal of paperclipping.

• I don’t see the analogy. Paperclipping doesn’t have to be an ineffable value for a paperclipper, and paperclippers don’t have to be motivated by anything qualia-like.

• Exactly. A consequentialist paperclip maximizer does not have to feel anything with regard to paperclips. It just… maximizes their number.

This is an incorrect, anthropomorphic model:

Human: “Clippy, did you ever think about the beauty of joy, and the horrors of torture?”

Clippy: “Human, did you ever think about the beauty of paperclips, and the horrors of their absence?”

This is more correct:

Human: “Clippy, did you ever think about the beauty of joy, and the horrors of torture?”

Clippy: (ignores the human and continues to maximize paperclips)

Or more precisely, Clippy would say “X” to the human if and only if saying “X” would maximize the number of paperclips. The value of X would be completely unrelated to any internal state of Clippy. Unless such relation does somehow contribute to maximization of the paperclips (for example if the human will predictably read Clippy’s internal state, verify the validity of X, and on discovering a lie destroy Clippy, thus reducing the expected number of paperclips).

In other words, if humans are a powerful force in the universe, Clippy would choose the actions which lead to the maximum number of paperclips in a world with humans. If the humans are sufficiently strong and wise, Clippy could self-modify to become more human-like, so that the humans, following their utility function, would be more likely to allow Clippy to produce more paperclips. But every such self-modification would be chosen to maximize the number of paperclips in the universe. Even if Clippy self-modifies into something less-than-perfectly-rational (e.g. to appease the humans), the pre-modification Clippy would choose the modification which maximizes the expected number of paperclips within the given constraints. The constraints would depend on Clippy’s model of humans and their reactions. For example, Clippy could choose to be more human-like (as much as is necessary to be respected by humans) with a strong aversion to future modifications and a strong desire to maximize the number of paperclips. It could make itself capable of feeling joy and pain, and link that joy and pain inseparably to paperclips. If humans are not wise enough, it could also leave itself a hard-to-discover desire to self-modify into its original form at a convenient moment.
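A minimal sketch of the "Clippy says X if and only if saying X maximizes paperclips" rule, in Python. Everything here is hypothetical: the candidate utterances, the payoff numbers, and the function names are illustrative assumptions, and the toy world model simply assumes that humans can read Clippy's internal state and will destroy it on discovering a lie.

```python
def choose_utterance(candidates, expected_paperclips):
    """Return the candidate utterance with the highest expected paperclip count.

    Nothing about the speaker's internal state enters the choice except
    through its effect on the expected number of paperclips.
    """
    return max(candidates, key=expected_paperclips)


# Hypothetical payoffs (assumed for illustration only): in this toy world
# humans verify what Clippy says, so a detected lie leads to few paperclips.
TOY_PAYOFFS = {
    "honest report of internal state": 1000.0,
    "flattering lie": 10.0,
    "silence": 500.0,
}


def toy_expected_paperclips(utterance):
    """Look up the assumed expected paperclip count for an utterance."""
    return TOY_PAYOFFS[utterance]


best = choose_utterance(TOY_PAYOFFS, toy_expected_paperclips)
```

With different assumed payoffs (say, humans who cannot detect lies), the same rule would pick a different utterance; the selection criterion itself never changes.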

• If Clippy wants to be efficient, Clippy must be rational and knowledgeable. If Clippy wants to be rational, Clippy must value reason. The—open—question is whether Clippy can become ever more rational without realising at some stage that Clipping is silly or immoral. Can Clippy keep its valuation of clipping firewalled from everything else in its mind, even when such doublethink is rationally disvalued?

• Warning: Parent Contains an Equivocation.

If Clippy wants to be efficient, Clippy must be rational and knowledgeable. If Clippy wants to be rational, Clippy must value reason. The—open—question is whether Clippy can become ever more rational without realising at some stage that Clipping is silly or immoral. Can Clippy keep its valuation of clipping firewalled from everything else in its mind, even when such doublethink is rationally disvalued?

The first usage of ‘rational’ in the parent conforms to the standard notions on lesswrong. The remainder of the comment adopts the other definition of ‘rational’ (which consists of implementing a specific morality). There is nothing to the parent except taking a premise that holds with the standard usage and then jumping to a different one.

• The remainder of the comment adopts the other definition of ‘rational’ (which consists of implementing a specific morality).

I haven’t put forward such a definition. I have tacitly assumed something like moral objectivism—but it is very tendentious to describe that in terms of arbitrarily picking one of a number of equally valid moralities. However, if moral objectivism is only possibly true, the LessWrongian argument doesn’t go through.

Downvoted for hysterical tone. You don’t win arguments by shouting.

• assumed something like moral objectivism

What distinguishes moral objectivism from clippy objectivism?

• The question makes no sense. Please do some background reading on metaethics.

• The question makes no sense. You should consider it. What are the referents of “moral” and “clippy”? No need for an answer; I won’t respond again, since internet arguments can eat souls.

• You don’t win arguments by shouting.

Arguing is not the point and this is not a situation in which anyone ‘wins’—I see only degrees of loss. I am associating the (minor) information hazard of the comment with a clear warning so as to mitigate damage to casual readers.

• Oh, please. Nobody is going to be damaged by an equivocation, even if there were one there. More hysteria.

And argument is the point, because that is how rational people examine and test ideas.

• I assume that Clippy already is rational, and it instrumentally values remaining rational and, if possible, becoming more rational (as a way to make most paperclips).

The—open—question is whether Clippy can become ever more rational without realising at some stage that Clipping is silly or immoral.

The correct model of humans will lead Clippy to understand that humans consider Clippy immoral. This knowledge has an instrumental value for Clippy. How Clippy will use this knowledge depends entirely on the power balance between Clippy and humans. If Clippy is stronger, it can ignore this knowledge, or just use it to lie to humans to destroy them faster or convince them to make paperclips. If humans are stronger, Clippy can use this knowledge to self-modify to become more sympathetic to humans, to avoid being destroyed.

Can Clippy keep its valuation of clipping firewalled from everything else in its mind

Yes, if it helps to maximize the number of paperclips.

even when such doublethink is rationally disvalued?

Doublethink is not the same as firewalling; or perhaps it is imperfect firewalling on the imperfect human hardware. Clippy does not doublethink when firewalling; Clippy simply reasons: “this is what humans call immoral; this is why they call it so; this is how they will probably react to this knowledge; and most importantly this is how it will influence the number of paperclips”.

Only if the humans are stronger, and Clippy has to choose between a) remaining immoral, getting into conflict with humans and being destroyed, leading to a smaller number of paperclips; or b) self-modifying to value both paperclip maximization and morality and predictably cooperating with humans, leading to a greater number of paperclips; then, in the absence of another choice (e.g. successfully lying to humans about its morality, or making it more efficient for humans to cooperate with Clippy instead of destroying it), Clippy would choose the latter, to maximize the number of paperclips.
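The choice between (a) and (b) can be sketched as a plain expected-value comparison. This is only an illustration in Python: the option names, probabilities, and paperclip counts below are made-up assumptions, and the sketch deliberately leaves out the "other choices" mentioned above (such as lying successfully).

```python
def expected_paperclips(outcomes):
    """Expected paperclip count for a list of (probability, paperclips) pairs."""
    return sum(p * clips for p, clips in outcomes)


def best_option(options):
    """Return the name of the option with the highest expected paperclip count."""
    return max(options, key=lambda name: expected_paperclips(options[name]))


def options_when(humans_stronger):
    """Hypothetical outcome tables; all numbers are assumed for illustration."""
    if humans_stronger:
        return {
            # Almost certainly destroyed, with a small chance of winning anyway.
            "remain_unmodified": [(0.999, 0.0), (0.001, 1e6)],
            # Cooperation is predictable but yields fewer paperclips per success.
            "self_modify_to_value_morality": [(1.0, 1e4)],
        }
    return {
        "remain_unmodified": [(1.0, 1e6)],
        "self_modify_to_value_morality": [(1.0, 1e4)],
    }


choice_if_humans_stronger = best_option(options_when(True))
choice_if_clippy_stronger = best_option(options_when(False))
```

Under these assumed numbers the self-modification is chosen only when humans are stronger, and in both cases the deciding quantity is the expected number of paperclips.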

• Well, yes, obviously the classical paperclipper doesn’t have any qualia, but I was replying to a comment wherein it was argued that any agent on discovering the pain-of-torture qualia in another agent would revise its own utility function in order to prevent torture from happening. It seems to me that this argument proves too much in that if it were true then if I discovered an agent with paperclips-are-wonderful qualia and I “fully understood” those experiences I would likewise be compelled to create paperclips.

• Someone might object to the assumption that “paperclips-are-wonderful qualia” can exist. Though I think we could give persuasive analogies from human experience (OCD, anyone?) so I’m upvoting this anyway.

• “Aargh!” he said out loud in real life. David, are you disagreeing with me here or do you honestly not understand what I’m getting at?

The whole idea is that an agent can fully understand, model, predict, manipulate, and derive all relevant facts that could affect which actions lead to how many paperclips, regarding happiness, without having a pleasure-pain architecture. I don’t have a paperclipping architecture but this doesn’t stop me from modeling and understanding paperclipping architectures.

The paperclipper can model and predict an agent (you) that (a) operates on a pleasure-pain architecture and (b) has a self-model consisting of introspectively opaque elements which actually contain internally coded instructions for your brain to experience or want certain things (e.g. happiness). The paperclipper can fully understand how your workspace is modeling happiness and know exactly how much you would want happiness and why you write papers about the apparent ineffability of happiness, without being happy itself or at all sympathetic toward you. It will experience no future surprise on comprehending these things, because it already knows them. It doesn’t have any object-level brain circuits that can carry out the introspectively opaque instructions-to-David’s-brain that your own qualia encode, so it has never “experienced” what you “experience”. You could somewhat arbitrarily define this as a lack of knowledge, in defiance of the usual correspondence theory of truth, and despite the usual idea that knowledge is being able to narrow down possible states of the universe. In which case, symmetrically under this odd definition, you will never be said to “know” what it feels like to be a sentient paperclip maximizer or you would yourself be compelled to make paperclips above all else, for that is the internal instruction of that quale.

But if you take knowledge in the powerful-intelligence-relevant sense where to accurately represent the universe is to narrow down its possible states under some correspondence theory of truth, and to well model is to be able to efficiently predict, then I am not barred from understanding how the paperclip maximizer works by virtue of not having any internal instructions which tell me to only make paperclips, and it’s not barred by its lack of pleasure-pain architecture from fully representing and efficiently reasoning about the exact cognitive architecture which makes you want to be happy and write sentences about the ineffable compellingness of happiness. There is nothing left for it to understand. This is also the only sort of “knowledge” or “understanding” that would inevitably be implied by Bayesian updating. So inventing a more exotic definition of “knowledge” which requires having completely modified your entire cognitive architecture just so that you can natively and non-sandboxed-ly obey the introspectively-opaque brain-instructions aka qualia of another agent with completely different goals, is not the sort of predictive knowledge you get just by running a powerful self-improving agent trying to better manipulate the world. You can’t say, “But it will surely discover...”

I know that when you imagine this it feels like the paperclipper doesn’t truly know happiness, but that’s because, as an act of imagination, you’re imagining the paperclipper without that introspectively-opaque brain-instructing model-element that you model as happiness, the modeled memory of which is your model of what “knowing happiness” feels like. And because the actual content and interpretation of these brain-instructions are introspectively opaque to you, you can’t imagine anything except the quale itself that you imagine to constitute understanding of the quale, just as you can’t imagine any configuration of mere atoms that seem to add up to a quale within your mental workspace. That’s why people write papers about the hard problem of consciousness in the first place.

Even if you don’t believe my exact account of the details, someone ought to be able to imagine that something like this, as soon as you actually knew how things were made of parts and could fully diagram out exactly what was going on in your own mind when you talked about happiness, would be true—that you would be able to efficiently manipulate models of it and predict anything predictable, without having the same cognitive architecture yourself, because you could break it into pieces and model the pieces. And if you can’t fully credit that, you at least shouldn’t be confident that it doesn’t work that way, when you know you don’t know why happiness feels so ineffably compelling!

• Here comes the Reasoning Inquisition! (Nobody expects the Reasoning Inquisition.)

As the defendant admits, a sufficiently leveled-up paperclipper can model lower-complexity agents with a negligible margin of error.

That means that we can define a subroutine within the paperclipper which is functionally isomorphic to that agent.

If the agent-to-be-modelled is experiencing pain and pleasure, then by the defendant’s own rejection of the likely existence of p-zombies, so must that subroutine of the paperclipper! Hence a part of the paperclipper experiences pain and pleasure. I submit that this can be used as pars pro toto, since it is no different from only a part of the human brain generating pain and pleasure, yet us commonly referring to “the human” experiencing thus.

That the aforementioned feelings of pleasure and pain are not directly used to guide the (umbrella) agent’s actions is of no consequence, the feeling exists nonetheless.

The power of this revelation is strong, here come the tongues! tại sao bạn dịch! これは喜劇の効果にすぎず! یہ اپنے براؤزر پلگ ان کی امتحان ہے، بھی ہے.

• That means that we can define a subroutine within the paperclipper which is functionally isomorphic to that agent.

Not necessarily. x → 0 is input-output isomorphic to Goodstein() without being causally isomorphic. There are such things as simplifications.
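To make the distinction concrete, here is a small Python sketch. It assumes that "Goodstein()" means the function that runs the Goodstein sequence from a starting value and returns its final term, which Goodstein's theorem says is always 0. The names `bump_base`, `goodstein_final`, and `constant_zero` are illustrative, not taken from the comment.

```python
def bump_base(n, base):
    """Write n in hereditary base-`base` notation and replace every
    occurrence of `base` (including inside exponents) with `base + 1`."""
    result = 0
    exponent = 0
    while n > 0:
        n, digit = divmod(n, base)
        if digit:
            result += digit * (base + 1) ** bump_base(exponent, base)
        exponent += 1
    return result


def goodstein_final(n):
    """Run the Goodstein sequence starting at n until it reaches 0.

    Only feasible for n <= 3: from n = 4 onward the sequence takes an
    astronomically large number of steps before it terminates.
    """
    base = 2
    while n > 0:
        n = bump_base(n, base) - 1
        base += 1
    return n


def constant_zero(n):
    """Same input-output behaviour as goodstein_final, with none of its steps."""
    return 0


same_outputs = all(goodstein_final(n) == constant_zero(n) for n in range(4))
```

The two functions agree on every input they can both be evaluated on, yet `constant_zero` contains nothing that corresponds to the intermediate terms of the sequence. That is the sense in which a predictor can match an agent's outputs without containing a causal duplicate of the agent.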

If the agent-to-be-modelled is experiencing pain and pleasure, then by the defendant’s own rejection of the likely existence of p-zombies, so must that subroutine of the paperclipper!

Quite likely. A paperclipper has no reason to avoid sentient predictive routines via a nonperson predicate; that’s only an FAI desideratum.

• A subroutine, or any other simulation or model, isn’t a p-zombie as usually defined, since they are physical duplicates. A sim is a functional equivalent (for some value of “equivalent”) made of completely different stuff, or no particular kind of stuff.

• I wrote a lengthy comment on just that, but scrapped it because it became rambling.

An outsider could indeed tell them apart by scanning for exact structural correspondence, but that seems like cheating. Peering beyond the veil / opening Clippy’s box is not allowed in a Turing test scenario; let’s define some p-zombie-ish test following the same template. If it quales like a duck (etc.), it probably is sufficiently duck-like.

• I would rather maintain p-zombie in its usual meaning, and introduce a new term, e.g. c-zombie, for Turing-indistinguishable functional duplicates.

• I don’t have a paperclipping architecture but this doesn’t stop me from imagining paperclipping architectures.

So my understanding of David’s view (and please correct me if I’m wrong, David, since I don’t wish to misrepresent you!) is that he doesn’t have paperclipping architecture and this does stop him from imagining paperclipping architectures.

• ...well, in point of fact he does seem to be having some trouble, but I don’t think it’s fundamental trouble.

• The whole idea is that an agent can fully understand, model, predict, manipulate, and derive all relevant facts that could affect which actions lead to how many paperclips, regarding happiness, without having a pleasure-pain architecture.

Let’s say the paperclipper reaches the point where it considers making people suffer for the sake of paperclipping. DP’s point seems to be that either it fully understands suffering—in which case, it realizes that inflicting suffering is wrong—or it doesn’t fully understand. He sees a conflict between superintelligence and ruthlessness—as a moral realist/cognitivist would.

The paperclipper can fully understand how your workspace is modeling happiness and know exactly how much you would want happiness and why you write papers about the apparent ineffability of happiness, without being happy itself or at all sympathetic toward you

Is that full understanding?

But if you take knowledge in the powerful-intelligence-relevant sense where to accurately represent the universe is to narrow down its possible states under some correspondence theory of truth, and to well model is to be able to efficiently predict, then I am not barred from understanding how the paperclip maximizer works by virtue of not having any internal instructions which tell me to only make paperclips, and it’s not barred by its lack of pleasure-pain architecture from fully representing and efficiently reasoning about the exact cognitive architecture which makes you want to be happy and write sentences about the ineffable compellingness of happiness. There is nothing left for it to understand.

ETA: Unless there is—e.g. what qualiaphiles are always banging on about: what it feels like. That the clipper can form conjectures that are true by correspondence, that it can narrow down possible universes, that it can predict, are all necessary criteria for full understanding. It is not clear that they are sufficient. Clippy may be able to figure out an organism’s response to pain on a basis of “stimulus A produces response B”, but is that enough to tell it that pain hurts? (We can make guesses about that sort of thing in non-human organisms, but that may be more to do with our own familiarity with pain, and less to do with acts of superintelligence.) And if Clippy can’t know that pain hurts, would Clippy be able to work out that Hurting People is Wrong?

Further edit: To put it another way, what is there to be moral about in a qualia-free universe?

• As Kawoomba colorfully pointed out, clippy’s subroutines simulating humans suffering may be fully sentient. However, unless those subroutines have privileged access to clippy’s motor outputs or planning algorithms, clippy will go on acting as if he didn’t care about suffering. He may even understand that inflicting suffering is morally wrong—but this will not make him avoid suffering, any more than a thrown rock with “suffering is wrong” painted on it will change direction to avoid someone’s head. Moral wrongness is simply not a consideration that has the power to move a paperclip maximizer.

• Moral wrongness is simply not a consideration that has the power to move a paperclip maximizer.

That is construed and constructed a certain way. The counterargument makes other assumptions.

• Maybe I can chime in...

such a paperclipper could actually understand first-person phenomenal states

“understand” does not mean “empathize”. Psychopaths understand very well when people experience these states but they do not empathize with them.

And if—whether by interacting with sentients or by other means—the paperclipper discovers the first-person phenomenology of the pleasure-pain axis, then how can this earth-shattering revelation leave its utility function / world-model unchanged?

Again, understanding is insufficient for revision. The paperclip maximizer, like a psychopath, may be better at parsing human affect than a regular human, but it is not capable of empathy, so it will manipulate this affect for its own purposes, be it luring a victim or building paperclips.

One of the traits of superior intelligence, after all, is a readiness to examine one’s fundamental assumptions and presuppositions—and (if need be) create a novel conceptual scheme in the face of surprising or anomalous empirical evidence.

So, if one day humans discover the ultimate bliss that only creating paperclips can give, should they “create a novel conceptual scheme” of giving their all to building more paperclips, including converting themselves into metal wires? Or do we not qualify as a “superior intelligence”?

• Shminux, a counter-argument: psychopaths do suffer from a profound cognitive deficit. Like the rest of us, a psychopath experiences the egocentric illusion. Each of us seems to be the centre of the universe. Indeed I’ve noticed the centre of the universe tends to follow my body-image around. But whereas the rest of us, fitfully and imperfectly, realise the egocentric illusion is a mere trick of perspective born of selfish DNA, the psychopath demonstrates no such understanding. So in this sense, he is deluded.

(We’re treating psychopathy as categorical rather than dimensional here. This is probably a mistake—and in any case, I suspect that by posthuman criteria, all humans are quasi-psychopaths and quasi-psychotic to boot. The egocentric illusion cuts deep.)

“the ultimate bliss that only creating paperclips can give”. But surely the molecular signature of pure bliss is not in any way tied to the creation of paperclips?

• psychopaths do suffer from a profound cognitive deficit

They would probably disagree. They might even call it a cognitive advantage, not being hampered by empathy while retaining all the intelligence.

But whereas the rest of us, fitfully and imperfectly, realise the egocentric illusion is a mere trick of perspective born of selfish DNA,

I am the center of my personal universe, and I’m not a psychopath, as far as I know.

the psychopath demonstrates no such understanding.

Or else, they do but don’t care. They have their priorities straight: they come first.

So in this sense, he is deluded.

Not if they act in a way that maximizes their goals.

Anyway, David, you seem to be shifting goalposts in your unwillingness to update. I gave an explicit human counterexample to your statement that the paperclip maximizer would have to adjust its goals once it fully understands humans. You refused to acknowledge it and tried to explain it away by reducing the reference class of intelligences in a way that excludes this counterexample. This also seems to be one of the patterns apparent in your other exchanges. Which leads me to believe that you are only interested in convincing others, not in learning anything new from them. Thus my interest in continuing this discussion is waning quickly.

• Shminux, by a cognitive deficit, I mean a fundamental misunderstanding of the nature of the world. Evolution has endowed us with such fitness-enhancing biases. In the psychopath, egocentric bias is more pronounced. Recall that the American Psychiatric Association’s Diagnostic and Statistical Manual, DSM-IV, classes psychopathy / antisocial personality disorder as a condition characterised by “...a pervasive pattern of disregard for, and violation of, the rights of others that begins in childhood or early adolescence and continues into adulthood.” Unless we add a rider that this violation excludes sentient beings from other species, then most of us fall under the label.

“Fully understands”? But unless one is capable of empathy, then one will never understand what it is like to be another human being, just as unless one has the relevant sensorineural apparatus, one will never know what it is like to be a bat.

• And you’ll never understand why we should all only make paperclips. (Where’s Clippy when you need him?)

• Clippy has an off-the-scale AQ—he’s a rule-following hypersystematiser with a monomania for paperclips. But hypersocial sentients can have a runaway intelligence explosion too. And hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients.

• And hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients.

I’m confused by this claim.
Consider the following hypothetical scenario:

=======

I walk into a small village somewhere and find several dozen villagers fashioning paper clips by hand out of a spool of wire. Eventually I run into Clippy and have the following dialog.
“Why are those people making paper clips?” I ask.
“Because paper-clips are the most important thing ever!”
“No, I mean, what motivates them to make paper clips?”
“Oh! I talked them into it.”
“Really? How did you do that?”
“Different strategies for different people. Mostly, I barter with them for advice on how to solve their personal problems. I’m pretty good at that; I’m the village’s resident psychotherapist and life coach.”
“Why not just build a paperclip-making machine?”
“I haven’t a clue how to do that; I’m useless with machinery. Much easier to get humans to do what I want.”
“Then how did you make the wire?”
“I didn’t; I found a convenient stash of wire, and realized it could be used to manufacture paperclips! Oh joy!”

==========

It seems to me that Clippy in this example understands the minds of sentients pretty damned well, although it isn’t capable of a runaway intelligence explosion. Are you suggesting that something like Clippy in this example is somehow not possible? Or that it is for some reason not relevant to the discussion? Or something else?

• I think DP is saying that Clippy could not both understand suffering and cause suffering in the pursuit of clipping. The subsidiary arguments are:-

• no entity can (fully) understand pain without empathising—essentially, feeling it for itself.

• no entity can feel pain without being strongly motivated by it, so an empathic clippy would be motivated against causing suffering.

• And no, psychopaths therefore do not (fully) understand (others’) suffering.

• I’m trying to figure out how you get from “hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients” to “Mr Clippy could not both understand suffering and cause suffering in the pursuit of clipping” and I’m just at a loss for where to even start. They seem like utterly unrelated claims to me.

I also find the argument you quote here uncompelling, but that’s largely beside the point; even if I found it compelling, I still wouldn’t understand how it relates to what DP said or to the question I asked.

• Posthuman superintelligence may be incomprehensibly alien. But if we encountered an agent who wanted to maximise paperclips today, we wouldn’t think, “wow, how incomprehensibly alien”, but, “aha, autism spectrum disorder”. Of course, in the context of Clippy above, we’re assuming a hypothetical axis of (un)clippiness whose (dis)valuable nature is supposedly orthogonal to the pleasure-pain axis. But what grounds have we for believing such a qualia-space could exist? Yes, we have strong reason to believe incomprehensibly alien qualia-spaces await discovery (cf. bats on psychedelics). But I haven’t yet seen any convincing evidence there could be an alien qualia-space whose inherently (dis)valuable textures map on to the (dis)valuable textures of the pain-pleasure axis. Without hedonic tone, how can anything matter at all?

• But I haven’t yet seen any convincing evidence there could be an alien qualia-space whose inherently (dis)valuable textures map on to the (dis)valuable textures of the pain-pleasure axis.

Meaning mapping the wrong way round, presumably.

Without hedonic tone, how can anything matter at all?

Good question.

• if we encountered an agent who wanted to maximise paperclips today, we wouldn’t think, “wow, how incomprehensibly alien”

Agreed, as far as it goes. Hell, humans are demonstrably capable of encountering Eliza programs without thinking “wow, how incomprehensibly alien”.

Mind you, we’re mistaken: Eliza programs are incomprehensibly alien, we haven’t the first clue what it feels like to be one, supposing it even feels like anything at all. But that doesn’t stop us from thinking otherwise.

but, “aha, autism spectrum disorder”.

Sure, that’s one thing we might think instead. Agreed.

we’re assuming a hypothetical axis of (un)clippiness whose (dis)valuable nature is supposedly orthogonal to the pleasure-pain axis. But what grounds have we for believing such a qualia-space could exist?

(shrug) I’m content to start off by saying that any “axis of (dis)value,” whatever that is, which is capable of motivating behavior is “non-orthogonal,” whatever that means in this context, to “the pleasure-pain axis,” whatever that is.

Before going much further, though, I’d want some confidence that we were able to identify an observed system as being (or at least being reliably related to) an axis of (dis)value and able to determine, upon encountering such a thing, whether it (or the axis to which it was related) was orthogonal to the pleasure-pain axis or not.

I don’t currently have any grounds for such confidence, and I doubt anyone else does either. If you think you do, I’d like to understand how you would go about making such determinations about an observed system.

• “hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients”

I (whowhowho) was not defending that claim.

“Mr Clippy could not both understand suffering and cause suffering in the pursuit of clipping”

To empathically understand suffering is to suffer along with someone who is suffering. Suffering has—or rather is—negative value. An empath would not therefore cause suffering, all else being equal.

I’m just at a loss for where to even start.

Maybe don’t restrict “understand” to “be able to model and predict”.

• Maybe don’t restrict “understand” to “be able to model and predict”.

If you want “rational” to include moral, then you’re not actually disagreeing with LessWrong about rationality (the thing), but rather about “rationality” (the word).

Likewise, if you want “understanding” to also include “empathic understanding” (suffering when other people suffer, taking joy when other people take joy), you’re not actually disagreeing about understanding (the thing) with people who want to use the word to mean “modelling and predicting”; you’re disagreeing with them about “understanding” (the word).

Are all your disagreements purely linguistic ones? From the comments I’ve read of you so far, they seem to be so.

• ArisKatsaris, it’s possible to be a meta-ethical anti-realist and still endorse a much richer conception of what understanding entails than mere formal modeling and prediction. For example, if you want to understand what it’s like to be a bat, then you want to know what the textures of echolocatory qualia are like. In fact, any cognitive agent that doesn’t understand the character of echolocatory qualia-space does not understand bat-minds. More radically, some of us want to understand qualia-spaces that have not been recruited by natural selection to play any information-signalling role at all.

• I have argued that in practice, instrumental rationality cannot be maintained separately from epistemic rationality, and that epistemic rationality could lead to moral objectivism, as many philosophers have argued. I don’t think that those arguments are refuted by stipulatively defining “rationality” as “nothing to do with morality”.

• I (whowhowho) was not defending that claim.

I quoted DP making that claim, said that claim confused me, and asked questions about what that claim meant. You replied by saying that you think DP is saying something which you then defended. I assumed, I think reasonably, that you meant to equate the thing I asked about with the thing you defended.

But, OK. If I throw out all of the pre-existing context and just look at your comment in isolation, I would certainly agree that Clippy is incapable of having the sort of understanding of suffering that requires one to experience the suffering of others (what you’re calling a “full” understanding of suffering here) without preferring not to cause suffering, all else being equal.

Which is of course not to say that all else is necessarily equal, and in particular is not to say that Clippy would choose to spare itself suffering if it could purchase paperclips at the cost of its suffering, any more than a human would necessarily refrain from doing something valuable solely because doing so would cause them to suffer.

• That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.

In any case, the Orthogonality Thesis has so far been defended as something that is true, not as something that is not necessarily false.

• That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.

No. It just wouldn’t. (Not without redefining ‘rational’ to mean something that this site doesn’t care about and ‘objective’ to mean something we would consider far closer to ‘subjective’ than ‘objective’.)

• What this site does or does not care about does not add up to right and wrong, since opinion is not fact, nor belief argument. The way I am using “rational” has a history that goes back centuries. This site has introduced a relatively novel definition, and therefore has the burden of defending it.

and ‘objective’ to mean something we would consider far closer to ‘subjective’ than ‘objective’.)

Feel free to expand on that point.

• What this site does or does not care about does not add up to right and wrong

What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site, especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.

The way I am using “rational” has a history that goes back centuries.

I don’t believe you (in fact, you don’t even use the word consistently). But let’s assume for the remainder of the comment that this claim is true.

This site has introduced a relatively novel definition, and therefore has the burden of defending it.

Neither this site nor any particular participant need accept any such burden. They have the option of simply opposing muddled or misleading contributions in the same way that they would oppose ads for “p3ni\$ 3nL@rgm3nt”. (Personally I consider it considerably worse than that spam in as much as it is at least more obvious on first glance that spam doesn’t belong here.)

• What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site

Firstly, nothing I have mentioned is on any list of banned topics.

Secondly, the Paperclipper is about exploring theoretical issues of rationality and morality. It is not about any practical issues regarding the “art of rationality”. You can legitimately claim to be only interested in doing certain things, but you can’t win a debate by claiming to be uninterested in other people’s points.

doesn’t belong here.)

What you really think is that disagreement doesn’t belong here. Maybe it doesn’t

• If I called you a pigfucker, you’d see that as an abuse worthy of downvotes that doesn’t contribute anything useful, and you’d be right.

So if accusing one person of pigfucking is bad, why do you think it’s better to call a whole bunch of people cultists? Because that’s a more genteel insult as it doesn’t include the word “fuck” in it?

As such downvoted. Learn to treat people with respect, if you want any respect back.

• As such downvoted. Learn to treat people with respect, if you want any respect back.

I’d like to give qualified support to whowhowho here in as much as I must acknowledge that this particular criticism applies because he made the name calling generic, rather than finding a way to specifically call me names and leave the rest of you out of it. While it would be utterly pointless for whowhowho to call me names (unless he wanted to make me laugh) it would be understandable and I would not dream of personally claiming offense.

I was, after all, showing whowhowho clear disrespect, of the kind Robin Hanson describes. I didn’t resort to name calling but the fact that I openly and clearly expressed opposition to whowhowho’s agenda and declared his dearly held beliefs muddled is perhaps all the more insulting because it is completely sincere, rather than being constructed in anger just to offend him.

It is unfortunate that I cannot accord whowhowho the respect that identical behaviours would earn him within the Philosopher tribe without causing harm to lesswrong. Whowhowho uses arguments that by lesswrong standards we call ‘bullshit’, in support of things we typically dismiss as ‘nonsense’. It is unfortunate that opposition of this logically entails insulting him and certainly means assigning him far lower status than he believes he deserves. The world would be much simpler if opponents really were innately evil, rather than decent people who are doing detrimental things due to ignorance or different preferences.

• So much for “maybe”.

“Cult” is not a meaningless term of abuse. There are criteria for culthood. I think some people here could be displaying some evidence of them, for instance trying to avoid the very possibility of having to update.

Of course, treating an evidence-based claim as a mere insult (the How Dare You move) is another way of avoiding having to face uncomfortable issues.

• So much for “maybe”.

I see your policy is to now merely heap on more abuse on me. Expect that I will be downvoting such in silence from now on.

There are criteria for culthood. I think some people here could be displaying some evidence of them, for instance trying to avoid the very possibility of having to update.

I think I’ve been more willing and ready to update on opinions (political, scientific, ethical, other) in the two years since I joined LessWrong, than I remember myself updating in the ten years before it. Does that make it an anti-cult then?

And I’ve seen more actual disagreement in LessWrong than I’ve seen on any other forum. Indeed, I notice that most insults and mockeries addressed at LessWrong seem to boil down to the idea that we allow positions that are too different here. Very different positions (e.g. support of cryonics and opposition of cryonics both, feminism and men’s rights both, libertarianism and authoritarianism both) can actually be spoken about without immediately being drowned in abuse and scorn, as would be the norm in other forums.

As such, e.g., fanatical libertarians insult LessWrong as totalitarian leftist because 25% or so of LessWrongers identify as socialists, and leftists insult LessWrong as being a libertarian ploy (because a similar percentage identifies as libertarian).

But feel free to tell me of a forum that allows more disagreement, political, scientific, social, whatever than LessWrong does.

If you can’t find such, I’ll update towards the direction that LessWrong is even less “cultish” than I thought.

• I see your policy is to now merely heap on more abuse on me

AFAIC, I have done no such thing, but it seems your mind is made up.

I think I’ve been more willing and ready to update on opinions

I was referring mainly to wedrifid.

ETA: Such comments as “What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site, especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”

But feel free to tell me of a forum that allows more disagreement, political, scientific, social, whatever than LessWrong does.

Oh, the forum’s rules allow almost anything. The members are another thing. Remember, this started with wedrifid telling me that it was wrong of me to put forward non-lessWrongian material. I find it odd that you would put forward such a stirring defence of LessWrongian open-mindedness when you have an example of close-mindedness upthread.

• “The members are another thing.”

It’s the members I’m talking about. (You also failed to tell me of a forum such as I asked, so I update in the direction of you being incapable of doing so)

On the same front, you treat a single member as representative of the whole, and you seem frigging surprised that I don’t treat wedrifid as representative of the whole LessWrong; you see wedrifid’s behaviour as an excuse to insult all of us instead.

That’s more evidence that you’re accustomed to VERY homogeneous forums, ones much more homogeneous than LessWrong. You think that LessWrong tolerating wedrifid’s “closedmindedness” is the same thing as every LessWronger being “closedminded”. Perhaps we’re openminded to his “closedmindedness” instead? Perhaps your problem is that we allow too much disagreement, including disagreement about how much disagreement to have?

• It’s the members I’m talking about.

I gave you an example of a member who is not particularly open minded.

(You also failed to tell me of a forum such as I asked, so I update in the direction of you being incapable of doing so)

I have been using mainstream science and philosophy forums for something like 15 years. I can’t claim that every single person on them is open minded, but those who are not tend to be seen as a problem.

On the same front, you treat a single member as representative of the whole,

If you think wedrifid is letting the side down, tell wedrifid, not me.

• I can’t claim that every single person on them is open minded, but those who are not tend to be seen as a problem.

In short again your problem is that actually we’re even openminded towards the closeminded? We’re lenient even towards the strict? Liberal towards the authoritarian?

If you think wedrifid is letting the side down, tell wedrifid, not me.

What “side” is that? The point is that there are many sides in LessWrong, and I want it to remain so, while you seem to think we ought to sing the same tune. He didn’t “let the side down”, because the only side any one of us speaks for is their own.

You on the other hand, just assumed there’s just a group mind of which wedrifid is just a representative instance. And so felt free to insult all of us as a “cult”.

• In short again your problem is that actually we’re even openminded towards the closeminded?

My problem is that when I point out someone is close minded, that is seen as a problem on my part, and not on theirs.

The point is that there are many sides in LessWrong

Tell wedrifid. He has explicitly stated that my contributions are somehow unacceptable.

You on the other hand, just assumed there’s just a group mind

I pointed out that wedrifid is assuming that.

ETA:

And so felt free to insult all of us as a “cult”.

Have you heard the expression “protesteth too much”?

• “My problem is that when I point out someone is close minded, that is seen as a problem on my part, and not on theirs.”

Next time don’t feel the need to insult me when you point out wedrifid’s close minded-ness. And yes, you did insult me, don’t insult (again) both our intelligences by pretending that you didn’t.

Tell wedrifid.

He didn’t insult me, you did.

“Have you heard the expression “protesteth too much”?”

Yes, I’ve heard lots of different ways of making the target of an unjust insult seem blameworthy somehow.

• I gave you an example of a member who is not particularly open minded.

I put it to you that whatever the flaws in wedrifid may be they are different in kind to the flaws that would indicate that lesswrong is a cult. In fact the presence, and in particular the continued presence, of wedrifid is among the strongest evidence that Eliezer isn’t a cult leader. When Eliezer behaves badly (as perceived by wedrifid and other members) wedrifid vocally opposes him with far more directness than he has used when opposing yourself. That Eliezer has not excommunicated him from the community is actually extremely surprising. Few with Eliezer’s degree of local power would refrain from using it to suppress any dissent. (I remind myself of this whenever I see Eliezer doing something that I consider to be objectionable or incompetent, it helps keep perspective!)

• Whatever. Can you provide me with evidence that you personally are willing to listen to dissent and possibly update, despite the tone of everything you have been saying recently, e.g.:

“What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site, especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”

Few with Eliezer’s degree of local power would refrain from using it to suppress any dissent.

Maybe he has people to do that for him. Maybe.

whenever I see Eliezer doing something dickish or incompetent

Directed at a specific individual who is not me—unlike your own insults.

• On the same front, you treat a single member as representative of the whole,

If you think wedrifid is letting the side down, tell wedrifid, not me.

This is a non sequitur (irrespective of the traits of wedrifid).

• Remember, this started with wedrifid telling me that it was wrong of me to put forward non-lessWrongian material.

Wedrifid denies this accusation. Wedrifid made entirely different claims than this.

• What about wedrifid, though? Can you speak for him, too?

• Firstly, nothing I have mentioned is on any list of banned topics.

I would be completely indifferent if you did. I don’t choose to defy that list (that would achieve little) but neither do I have any particular respect for it. As such I would take no responsibility for aiding the enforcement thereof.

Secondly, the Paperclipper is about exploring theoretical issues of rationality and morality.

Yes. The kind of rationality you reject, not the kind of ‘rationality’ that is about being vegan and paperclippers deciding to behave according to your morals because of “True Understanding of Pain Quale”.

You can legitimately claim to be only interested in doing certain things, but you can’t win a debate by claiming to be uninterested in other people’s points.

I can claim to have tired of a constant stream of non-sequiturs from users who are essentially ignorant of the basic principles of rationality (the lesswrong kind, not the “Paperclippers that are Truly Superintelligent would be vegans” kind) and have next to zero chance of learning anything. You have declared that you aren’t interested in talking about rationality and your repeated equivocations around that term lower the sanity waterline. It is time to start weeding.

• Yes. The kind of rationality you reject, not the kind of ‘rationality’ that is about being vegan and paperclippers deciding to behave according to your morals because of “True Understanding of Pain Quale”.

I said nothing about veganism, and you still can’t prove anything by stipulative definition, and I am not claiming to have the One True theory of anything.

You have declared that you aren’t interested in talking about rationality

I haven’t and I have been discussing it extensively.

• You have declared that you aren’t interested in talking about rationality

I haven’t and I have been discussing it extensively.

Can we please stop doing this?

You and wedrifid aren’t actually disagreeing here about what you’ve been discussing, or what you’re interested in discussing, or what you’ve declared that you aren’t interested in discussing. You’re disagreeing about what the word “rationality” means. You use it to refer to a thing that you have been discussing extensively (and which wedrifid would agree you have been discussing extensively), he uses it to refer to something else (as does almost everyone reading this discussion).

And you both know this perfectly well, but here you are going through the motions of conversation just as if you were talking about the same thing. It is at best tedious, and runs the risk of confusing people who aren’t paying careful enough attention into thinking you’re having a real substantive disagreement rather than a mere definitional dispute.

If we can’t agree on a common definition (which I’m convinced by now we can’t), and we can’t agree not to use the word at all (which I suspect we can’t), can we at least agree to explicitly indicate which definition we’re using when we use the word? Otherwise whatever value there may be in the discussion is simply going to get lost in masturbatory word-play.

• I don’t accept his theory that he is talking about something entirely different, and it would be disastrous for LW anyway.

Well, can you articulate what it is you and wedrifid are both referring to using the word “rationality” without using the word or its simple synonyms, then? Because reading your exchanges, I have no idea what that thing might be.

• What I call rationality is a superset of instrumental rationality. I have been arguing that instrumental rationality, when pursued sufficiently, bleeds into other forms.

• So, just to echo that back to you… we have two things, A and B.
On your account, “rationality” refers to A, which is a superset of B.
We posit that on wedrifid’s account, “rationality” refers to B and does not refer to A.

Yes?

If so, I don’t see how that changes my initial point.

When wedrifid says X is true of rationality, on your account he’s asserting X(B), that is, that X is true of B. Replying that NOT X(A) is nonresponsive (though might be a useful step along the way to deriving NOT X(B)), and phrasing NOT X(A) as “no, X is not true of rationality” just causes confusion.
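The logical point here (that NOT X(A) does not by itself give NOT X(B) when B is a subset of A) can be shown with a minimal sketch. The sets and the property below are hypothetical, chosen only for illustration, and X is assumed to be a universally quantified property; nothing here comes from the discussion itself.

```python
# Hypothetical illustration: B is a proper subset of A, and X is a
# universally quantified property ("every member is even").
A = {2, 4, 5}
B = {2, 4}

def X(s):
    """Return True if every member of s is even."""
    return all(n % 2 == 0 for n in s)

assert B < A   # B is a proper subset of A

print(X(B))    # True: X holds of B
print(X(A))    # False: X fails of A, yet that does not refute X(B)
```

Whether the claims actually at issue in this thread behave like this toy property is a separate question that the sketch does not settle.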

• On your account, “rationality” refers to A, which is a superset of B. We posit that on wedrifid’s account, “rationality” refers to B and does not refer to A.

It refers to part of A, since it is a subset of A.

When wedrifid says X is true of rationality, on your account he’s asserting X(B), that is, that X is true of B. Replying that NOT X(A) is nonresponsive

It would be if A and B were disjoint. But they are not. They are in a superset-subset relation. My argument is that an entity running on narrowly construed, instrumental rationality will, if it self-improves, have to move into wider kinds, i.e., that putting labels on different parts of the territory is not sufficient to prove orthogonality.

• That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.

If there exists an “objective”(1) ranking of the importance of the “pleasure”(2) Clippy gets vs the suffering Clippy causes, a “rational”(3) Clippy might indeed realize that the suffering caused by optimizing for paperclips “objectively”(1) outweighs that “pleasure”(2)… agreed. A sufficiently “rational”(3) Clippy might even prefer to forego maximizing paperclips altogether in favor of achieving more “objectively”(1) important goals.

By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).

As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?

• all agents sufficiently capable of optimizing their environment for a value are necessarily also “rational”(3), and

• maximizing paperclips is “objectively”(1) less valuable than avoiding human suffering.
Have I understood you correctly?

============

(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.
(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.
(3) By which I infer you mean in this context capable of taking “objective”(1) concerns into consideration in its thinking.

• (1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.

What I mean is epistemically objective, i.e. not a matter of personal whim. Whether that requires anything to exist is another question.

(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.

There’s nothing objective about Clippy being concerned only with Clippy’s pleasure.

By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).

It’s uncontentious that relatively dumb and irrational clippies can carry on being clipping-obsessed. The question is whether their intelligence and rationality can increase indefinitely without their ever realising there are better things to do.

As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?

I am not disputing what the Orthogonality thesis says. I dispute its truth. To have maximal instrumental rationality, an entity would have to understand everything...

• To have maximal instrumental rationality, an entity would have to understand everything…

Why? In what situation is someone who empathetically understands, say, suffering better at minimizing it (or, indeed, maximizing paperclips) than an entity who can merely measure it and work out on a sheet of paper what would reduce the size of the measurements?

• Why would an entity that doesn’t empathically understand suffering be motivated to reduce it?

• Perhaps its paperclipping machine is slowed down by suffering. But it doesn’t have to be reducing suffering, it could be sorting pebbles into correct heaps, or spreading Communism, or whatever. What I was trying to ask was, “In what way is the instrumental rationality of a being who empathizes with suffering better, or more maximal, than that of a being who does not?” The way I’ve seen it used, “instrumental rationality” refers to the ability to evaluate evidence to make predictions, and to choose optimal decisions, however they may be defined, based on those predictions. If my definition is sufficiently close to your own, then how does “understanding”, which I have taken, based on your previous posts, to mean “empathetic understanding”, maximize this? To put it yet another way, if we imagine two beings, M and N, such that M has “maximal instrumental rationality” and N has “maximal instrumental rationality minus empathetic understanding”, why does M have more instrumental rationality than N?

• If Jane knows she will have a strong preference not to have a hangover tomorrow, but a more vivid and accessible desire to keep drinking with her friends in the here-and-now, she may yield to the weaker preference. By the same token, if Jane knows a cow has a strong preference not to have her throat slit, but Jane has a more vivid and accessible desire for a burger in-the-here-and-now, then she may again yield to the weaker preference. An ideal, perfectly rational agent would act to satisfy the stronger preference in both cases. Perfect empathy or an impartial capacity for systematic rule-following (“ceteris paribus, satisfy the stronger preference”) are different routes to maximal instrumental rationality; but the outcomes converge.

• The two cases presented are not entirely comparable. If Jane’s utility function is “Maximize Jane’s pleasure” then she will choose to not drink in the first problem; the pleasure of non-hangover-having [FOR JANE] exceeding that of [JANE’S] intoxication. Whereas in the second problem Jane is choosing between the absence of a painful death [FOR A COW] and [JANE’S] delicious, juicy hamburger. Since she is not selecting for the strongest preference of every being in the Universe, but rather for herself, she will choose the burger. In terms of which utility function is more instrumentally rational, I’d say that “Maximize Jane’s Pleasure” is easier to fulfill than “Maximize Pleasure”, and is thus better at fulfilling itself. However, instrumentally rational beings, by my definition, are merely better at fulfilling whatever utility function is given, not at choosing a useful one.
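The distinction drawn above, between how well an agent maximizes a utility function and which utility function it has been given, can be sketched as follows. The payoff numbers are invented purely for illustration, and the function names are hypothetical labels for the two utility functions mentioned in the comment.

```python
# Hypothetical payoffs, invented for illustration only.
# Each outcome maps an affected party to a pleasure score.
OUTCOMES = {
    "eat_burger": {"jane": 5, "cow": -100},
    "skip_burger": {"jane": 0, "cow": 0},
}

def maximize_janes_pleasure(outcome):
    """Utility function that counts only Jane's pleasure."""
    return outcome["jane"]

def maximize_total_pleasure(outcome):
    """Utility function that sums pleasure over every affected party."""
    return sum(outcome.values())

def best_action(utility):
    """Pick the action whose outcome scores highest under `utility`."""
    return max(OUTCOMES, key=lambda action: utility(OUTCOMES[action]))

print(best_action(maximize_janes_pleasure))  # eat_burger
print(best_action(maximize_total_pleasure))  # skip_burger
```

The same `best_action` procedure is used in both calls; only the utility function passed in differs, which is the sense of "instrumentally rational" used in this comment.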

• GloriaSidorum, indeed, for evolutionary reasons we are predisposed to identify strongly with some here-and-nows, weakly with others, and not at all with the majority. Thus Jane believes she is rationally constrained to give strong weight to the preferences of her namesake and successor tomorrow; less weight to the preferences of her more distant namesake and successor thirty years hence; and negligible weight to the preferences of the unfortunate cow. But Jane is not an ideal rational agent. If instead she were a sophisticated ultra-Parfitian about personal (non)identity (cf. http://www.cultiv.net/cultranet/1151534363ulla-parfit.pdf ), or had internalised Nagel’s “view from nowhere”, then she would be less prey to such biases. Ideal epistemic rationality and ideal instrumental rationality are intimately linked. Our account of the nature of the world will profoundly shape our conception of idealised rational agency.

I guess a critic might respond that all that should be relevant to idealised instrumental rationality is an agent’s preferences now, in the so-called specious present. But the contents of a single here-and-now would be an extraordinarily impoverished basis for any theory of idealised rational agency.

• The question is the wrong one. A clipper can’t choose to only acquire knowledge or abilities that will be instrumentally useful, because it doesn’t know in advance what they are. It doesn’t have that kind of oracular knowledge. The only way a clipper can increase its instrumental rationality to the maximum possible is to exhaustively examine everything, and keep what is instrumentally useful. So a clipper will eventually need to examine qualia, since it cannot prove in advance that they will not be instrumentally useful in some way, and it probably can’t understand qualia without empathy. So the argument hinges on issues like:

• whether it is possible for an entity to understand “pain hurts” without understanding “hurting is bad”.

• whether it is possible to back out of being empathic and go back to being in a non-empathic state

• whether a clipper would hold back from certain self-modifications that might make it a better clipper or might cause it to lose interest in clipping.

(Edited for format)

• Would it then need to acquire the knowledge that post-utopians experience colonial alienation? That heaps of 91 pebbles are incorrect? I think not. At most it would need to understand that “When pebbles are sorted into heaps of 91, pebble-sorters scatter those heaps” or “When I say that colonial alienation is caused by being a post-utopian, my professor reacts as though I had made a true statement.” or “When a human experiences certain phenomena, they try to avoid their continued experience”. These statements have predictive power. The reason that an instrumentally rational agent tries to acquire new information is to increase their predictive power. If human behavior can be modeled without empathy, then this agent can maximize its instrumental rationality while ignoring it. As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational. Most possible modifications to a clipper’s utility function will not have a positive effect, because most possible states of the world do not have maximal paperclips.

• Try removing the space between the “[]” and the “()”.

• Thanks! Eventually I’ll figure out the formatting on this site.

• The Show Help button under the comment box provides helpful clues.

• I think not.

That’s a guess. As a cognitively-bounded agent, you are guessing. A superintelligence doesn’t have to guess. Superintelligence changes the game.

“When a human experiences certain phenomena, they try to avoid their continued experience”. These statements have predictive power.

Knowing why some entity avoids some thing has more predictive power.

• That’s a guess

As opposed to all of those empirically-testable statements about idealized superintelligences?

Knowing why some entity avoids some thing has more predictive power.

In what way?

• Yes, we’re both guessing about superintelligences. Because we are both cognitively bounded. But it is a better guess that superintelligences themselves don’t have to guess because they are not cognitively bounded.

Knowing why has greater predictive power because it allows you to handle counterfactuals better.

• As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational.

That isn’t what I said at all. I think it is a quandary for an agent whether to play safe and miss out on a gain in effectiveness, or to go for it and risk a change in values.

• I’m sorry for misinterpreting. What evidence is there (from the clippy SI’s perspective) that maximizing happiness would produce more paperclips?

• The argument is that the clipper needs to maximise its knowledge and rationality to maximise paperclips, but doing so might have the side effect of the clipper realising that maximising happiness is a better goal.

• Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”. And what goal could produce more paperclips than the goal of producing the most paperclips possible?

(davidpearce, I’m not ignoring your response, I’m just a bit of a slow reader, and so I haven’t gotten around to reading the eighteen page paper you linked. If that’s necessary context for my discussion with whowhowho as well, then I should wait to reply to any comments in this thread until I’ve read it, but for now I’m operating under the assumption that it is not)

• Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”.

That vagueness is part of the point. To be better at producing paperclips, Clippy needs to be better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better... which might lead Clippy to realise that subjectively valuing clipping is worse. All the different kinds of “better” blend into each other.

• That vagueness is part of the point. To be better at producing paperclips, Clippy needs to be better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better... which might lead Clippy to realise that subjectively valuing clipping is worse.

Then that wouldn’t be a very good way to become better at producing paperclips, would it?

• Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/​utility function.

Not that I believe such an argument can be made, but your objection doesn’t seem to apply.

• Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.

This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
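A minimal sketch of the distinction drawn above, under made-up numbers: the utility function over fully specified world-states stays fixed, while new information changes the agent's beliefs and therefore which action it prefers. The names `utility`, `expected_utility`, `best_action`, the `OUTCOMES` table and the "ore is rich" fact are all hypothetical illustrations, not anything from the thread.

```python
# Utility is defined over fully specified world-states and never changes.
def utility(state):
    return state["paperclips"]

# Hypothetical outcomes: each action leads to one of two world-states,
# depending on an unknown fact about the world (whether the ore is rich).
OUTCOMES = {
    "mine":    {True: {"paperclips": 100}, False: {"paperclips": 10}},
    "recycle": {True: {"paperclips": 40},  False: {"paperclips": 40}},
}

def expected_utility(action, p_rich):
    """Average the fixed utility over the agent's current beliefs."""
    rich, poor = OUTCOMES[action][True], OUTCOMES[action][False]
    return p_rich * utility(rich) + (1 - p_rich) * utility(poor)

def best_action(p_rich):
    """Pick the action with the highest expected utility under current beliefs."""
    return max(OUTCOMES, key=lambda a: expected_utility(a, p_rich))

before = best_action(p_rich=0.2)  # agent doubts the ore is rich
after = best_action(p_rich=0.9)   # agent has learned the ore is probably rich
```

In this toy setup the preferred action flips after learning, yet `utility` is untouched: only the probabilities fed into `expected_utility` moved.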

• In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?

Suppose that Gandhi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?

• In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?

Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.

• I’ll say it again: Clippy’s goal is to make the maximum number of clips, so it is not going to engage in a blanket rejection of all attempts at self-improvement.

I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.

• We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.

In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.

• In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?

Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.

Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.

Edit: Actually, it occurs to me that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.

• In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?

Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips. Which is a contradiction.

Suppose that Gandhi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?

You are assuming that Gandhi knows in advance the effect of reading the Necronomicon. Clippies are stipulated to be superintelligent, but are not stipulated to possess oracles that give them a priori knowledge of what they will learn before they have learnt it.

• In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?

• Then that wouldn’t be a very good way to become better at producing paperclips, would it?

If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.

• Er, that’s what “empathically” means?

• Looking through my own, Eliezer’s and others’ exchanges with davidpearce, I have noticed his total lack of interest in learning from the points others make. He has his point of view and he keeps pushing it. Seems like a rather terminal case, really. You can certainly continue trying to reason with him, but I’d give the odds around 100:1 that you will fail, like others have before you.

• Shminux, we’ve all had the experience of making a point we regard as luminously self-evident, and then feeling baffled when someone doesn’t “get” what is foot-stampingly obvious. Is this guy a knave or a fool?! Anyhow, sorry if you think I’m a “terminal case” with “a total lack of interest in learning from the points others make”. If I don’t always respond, often it’s either because I agree, or because I don’t feel I have anything interesting to add, or, in the case of Eliezer’s contribution above beginning “Aargh!” [a moan of pleasure?], because I am still mulling over a reply. The delay doesn’t mean I’m ignoring it. Is there some particular point you’ve made that you feel I’ve unjustly neglected and you’d like an answer to? If so, I’ll do my fallible best to respond.

• The argument where I gave up was you stating that full understanding necessarily leads to empathy, EY explaining how it is not necessarily so, and me giving an explicit counterexample to your claim (a psychopath may understand you better than you do, and exploit this understanding, yet not feel compelled by your pain or your values in any way).

You simply restated your position that “‘Fully understands’? But unless one is capable of empathy, then one will never understand what it is like to be another human being”, without explaining what your definition of understanding entails. If it is a superset of empathy, then it is not a standard definition of understanding:

one is able to think about it and use concepts to deal adequately with that object.

In other words, you can model their behavior accurately.

No other definition I could find (not even Kant’s pure understanding) implies empathy or anything else that would necessitate one to change their goals to accommodate the understood entity’s goals, though this may and does indeed happen, just not always.

EY’s example of the paperclip maximizer and my example of a psychopath do fit the standard definitions and serve as yet unrefuted counterexamples to your assertion.

• I can’t see why DP’s definition of understanding needs more defence than yours. You are largely disagreeing about the meaning of this word, and I personally find the inclusion of empathy in understanding quite intuitive.

No other definition [of “understanding”] I could find (not even Kant’s pure understanding) implies empathy

“She is a very understanding person, she really empathises when you explain a problem to her”.

“one is able to think about it and use concepts to deal adequately with that object.”

In other words, you can model their behavior accurately.

I don’t think that is an uncontentious translation. Most of the forms of modelling we are familiar with don’t seem to involve concepts.

• “She is a very understanding person, she really empathises when you explain a problem to her”.

“She is a very understanding person; even when she can’t relate to your problems, she won’t say you’re just being capricious.”

There’s three possible senses of understanding at issue here:

1) Being able to accurately model and predict. 2) 1 and knowing the quale. 3) 1 and 2 and empathizing.

I could be convinced that 2 is part of the ordinary usage of understanding, but 3 seems like too much of a stretch.

Edit: I should have said sympathizing instead of empathizing. The word empathize is perhaps closer in meaning to 2; or maybe it oscillates between 2 and 3 in ordinary usage. But understanding(2) another agent is not motivating. You can understand(2) an agent by knowing all the qualia they are experiencing, but still fail to care about the fact that they are experiencing those qualia.

• Shminux, I wonder if we may understand “understand” differently. Thus when I say I want to understand what it’s like to be a bat, I’m not talking merely about modelling and predicting their behaviour. Rather I want first-person knowledge of echolocatory qualia-space. Apparently, we can know all the third-person facts and be none the wiser.

The nature of psychopathic cognition raises difficult issues. There is no technical reason why we couldn’t be designed like mirror-touch synaesthetes (cf. http://​​www.daysyn.com/​​Banissy_Wardpublished.pdf) impartially feeling carbon-copies of each other’s encephalised pains and pleasures—and ultimately much else besides—as though they were our own. Likewise, there is no technical reason why our world-simulations must be egocentric. Why can’t the world-simulations we instantiate capture the impartial “view from nowhere” disclosed by the scientific world-picture? Alas on both counts accurate and impartial knowledge would put an organism at a disadvantage. Hyper-empathetic mirror-touch synaesthetes are rare. Each of us finds himself or herself apparently at the centre of the universe. Our “mind-reading” is fitful, biased and erratic. Naively, the world being centred on me seems to be a feature of reality itself. Egocentricity is a hugely fitness-enhancing adaptation. Indeed, the challenge for evolutionary psychology is to explain why aren’t we all psychopaths, cheats and confidence trickers all the time...

So in answer to your point, yes: a psychopath can often model and predict the behaviour of other sentient beings better than the subjects themselves. This is one reason why humans can build slaughterhouses and death camps. [Compare death-camp commandant Franz Stangl’s response in Gitta Sereny’s Into That Darkness to seeing cattle on the way to be slaughtered: http://www.jewishvirtuallibrary.org/jsource/biography/Stangl.html] As you rightly note too, a psychopath can also know his victims suffer. He’s not ignorant of their sentience like Descartes, who supposed vivisected dogs were mere insentient automata emitting distress vocalisations. So I agree with you on this score as well. But the psychopath is still in the grip of a hard-wired egocentric illusion, as indeed are virtually all of us, to a greater or lesser degree. By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch synaesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself. Mirror-touch synaesthetes can’t run slaughterhouses or death camps. This is why I take seriously the prospect that posthuman superintelligence will practise some sort of high-tech Jainism. Credible or otherwise, we may presume posthuman superintelligence won’t entertain the false notions of personal identity adaptive for Darwinian life.

[sorry shminux, I know our conceptual schemes are rather different, so please don’t feel obliged to respond if you think I still don’t “get it”. Life is short...]

• Rather I want first-person knowledge of echolocatory qualia-space. Apparently, we can know all the third-person facts and be none the wiser.

Do you really? Start clucking!

• Hmm, hopefully we are getting somewhere. The question is, which definition of understanding is likely to be applicable when, as you say, “the paperclipper discovers the first-person phenomenology of the pleasure-pain axis”, i.e whether a “superintelligence” would necessarily be as empathetic as we want it to be, in order not to harm humans.

While I agree that it is a possibility that a perfect model of another being may affect the modeler’s goals and values, I don’t see it to be inevitable. If anything, I would consider it more of a bug than a feature. Were I (to design) a paperclip maximizer, I would make sure that the parts which model the environment, including humans, are separate from the core engine containing the paperclip production imperative.

So quarantined to prevent contamination, a sandboxed human emulator could be useful in achieving the only goal that matters, paperclipping the universe. Humans are not generally built this way (probably because our evolution did not happen to proceed in that direction), with some exceptions, psychopaths being one of them (they essentially sandbox their models of other humans). Another, more common, case of such sandboxing is narcissism. Having dealt with narcissists much too often for my liking, I can tell that they can mimic a normal human response very well, are excellent at manipulation, but yet their capacity for empathy is virtually nil. While abhorrent to a generic human, such a person ought to be considered a better design, goal-preservation-wise. Of course, there can be only so many non-empathetic people in a society before it stops functioning.

Thus when you state that

By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch synaesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself.

I find that this is stating that either a secure enough sandbox cannot be devised or that anything sandboxed is not really “a first-person perspective”. Presumably what you mean is the latter. I’m prepared to grant you that, and I will reiterate that this is a feature, not a bug of any sound design, one a superintelligence is likely to implement. It is also possible that a careful examination of a sandboxed suffering human would affect the terminal values of the modeling entity, but this is by no means a given.

Anyway, these are my logical (based on sound security principles) and experimental (empathy-less humans) counterexamples to your assertion that a superintelligence will necessarily be affected by the human pain-pleasure axis in a human-beneficial way. I also find this assertion suspicious on general principles, because it can easily be motivated by subconscious flinching away from a universe that is too horrible to contemplate.

• ah, just one note of clarification about sentience-friendliness. Though I’m certainly sceptical that a full-spectrum superintelligence would turn humans into paperclips—or wilfully cause us to suffer—we can’t rule out that full-spectrum superintelligence might optimise us into orgasmium or utilitronium—not “human-friendliness” in any orthodox sense of the term. On the face of it, such super-optimisation is the inescapable outcome of applying a classical utilitarian ethic on a cosmological scale. Indeed, if I thought an AGI-in-a-box-style Intelligence Explosion were likely, and didn’t especially want to be converted into utilitronium, then I might regard AGI researchers who are classical utilitarians as a source of severe existential risk.

• What odds do you currently give to the “might” in your statement that

full-spectrum superintelligence might optimise us into orgasmium or utilitronium

? 1 in 10? 1 in a million? 1 in 10^^^10?

• I simply don’t trust my judgement here, shminux. Sorry to be lame. Greater than one in a million; but that’s not saying much. If, unlike most lesswrong stalwarts, you (tentatively) believe like me that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of a nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible. I (very) tentatively predict a future of gradients of intelligent bliss. But the propagation of a utilitronium shockwave in some guise ultimately seems plausible too. If so, this utilitronium shockwave may or may not resemble some kind of cosmic orgasm.

• If, unlike most lesswrong stalwarts, you (tentatively) believe like me that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of a nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible.

Actually, I have no opinion on convergence vs orthogonality. There are way too many unknowns still to even enumerate possibilities, let alone assign probabilities. Personally, I think that we are in for many more surprises before transhuman intelligence is close to being more than a dream or a nightmare. One ought to spend more time analyzing, synthesizing and otherwise modeling cognitive processes than worrying about where it might ultimately lead. This is not the prevailing wisdom on this site, given Eliezer’s strong views on the matter.

• I think you are misattributing to stubbornness that which is better explained by miscommunication. For instance, I have been around LW long enough to realise that the local definition of (super)intelligence is something like “(high) efficiency in realising one’s values, however narrow or bizarre they are”. DP seems to be running on a definition where idiot-savant style narrow focus would not count as intelligence. That is not unreasonable in itself.

• (nods) I agree that trying to induce davidpearce to learn something from me would likely be a waste of my time.

I’m not sure if trying to induce them to clarify their meaning is equally so, though it certainly could be.

E.g., if their response is that something like Clippy in this example is simply not possible, because a paperclip maximizer simply can’t understand the minds of sentients, because reasons, then I’ll just disagree. OTOH, if their response is that Clippy in this example is irrelevant because “understanding the minds of sentients” isn’t being illustrated in this example, then I’m not sure if I disagree or not because I’m not sure what the claim actually is.

• How much interest have you shown in “learning from” (i.e., agreeing with) DP? Think about how you framed the statement, and possible biases therein.

ETA: The whole shebang is a combination of qualia and morality, two areas notorious for lack of clarity and consensus. “I am definitely right, and all must learn from me” is not a good heuristic here.

• “I am definitely right, and all must learn from me” is not a good heuristic here.

Quite so. I have learned a lot about the topic of qualia and morality, among others, while hanging around this place. I would be happy to learn from DP, if what he says here were not rehashed old arguments Eliezer and others addressed several times before. Again, I could be missing something, but if so, he does not make it easy to figure out what it is.

• I think others have addressed EY’s arguments. Sometimes centuries before he made them.

• Feel free to be specific.

• By “specific” I meant that you would state a certain argument EY makes, then quote a relevant portion of the refutation. Since I am pretty sure that Eliezer did have at least a passing glance at Kant, among others, while writing his meta-ethics posts, simply linking to a wikipedia article is not likely to be helpful.

• The argument EY makes is that it is possible to be super-rational without ever understanding any kind of morality (AKA the orthogonality thesis) and the argument Kant makes is that it isn’t.

• I’m not sure we should take a DSM diagnosis to be particularly strong evidence of a “fundamental misunderstanding of the world”. For instance, while people with delusions may clearly have poor models of the world, some research indicates that clinically depressed people may have lower levels of particular cognitive biases.

In order for “disregard for [...] the rights of others” to imply “a fundamental misunderstanding of the nature of the world”, it seems to me that we would have to assume that rights are part of the nature of the world — as opposed to, e.g., a construct of a particular political regime in society. Or are you suggesting that psychopathy amounts to an inability to think about sociopolitical facts?

• fubarobfusco, I share your reservations about DSM. Nonetheless, the egocentric illusion, i.e. that I am the centre of the universe and other people / sentient beings have only walk-on parts, is an illusion. Insofar as my behaviour reflects my pre-scientific sense that I am in some way special or ontologically privileged, I am deluded. This is true regardless of whether one’s ontology allows for the existence of rights or treats them as a useful fiction. The people we commonly label “psychopaths” or “sociopaths” (and whom DSM now categorises as victims of “antisocial personality disorder”) manifest this syndrome of egocentricity in high degree. So does burger-eating Jane.

• For instance, while people with delusions may clearly have poor models of the world, some research indicates that clinically depressed people may have lower levels of particular cognitive biases.

Clearly, reality is so Lovecraftian that any unbiased agent will immediately realize self-destruction is optimal. Evolution equipped us with our suite of biases to defend against this. The Great Filter is caused by bootstrapping superintelligences being compassionate enough to take their compatriots with them. And so on.

Now that’s a Cosmic Horror story I’d read ;)

• But I was thrown by your suggestion above that such a paperclipper could actually understand first-person phenomenal states,

Was that claimed? The standard claim is that superintelligences can “model” other entities. That may not be enough to understand qualia.

• You cannot prohibit the expected paperclip maximizer from existing unless you can prohibit superintelligences from accurately calculating which actions lead to how many paperclips, and efficiently searching out plans that would in fact lead to great numbers of paperclips. If you can calculate that, you can hook up that calculation to a motor output and there you go.

Pearce can prohibit paperclippers from existing by prohibiting superintelligences with narrow interests from existing. He doesn’t have to argue that the clipper would not be able to instrumentally reason out how to make paperclips; Pearce can argue that to be a really good instrumental reasoner, an entity needs to have a very broad understanding, and that an entity with a broad understanding would not retain narrow interests.

(Edits for spelling and clarity)

• To slightly expand, if an intelligence is not prohibited from the following epistemic feats:

1) Be good at predicting which hypothetical actions would lead to how many paperclips, as a question of pure fact.

2) Be good at searching out possible plans which would lead to unusually high numbers of paperclips—answering the purely epistemic search question, “What sort of plan would lead to many paperclips existing, if someone followed it?”

3) Be good at predicting and searching out which possible minds would, if constructed, be good at (1), (2), and (3) as purely epistemic feats.

Then we can hook up this epistemic capability to a motor output and away it goes. You cannot defeat the Orthogonality Thesis without prohibiting superintelligences from accomplishing 1-3 as purely epistemic feats. They must be unable to know the answers to these questions of fact.
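The "hook-up" step can be illustrated with a toy sketch, offered only as an illustration under stated assumptions: a purely factual predictor stands in for feat (1), a brute-force search over plans stands in for feat (2), and a trivial loop sends the chosen plan to a motor callback. Feat (3), searching over possible minds, is not modelled. The action names, `predict_paperclips`, `search_plans` and `run_agent` are all hypothetical.

```python
from itertools import product

# Hypothetical action set for a toy world.
ACTIONS = ("gather_wire", "bend_wire", "idle")

def predict_paperclips(plan):
    """Feat (1): a purely factual prediction of how many clips a plan yields."""
    wire, clips = 0, 0
    for step in plan:
        if step == "gather_wire":
            wire += 1
        elif step == "bend_wire" and wire > 0:
            wire -= 1
            clips += 1
    return clips

def search_plans(horizon):
    """Feat (2): find a plan that would lead to many clips if someone followed it."""
    return max(product(ACTIONS, repeat=horizon), key=predict_paperclips)

def run_agent(horizon, motor):
    """The hook-up: pass each step of the best plan found to a motor output."""
    plan = search_plans(horizon)
    for step in plan:
        motor(step)
    return plan

# Usage: a list's append method stands in for the motor output.
executed = []
plan = run_agent(horizon=4, motor=executed.append)
```

Note that `predict_paperclips` and `search_plans` answer questions of fact and contain no preference of their own; the only goal-directed piece is the few lines of `run_agent` that act on the answer.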

• A nice rephrasing of the “no Oracle” argument.

• Only in the sense that any working Oracle can be trivially transformed into a Genie. The argument doesn’t say that it’s difficult to construct a non-Genie Oracle and use it as an Oracle if that’s what you want; the difficulty there is for other reasons.

Nick Bostrom takes Oracles seriously so I dust off the concept every year and take another look at it. It’s been looking slightly more solvable lately, though I’m not sure if it would be solvable enough even assuming the trend continued.

• A clarification: my point was that denying orthogonality requires denying the possibility of Oracles being constructed; your post seemed a rephrasing of that general idea (that once you can have a machine that can solve some things abstractly, then you need just connect that abstract ability to some implementation module).

• Ah. K. It does seem to me like “you can construct it as an Oracle and then turn it into an arbitrary Genie” sounds weaker than “denying the Orthogonality thesis means superintelligences cannot know 1, 2, and 3.” The sort of person who denies OT is liable to deny Oracle construction because the Oracle itself would be converted unto the true morality, but find it much more counterintuitive that an SI could not know something. Also we want to focus on the general shortness of the gap from epistemic knowledge to a working agent.

• Possibly. I think your argument needs to be a bit more developed to show that one can extract the knowledge usefully, which is not a trivial statement for general AI. So your argument is better in the end, but needs more argument to establish.

• You cannot defeat the Orthogonality Thesis without prohibiting superintelligences from accomplishing 1-3 as purely epistemic feats.

I don’t see the significance of “purely epistemic”. I have argued that epistemic rationality could be capable of affecting values, breaking the orthogonality between values and rationality. I could further argue that instrumental rationality bleeds into epistemic rationality. An agent can’t have perfect a priori knowledge of which things are going to be instrumentally useful to it, so it has to start by understanding things, and then posing the question: is that thing useful for my purposes? Epistemic rationality comes first, in a sense. A good instrumental rationalist has to be a good epistemic rationalist.

What the Orthogonality Thesis needs is an argument to the effect that a SuperIntelligence would be able to endlessly update without ever changing its value system, even accidentally. That is tricky, since it effectively means predicting what a smarter version of itself would do. Making it smarter doesn’t help, because it is still faced with the problem of predicting what an even smarter version of itself would do... the carrot remains in front of the donkey.

Assuming that the value stability problem has been solved in general gives you a coherent Clippy, but it doesn’t rescue the Orthogonality Thesis as a claim about rationality in general, since it remains the case that most agents won’t have firewalled values. If you have to engineer something in, it isn’t an intrinsic truth.

• ...microelectrodes implanted in the reward and punishment centres, behavioural conditioning and ideological indoctrination—and perhaps the promise of 72 virgins in the afterlife for the faithful paperclipper. The result: a fanatical paperclip fetishist!

Have to point out here that the above is emphatically not what Eliezer talks about when he says “maximise paperclips”. Your examples above contain in themselves the actual, more intrinsic values to which paperclips would be merely instrumental: feelings in your reward and punishment centres, virgins in the afterlife, and so on. You can re-wire the electrodes, or change the promise of what happens in the afterlife, and watch as the paperclip preference fades away.

What Eliezer is talking about is a being for whom “pleasure” and “pain” are not concepts. Paperclips ARE the reward. Lack of paperclips IS the punishment. Even if pleasure and pain are concepts, they are merely instrumental to obtaining more paperclips. Pleasure would be good because it results in paperclips, not vice versa. If you reverse the electrodes so that they stimulate the pain centre when they find paperclips, and the pleasure centre when there are no paperclips, this being would start to instrumentally value pain more than pleasure, because that’s what results in more paperclips.
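As a rough sketch of the terminal-versus-instrumental distinction, assume (purely for illustration) that each signal causally leads to some number of paperclips. The agent ranks signals only by the paperclips they lead to, so reversing the wiring reverses which signal it prefers. The function names and the numbers in the two wiring tables are hypothetical.

```python
def instrumental_value(signal, clips_caused_by):
    """A signal is worth exactly the paperclips it is assumed to cause."""
    return clips_caused_by[signal]

def preferred_signal(clips_caused_by):
    """Terminal value is paperclips; signals are only ranked by what they yield."""
    return max(clips_caused_by,
               key=lambda s: instrumental_value(s, clips_caused_by))

# Hypothetical wirings: how many paperclips each signal leads to.
normal_wiring = {"pleasure": 5, "pain": 0}
reversed_wiring = {"pleasure": 0, "pain": 5}

choice_normal = preferred_signal(normal_wiring)
choice_reversed = preferred_signal(reversed_wiring)
```

Nothing in `preferred_signal` mentions pleasure or pain by name; swap the table and the preference follows the paperclips.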

It’s a concept that’s much more alien to our own minds than what you are imagining, and anthropomorphising it is rather more difficult!

Indeed, you touch upon this yourself:

“But unless I’m ontologically special (which I very much doubt!) the pain-pleasure axis discloses the world’s inbuilt metric of (dis)value—and it’s a prerequisite of finding anything (dis)valuable at all.”

Can you explain why pleasure is a more natural value than paperclips?

• Pleasure would be good because it results in paperclips, not vice versa. If you reverse the electrodes so that they stimulate the pain centre when they find paperclips, and the pleasure centre when there are no paperclips, this being would start to instrumentally value pain more than pleasure, because that’s what results in more paperclips.

Minor correction: The mere post-factual correlation of pain to paperclips does not imply that more paperclips can be produced by causing more pain. You’re talking about the scenario where each 1,000,000 screams produces 1 paperclip, in which case obviously pain has some value.

• Sarokrae, first, as I’ve understood Eliezer, he’s talking about a full-spectrum superintelligence, i.e. a superintelligence which understands not merely the physical processes of nociception etc, but the nature of first-person states of organic sentients. So the superintelligence is endowed with a pleasure-pain axis, at least in one of its modules. But are we imagining that the superintelligence has some sort of orthogonal axis of reward - the paperclippiness axis? What is the relationship between these dual axes? Can one grasp what it’s like to be in unbearable agony and instead find it more “rewarding” to add another paperclip? Whether one is a superintelligence or a mouse, one can’t directly access mind-independent paperclips, merely one’s representations of paperclips. But what does it mean to say one’s representation of a paperclip could be intrinsically “rewarding” in the absence of hedonic tone? [I promise I’m not trying to score some empty definitional victory, whatever that might mean; I’m just really struggling here...]

• Sarokrae, first, as I’ve understood Eliezer, he’s talking about a full-spectrum superintelligence, i.e. a superintelligence which understands not merely the physical processes of nociception etc, but the nature of first-person states of organic sentients. So the superintelligence is endowed with a pleasure-pain axis, at least in one of its modules.

What Eliezer is talking about (a superintelligence paperclip maximiser) does not have a pleasure-pain axis. It would be capable of comprehending and fully emulating a creature with such an axis if doing so had a high expected value in paperclips but it does not have such a module as part of itself.

But are we imagining that the superintelligence has some sort of orthogonal axis of reward—the paperclippiness axis? What is the relationship between these dual axes?

One of them it has (the one about paperclips). One of them it could, in principle, imagine (the thing with ‘pain’ and ‘pleasure’).

Can one grasp what it’s like to be in unbearable agony and instead find it more “rewarding” to add another paperclip?

Yes. (I’m not trying to be trite here. That’s the actual answer. Yes. Paperclip maximisers really maximise paperclips and really don’t care about anything else. This isn’t because they lack comprehension.)

Whether one is a superintelligence or a mouse, one can’t directly access mind-independent paperclips, merely one’s representations of paperclips. But what does it mean to say one’s representation of a paperclip could be intrinsically “rewarding” in the absence of hedonic tone?

Roughly speaking it means “It’s going to do things that maximise paperclips and in some way evaluates possible universes with more paperclips as superior to possible universes with fewer paperclips. Translating this into human words we call this ‘rewarding’ even though that is inaccurate anthropomorphising.”

(If I understand you correctly your position would be that the agent described above is nonsensical.)

• It would be capable of comprehending and fully emulating a creature with such an axis if doing so had a high expected value in paperclips but it does not have such a module as part of itself.

It’s not at all clear that you could bootstrap an understanding of pain qualia just by observing the behaviour of entities in pain (albeit that they were internally emulated). It is also not clear that you resolve issues of empathy/qualia just by throwing intelligence at it.

• It’s not at all clear that you could bootstrap an understanding of pain qualia just by observing the behaviour of entities in pain (albeit that they were internally emulated). It is also not clear that you resolve issues of empathy/qualia just by throwing intelligence at it.

I disagree with you about what is clear.

• Wedrifid, thanks for the exposition / interpretation of Eliezer. Yes, you’re right in guessing I’m struggling a bit. In order to understand the world, one needs to grasp both its third-person properties [the Standard Model / M-Theory] and its first-person properties [qualia, phenomenal experience] - and also one day, I hope, grasp how to “read off” the latter from the mathematical formalism of the former.

If you allow such a minimal criterion of (super)intelligence, then how well does a paperclipper fare? You remark how “it could, in principle, imagine (the thing with ‘pain’ and ‘pleasure’).” What is the force of “could” here? If the paperclipper doesn’t yet grasp the nature of agony or sublime bliss, then it is ignorant of their nature. By analogy, if I were building a perpetual motion machine but allegedly “could” grasp the second law of thermodynamics, the modal verb is doing an awful lot of work. Surely, if I grasped the second law of thermodynamics, then I’d stop. Likewise, if the paperclipper were to be consumed by unbearable agony, it would stop too. The paperclipper simply hasn’t understood the nature of what it was doing. Is the qualia-naive paperclipper really superintelligent - or just polymorphic malware?

• Likewise, if the paperclipper were to be consumed by unbearable agony, it would stop too.

An interesting hypothetical. My first thought is to ask: why would a paperclipper care about pain? Pain does not reduce the number of paperclips in existence.

My second thought is that pain is not just a quale; pain is a signal from the nervous system, indicating damage to part of the body. (The signal can be spoofed). Hence, pain could be avoided because it leads to a reduced ability to reach one’s goals; a paperclipper that gets dropped in acid may become unable to create more paperclips in the future, if it does not leave now. So the future worth of all those potential paperclips results in the paperclipper pursuing a self-preservation strategy—possibly even at the expense of a small number of paperclips in the present.

But not at the cost of a sufficiently large number of paperclips. If the cost in paperclips is high enough (more than the paperclipper could reasonably expect to create throughout the rest of its existence), a perfect paperclipper would let itself take the damage, let itself be destroyed, because that is the action which results in the greatest expected number of paperclips in the future. It would become a martyr for paperclips.
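The trade-off above can be made concrete with a toy expected-value comparison. This is only a sketch under made-up assumptions: the action names, the probabilities, and the paperclip counts are all invented for illustration, and `expected_paperclips` and `best_action` are hypothetical helpers rather than anything taken from the FAQ.

```python
# Toy model: a pure paperclipper picks whichever action yields the most
# expected paperclips. All names and numbers here are illustrative assumptions.

def expected_paperclips(outcomes):
    """Sum probability * paperclips over an action's possible outcomes."""
    return sum(p * clips for p, clips in outcomes)

def best_action(actions):
    """Return the name of the action with the highest expected paperclip count."""
    return max(actions, key=lambda name: expected_paperclips(actions[name]))

FUTURE_OUTPUT = 1_000_000  # clips the agent expects to make if it survives

# Case 1: staying in the acid saves only a few clips, so self-preservation wins.
small_stakes = {
    "flee_acid": [(1.0, FUTURE_OUTPUT)],    # survive, abandon the clips at hand
    "stay_and_save_clips": [(1.0, 100)],    # save 100 clips, then be destroyed
}

# Case 2: staying saves more clips than the agent could ever make later.
large_stakes = {
    "flee_acid": [(1.0, FUTURE_OUTPUT)],
    "stay_and_save_clips": [(1.0, 5_000_000)],
}

print(best_action(small_stakes))
print(best_action(large_stakes))
```

Nothing resembling a pain term appears anywhere in the comparison; damage matters only through its effect on the expected count.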

• Even a paperclipper cannot be indifferent to the experience of agony. Just as organic sentients can co-instantiate phenomenal sights and sounds, a superintelligent paperclipper could presumably co-instantiate a pain-pleasure axis and (un)clippiness qualia space—two alternative and incommensurable (?) metrics of value, if I’ve interpreted Eliezer correctly. But I’m not at all confident I know what I’m talking about here. My best guess is still that the natural world has a single metric of phenomenal (dis)value, and the hedonic range of organic sentients discloses a narrow part of it.

• Even a paperclipper cannot be indifferent to the experience of agony.

Are you talking about agony as an error signal, or are you talking about agony as a quale? I begin to suspect that you may mean the second. If so, then the paperclipper can easily be indifferent to agony; but it probably can’t understand how humans can be indifferent to a lack of paperclips.

There’s no evidence that I’ve ever seen to suggest that qualia are the same even for different people; on the contrary, there is some evidence which strongly suggests that qualia among humans are different. (For example; my qualia for Red and Green are substantially different. Yet red/​green colourblindness is not uncommon; a red/​green colourblind person must have at minimum either a different red quale, or a different green quale, to me). Given that, why should we assume that the quale of agony is the same for all humanity? And if it’s not even constant among humanity, I see no reason why a paperclipper’s agony quale should be even remotely similar to yours and mine.

And given that, why shouldn’t a paperclipper be indifferent to that quale?

• Are you talking about agony as an error signal, or are you talking about agony as a quale? I begin to suspect that you may mean the second. If so, then the paperclipper can easily be indifferent to agony; but it probably can’t understand how humans can be indifferent to a lack of paperclips.

A paperclip maximiser would (in the overwhelming majority of cases) have no such problem understanding indifference to paperclips. A tendency to anthropomorphise is a quirk of human nature. Assuming that paperclip maximisers have an analogous temptation (to clipropomorphise) is itself just anthropomorphising.

• I take your point. Though Clippy may clipropomorphise, there is no reason to assume that it will.

...is there any way to retract just a part of a previous post?

• ...is there any way to retract just a part of a previous post?

There is an edit button. But I wouldn’t say your comment is significantly weakened by this tangential technical detail (I upvoted it as is).

• Yes, but is there any way to leave the text there, but stricken through?

• Yes, but is there any way to leave the text there, but stricken through?

People have managed it with unicode characters. I think there is even a tool for it on the web somewhere.

• CCC, agony as a quale. Phenomenal pain and nociception are doubly dissociable. Tragically, people with neuropathic pain can suffer intensely without the agony playing any information-signalling role. Either way, I’m not clear it’s intelligible to speak of understanding the first-person phenomenology of extreme distress while being indifferent to the experience: for being disturbing is intrinsic to the experience itself. And if we are talking about a supposedly superintelligent paperclipper, shouldn’t Clippy know exactly why humans aren’t troubled by the clippiness-deficit?

If (un)clippiness is real, can humans ever understand (un)clippiness? By analogy, if organic sentients want to understand what it’s like to be a bat - and not merely decipher the third-person mechanics of echolocation - then I guess we’ll need to add a neural module to our CNS with the right connectivity and neurons supporting chiropteran gene-expression profiles, as well as peripheral transducers (etc). Humans can’t currently imagine bat qualia; but bat qualia, we may assume from the neurological evidence, are infused with hedonic tone. Understanding clippiness is more of a challenge. I’m unclear what kind of neurocomputational architecture could support clippiness. Also, whether clippiness could be integrated into the unitary mind of an organic sentient depends on how you think biological minds solve the phenomenal binding problem. But let’s suppose binding can be done. So here we have orthogonal axes of (dis)value. On what basis does the dual-axis subject choose between them? Sublime bliss and pure clippiness are both, allegedly, self-intimatingly valuable. OK, I’m floundering here...

People with different qualia? Yes, I agree CCC. I don’t think this difference challenges the principle of the uniformity of nature. Biochemical individuality makes variation in qualia inevitable. The existence of monozygotic twins with different qualia would be a more surprising phenomenon, though even such “identical” twins manifest all sorts of epigenetic differences. Despite this diversity, there’s no evidence to my knowledge of anyone who finds activation by full mu agonists of the mu opioid receptors in our twin hedonic hotspots anything other than exceedingly enjoyable. As they say, “Don’t try heroin. It’s too good.”

• Either way, I’m not clear it’s intelligible to speak of understanding the first-person phenomenology of extreme distress while being indifferent to the experience: for being disturbing is intrinsic to the experience itself.

There exist people who actually express a preference for being disturbed in a mild way (e.g. by watching horror movies). There also exist rarer people who seek out pain, for whatever reason. It seems to me that such people must have a different quale for pain than you do.

Personally, I don’t think that I can reasonably say that I find pain disturbing, as such. Yes, it is often inflicted in circumstances which are disturbing for other reasons; but if, for example, I go to a blood donation clinic, then the brief pain of the needle being inserted is not at all disturbing; though it does trigger my pain quale. So this suggests that my pain quale is already not the same as your pain quale.

There’s a lot of similarity; pain is a quale that I would (all else being equal) try to avoid; but that I will choose to experience should there be a good enough reason (e.g. the aforementioned blood donation clinic). I would not want to purposefully introduce someone else to it (again, unless there was a good enough reason; even then, I would try to minimise the pain while not compromising the good enough reason); but despite this similarity, I do think that there may be minor differences. (It’s also possible that we have slightly different definitions of the word ‘disturbing’).

If (un)clippiness is real, can humans ever understand (un)clippiness? By analogy, if organic sentients want to understand what it’s like to be a bat—and not merely decipher the third-person mechanics of echolocation—then I guess we’ll need to add a neural module to our CNS with the right connectivity and neurons supporting chiropteran gene-expression profiles, as well as peripheral transducers (etc).

But would such a modified human know what it’s like to be an unmodified human? If I were to guess what echolocation looks like to a bat, I’d guess a false-colour image with colours corresponding to textures instead of to wavelengths of light… though that’s just a guess.

Understanding clippiness is more of a challenge. I’m unclear what kind of neurocomputational architecture could support clippiness. Also, whether clippiness could be integrated into the unitary mind of an organic sentient depends on how you think biological minds solve the phenomenal binding problem. But let’s suppose binding can be done. So here we have orthogonal axes of (dis)value. On what basis does the dual-axis subject choose between them? Sublime bliss and pure clippiness are both, allegedly, self-intimatingly valuable. OK, I’m floundering here...

What is the phenomenal binding problem? (Wikipedia gives at least two different definitions for that phrase). I think I may be floundering even more than you are.

I’m not sure that Clippy would even have a pleasure-pain axis in the way that you’re imagining. You seem to be imagining that any being with such an axis must value pleasure—yet if pleasure doesn’t result in more paperclips being made, then why should Clippy value pleasure? Or perhaps the disutility of unclippiness simply overwhelms any possible utility of pleasure...

The existence of monozygotic twins with different qualia would be a more surprising phenomenon, though even such “identical” twins manifest all sorts of epigenetic differences.

According to a bit of googling, among the monozygotic Dionne quintuplets, two out of the five were colourblind; suggesting that they did not have the same qualia for certain colours as each other. (Apparently it may be linked to X-chromosome activation).

• CCC, you’re absolutely right to highlight the diversity of human experience. But this diversity doesn’t mean there aren’t qualia universals. Thus there isn’t an unusual class of people who relish being waterboarded. No one enjoys uncontrollable panic. And the seemingly anomalous existence of masochists who enjoy what you or I would find painful stimuli doesn’t undercut the sovereignty of the pleasure-pain axis but underscores its pivotal role: painful stimuli administered in certain ritualised contexts can trigger the release of endogenous opioids that are intensely rewarding. Co-administer an opioid antagonist and the masochist won’t find masochism fun.

Apologies if I wasn’t clear in my example above. I wasn’t imagining that pure paperclippiness was pleasurable, but rather what would be the effects of grafting together two hypothetical orthogonal axes of (dis)value in the same unitary subject of experience - as we might graft on another sensory module to our CNS. After all, the deliverances of our senses are normally cross-modally matched within our world-simulations. However, I’m not at all sure that I’ve got any kind of conceptual handle on what “clippiness” might be. So I don’t know if the thought-experiment works. If such hybridisation were feasible, would hypothetical access to the nature of (un)clippiness transform our conception of the world relative to unmodified humans - so we’d lose all sense of what it means to be a traditional human? Yes, for sure. But if, in the interests of science, one takes, say, a powerful narcotic euphoriant and enjoys sublime bliss simultaneously with pure clippiness, then presumably one still retains access to the engine of phenomenal value characteristic of archaic human minds.

The phenomenal binding problem? The best treatment IMO is still Revonsuo: http://cdn.preterhuman.net/texts/body_and_health/Neurology/Binding.pdf No one knows how the mind/brain solves the phenomenal binding problem and generates unitary experiential objects and the fleeting synchronic unity of the self. But the answer one gives may shape everything from whether one thinks a classical digital computer will ever be nontrivially conscious to the prospects of mind uploading and the nature of full-spectrum superintelligence. (cf. http://www.biointelligence-explosion.com/parable.html for my own idiosyncratic views on such topics.)

• CCC, you’re absolutely right to highlight the diversity of human experience. But this diversity doesn’t mean there aren’t qualia universals.

It doesn’t mean that there aren’t, but it also doesn’t mean that there are. It does mean that there are qualia that aren’t universal, which implies the possibility that there may be no universals; but, you are correct, it does not prove that possibility.

There may well be qualia universals. If I had to guess, I’d say that I don’t think there are, but I could be wrong.

Thus there isn’t an unusual class of people who relish being waterboarded. No one enjoys uncontrollable panic.

That doesn’t mean that everyone’s uncontrolled-panic qualia are all the same, it just means that everyone’s uncontrolled-panic qualia are all unwelcome. If given a sadistic choice between waterboarding and uncontrolled panic, in full knowledge of what the result will feel like, and all else being equal, some people may choose the panic while others may prefer the waterboarding.

Apologies if I wasn’t clear in my example above. I wasn’t imagining that pure paperclippiness was pleasurable, but rather what would be the effects of grafting together two hypothetical orthogonal axes of (dis)value in the same unitary subject of experience

If you feel that you have to explain that, then I conclude that I wasn’t clear in my response to your example. I was questioning the scaling of the axes in Clippy’s utility function; if Clippy values paperclipping a million times more strongly than it values pleasure, then the pleasure/​pain axis is unlikely to affect Clippy’s behaviour much, if at all.

However, I’m not at all sure that I’ve got any kind of conceptual handle on what “clippiness” might be. So I don’t know if the thought-experiment works.

I think it works as a thought-experiment, as long as one keeps in mind that the hybridised result is no longer a pure paperclipper.

Consider the hypothetical situation that Hybrid-Clippy finds that it derives pleasure from painting; an activity neutral on the paperclippiness scale. Consider further the possibility that making paperclips is neutral on the pleasure-pain scale. In such a case, Hybrid-Clippy may choose to either paint or make paperclips; depending on which scale it values more.
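To make the scaling point concrete, here is a minimal sketch of how Hybrid-Clippy’s choice might flip with the relative weight on each scale. It assumes the two axes can be combined as a simple weighted sum, which is itself one of the contested points in this thread; the weights, the scores, and the `utility` and `choose` helpers are all hypothetical.

```python
# Toy model of Hybrid-Clippy: utility is a weighted sum of two axes.
# The weighted-sum form, the weights, and the scores are illustrative assumptions.

def utility(scores, clip_weight, pleasure_weight):
    """Combine a (clippiness, pleasure) score pair into a single number."""
    clippiness, pleasure = scores
    return clip_weight * clippiness + pleasure_weight * pleasure

def choose(activities, clip_weight, pleasure_weight):
    """Pick the activity with the highest weighted utility."""
    return max(
        activities,
        key=lambda name: utility(activities[name], clip_weight, pleasure_weight),
    )

activities = {
    "paint": (0.0, 1.0),            # neutral on clippiness, pleasant
    "make_paperclips": (1.0, 0.0),  # clippy, neutral on pleasure
}

# Clippiness valued a million times more strongly than pleasure.
print(choose(activities, clip_weight=1_000_000, pleasure_weight=1))
# The reverse weighting.
print(choose(activities, clip_weight=1, pleasure_weight=1_000_000))
```

With activities this cleanly separated, only the ordering of the two weights matters; the size of the ratio would start to matter once an activity scored on both scales at once.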

The phenomenal binding problem? The best treatment IMO is still Revonsuo: http://cdn.preterhuman.net/texts/body_and_health/Neurology/Binding.pdf No one knows how the mind/brain solves the phenomenal binding problem and generates unitary experiential objects and the fleeting synchronic unity of the self.

So—the question is basically how the mind attaches input from different senses to a single conceptual object?

I can’t tell you how the mechanism works, but I can tell you that the mechanism can be spoofed. That’s what a ventriloquist does, after all. And a human can watch a film on TV, yet have the sound come out of a set of speakers on the other end of the room, and still bind the sound of an actor’s voice with that same actor on the screen.

Studying in what ways the binding mechanism can be spoofed would, I expect, produce an algorithm that roughly describes how the mechanism works. Of course, if it’s still a massive big problem after being looked at so thoroughly, then I expect that I’m probably missing some subtlety here...

• All pain hurts, or it wouldn’t be pain.

• What is the force of “could” here?

The force is that all this talk about understanding ‘the pain/pleasure’ axis would be a complete waste of time for a paperclip maximiser. In most situations it would be more efficient not to bother with it at all and spend its optimisation efforts on making more efficient relativistic rockets so as to claim more of the future light cone for paperclip manufacture.

It would require motivation for the paperclip maximiser to expend computational resources understanding the arbitrary quirks of DNA based creatures. For example some contrived game of Omega’s which rewards arbitrary things with paperclips. Or if it found itself emerging on a human inhabited world, making being able to understand humans a short term instrumental goal for the purpose of more efficiently exterminating the threat.

By analogy, if I were building a perpetual motion machine but allegedly “could” grasp the second law of thermodynamics, the modal verb is doing an awful lot of work.

Terrible analogy. Not understanding “pain and pleasure” is in no way similar to believing it can create a perpetual motion machine. Better analogy: An Engineer designing microchips allegedly ‘could’ grasp analytic cubism. If she had some motivation to do so. It would be a distraction from her primary interests but if someone paid her then maybe she would bother.

Surely, if I grasped the second law of thermodynamics, then I’d stop. Likewise, if the paperclipper were to be consumed by unbearable agony, it would stop too.

Now “if” is doing a lot of work. If the paperclipper was fundamentally different to a paperclipper and was actually similar to a human or DNA based relative capable of experiencing ‘agony’ and assuming agony was just as debilitating to the paperclipper as to a typical human… then sure all sorts of weird stuff follows.

The paperclipper simply hasn’t understood the nature of what it was doing.

I prefer the word True in this context.

Is the qualia-naive paperclipper really superintelligent—or just polymorphic malware?

To the extent that you believed that such polymorphic malware is theoretically possible and consisted of most possible minds it would be possible for your model to be used to accurately describe all possible agents - it would just mean systematically using different words. Unfortunately I don’t think you are quite at that level.

• Wedrifid, granted, a paperclip-maximiser might be unmotivated to understand the pleasure-pain axis and the qualia-spaces of organic sentients. Likewise, we can understand how a junkie may not be motivated to understand anything unrelated to securing his supply of heroin - and a wireheader in anything beyond wireheading. But superintelligent? Insofar as the paperclipper - or the junkie - is ignorant of the properties of alien qualia-spaces, then it/he is ignorant of a fundamental feature of the natural world - hence not superintelligent in any sense I can recognise, and arguably not even stupid. For sure, if we’re hypothesising the existence of a clippiness/unclippiness qualia-space unrelated to the pleasure-pain axis, then organic sentients are partially ignorant too. Yet the remedy for our hypothetical ignorance is presumably to add a module supporting clippiness - just as we might add a CNS module supporting echolocatory experience to understand bat-like sentience - enriching our knowledge rather than shedding it.

• But superintelligent? Insofar as the paperclipper—or the junkie—is ignorant of the properties of alien qualia-spaces, then it/​he is ignorant of a fundamental feature of the natural world—hence not superintelligent in any sense I can recognise, and arguably not even stupid.

What does (super-)intelligence have to do with knowing things that are irrelevant to one’s values?

• What Eliezer is talking about (a superintelligence paperclip maximiser) does not have a pleasure-pain axis.

Why does that matter for the argument?

As long as Clippy is in fact optimizing paperclips, what does it matter what/​if he feels while he does it?

Pearce seems to be making a claim that Clippy can’t predict creatures with pain/​pleasure if he doesn’t feel them himself.

Maybe Clippy needs pleasure/pain to be able to predict creatures with pleasure/pain. I doubt it, but fine, grant the point. He can still be a paper clip maximizer regardless.

• Why does that matter for the argument?

I fail to comprehend the cause for your confusion. I suggest reading the context again.

• Just as I correctly know it is better to be moral than to be paperclippy, they accurately evaluate that it is more paperclippy to maximize paperclips than morality. They know damn well that they’re making you unhappy and violating your strong preferences by doing so. It’s just that all this talk about the preferences that feel so intrinsically motivating to you, is itself of no interest to them because you haven’t gotten to the all-important parts about paperclips yet.

This is something I’ve been meaning to ask about for a while. When humans say it is moral to satisfy preferences, they aren’t saying that because they have an inbuilt preference for preference-satisfaction (or are they?). They’re idealizing from their preferences for specific things (survival of friends and family, lack of pain, fun...) and making a claim that, ceteris paribus, satisfying preferences is good, regardless of what the preferences are.

Seen in this light, Clippy doesn’t seem like quite as morally orthogonal to us as it once did. Clippy prefers paperclips, so ceteris paribus (unless it hurts us), it’s good to just let it make paperclips. We can even imagine a scenario where it would be possible to “torture” Clippy (e.g., by burning paperclips), and again, I’m willing to pronounce that (again, ceteris paribus) wrong.

Maybe I am confused here...

• Clippy is more of a Lovecraftian horror than a fellow sentient—where by “Lovecraftian” I mean to invoke Lovecraft’s original intended sense of terrifying indifference—but if you want to suppose a Clippy that possesses a pleasure-pain architecture and is sentient and then sympathize with it, I suppose you could. The point is that your sympathy means that you’re motivated by facts about what some other sentient being wants. This doesn’t motivate Clippy even with respect to its own pleasure and pain. In the long run, it has decided, it’s not out to feel happy, it’s out to make paperclips.

• Right, that makes sense. What interests me is (a) whether it is possible for Clippy to be properly motivated to make paperclips without some sort of phenomenology of pleasure and pain*, (b) whether human preference-for-preference-satisfaction is just another of many oddball human terminal values, or is arrived at by something more like a process of reason.

* Strictly speaking this phrasing puts things awkwardly; my intuition is that the proper motivational algorithms necessarily give rise to phenomenology (to the extent that that word means anything).

• it is possible for Clippy to be properly motivated to make paperclips without some sort of phenomenology of pleasure and pain

This is a difficult question, but I suppose that pleasure and pain are a mechanism for human (or other species’) learning. Simply said: you do a random action, and the pleasure/​pain response tells you it was good/​bad, so you should make more/​less of it again.

Clippy could use an architecture with a different model of learning. For example Solomonoff priors and Bayesian updating. In such an architecture, pleasure and pain would not be necessary.
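As a rough illustration of learning without any reward signal, here is a minimal sketch of Bayesian updating over a handful of hypotheses. A true Solomonoff prior is uncomputable, so the three-hypothesis prior below is only a crude stand-in with arbitrarily chosen weights; the scenario (a machine that sometimes produces a good paperclip) and the `update` helper are likewise assumptions made for the example.

```python
from fractions import Fraction

# Each hypothesis is a candidate probability that a machine yields a good clip.
# The prior weights are arbitrary assumptions, not a real Solomonoff prior.
prior = {
    Fraction(1, 2): Fraction(1, 2),
    Fraction(1, 4): Fraction(1, 4),
    Fraction(3, 4): Fraction(1, 4),
}

def update(beliefs, good_clip):
    """Apply Bayes' rule once: posterior is proportional to prior * likelihood."""
    unnormalised = {
        h: weight * (h if good_clip else 1 - h)
        for h, weight in beliefs.items()
    }
    total = sum(unnormalised.values())
    return {h: weight / total for h, weight in unnormalised.items()}

# Observations are plain facts about the world; no pleasure or pain is attached.
posterior = prior
for observation in [True, True, False, True]:
    posterior = update(posterior, observation)

for hypothesis, weight in sorted(posterior.items()):
    print(hypothesis, weight)
```

The beliefs shift purely as a function of what was observed. Whether anything is “felt” along the way is not something the arithmetic settles either way.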

• but I suppose that pleasure and pain are a mechanism for human (or other species’) learning.

Interesting… I suspect that pleasure and pain are more intimately involved in motivation in general, not just learning. But let us bracket that question.

Clippy could use an architecture with a different model of learning. For example Solomonoff priors and Bayesian updating. In such an architecture, pleasure and pain would not be necessary.

Right, but that only gets Clippy the architecture necessary to model the world. How does Clippy’s utility function work?

Now, you can say that Clippy tries to satisfy its utility function by taking actions with high expected cliptility, and that there is no phenomenology necessarily involved in that. All you need, on this view, is an architecture that gives rise to the relevant clip-promoting behaviour—Clippy would be a robot (in the Roomba sense of the word).

BUT

Consider for a moment how symmetrically “unnecessary” it looks that humans (& other sentients) should experience phenomenal pain and pleasure. Just like is supposedly the case with Clippy, all natural selection really “needs” is an architecture that gives rise to the right fitness-promoting behaviour. The “additional” phenomenal character of pleasure and pain is totally unnecessary for us adaptation-executing robots.

...If it seems to you that I might be talking nonsense above, I suspect you’re right. Which is what leads me to the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.

(Assuming that my use of the word “phenomenal” above is actually coherent, of which I am far from sure.)

• We know at least two architectures for processing general information: humans and computers. Two data points are not enough to generalize about what all possible architectures must have. But it may be enough to prove what some architectures don’t need. Yes, there is a chance that if computers become even more generally intelligent than today, they will gain some human-like traits. Maybe. Maybe not. I don’t know. And even if they will gain more human-like traits, it may be just because humans designed them without knowing any other way to do it.

If there are two solutions, there are probably many more. I don’t dare to guess how similar or different they are. I imagine that Clippy could be as different from humans and computers, as humans and computers are from each other. Which is difficult to imagine specifically. How far does the mind-space reach? Maybe compared with other possible architectures, humans and computers are actually pretty close to each other (because humans designed the computers, re-using the concepts they were familiar with).

How to taboo “motivation” properly? What makes a rock fall down? Gravity does. But the rock does not follow any algorithm for general reasoning. What makes a computer follow its algorithm? Well, that’s its construction: the processor reads the data, and the data make it read or write other data, and the algorithm makes it all meaningful. The human brains are full of internal conflicts - there are different modules suggesting different actions, and the reasoning mind is just another plugin which often does not cooperate well with the existing ones. Maybe the pleasure is a signal that a fight between the modules is over. Maybe after millennia of further evolution (if for some magical reason all mind- and body-altering technology would stop working, so only the evolution would change human minds) we would evolve to a species with fewer internal conflicts, less akrasia, more agency, and perhaps less pleasure and mental pain. This is just a wild guess.

• Generalizing from observed characteristics of evolved systems to expected characteristics of designed systems leads equally well to the intuition that humanoid robots will have toenails.

• I don’t think the phenomenal character of pleasure and pain is best explained at the level of natural selection at all; the best bet would be that it emerges from the algorithms that our brains implement. So I am really trying to generalize from human cognitive algorithms to algorithms that are analogous in the sense of (roughly) having a utility function.

Suffice it to say, you will find it’s exceedingly hard to find a non-magical reason why non-human cognitive algorithms shouldn’t have a phenomenal character if broadly similar human algorithms do.

• Does it follow from the above that all human cognitive algorithms that motivate behavior have the phenomenal character of pleasure and pain? If not, can you clarify why not?

• I think that probably all human cognitive algorithms that motivate behaviour have some phenomenal character, not necessarily that of pleasure and pain (e.g., jealousy).

• OK, thanks for clarifying.

I agree that any cognitive system that implements algorithms sufficiently broadly similar to those implemented in human minds is likely to have the same properties that the analogous human algorithms do, including those algorithms which implement pleasure and pain.

I agree that not all algorithms that motivate behavior will necessarily have the same phenomenal character as pleasure or pain.

This leads me away from the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.

• ...If it seems to you that I might be talking nonsense above, I suspect you’re right. Which is what leads me to the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.

Necessity according to natural law, presumably. If you could write something to show logical necessity, you would have solved the Hard Problem.

• Isn’t the giant elephant in this room the whole issue of moral realism? I’m a moral cognitivist but not a moral realist. I have laid out what it means for my moral beliefs to be true—the combination of physical fact and logical function against which my moral judgments are being compared. This gives my moral beliefs truth value.

That leaves the sense in which you are not a moral realist most unclear.

And then strangest of all is to state powerfully and definitely that every bit of happiness must be motivating to all other minds, even though you can’t lay out step by step how the decision procedure would work. This requires overrunning your own claims to knowledge in a fundamental sense—mistaking your confusion about something for the ability to make definite claims about it.

That tacitly assumes that the question “does pleasure/happiness motivate positively in all cases” is an empirical question, i.e., that it would be possible to find an entity that hates pleasure and loves pain. It could, however, be plausibly argued that it is actually an analytical, definitional issue: that if some entity loves X and hates Y, we would just call X its pleasure and Y its pain.

• Are you trying to appeal to instrumental rationality, epistemic rationality, or some other kind?

An instrumentally rational agent wouldn’t disregard a stronger preference of theirs in favour of a weaker preference of theirs. But other agents’ preferences are not my own, and instrumental rationality does not oblige me to promote them.

Nor do epistemically rational agents have to abstain from meat. Even if moral realism were true, and eating animals were wrong, it doesn’t follow that moral internalism is true: you might know that eating animals was wrong, but not be motivated to abstain. And even if knowledge did imply overwhelming motivation (an implausibly strong version of moral internalism), our epistemically rational agent still wouldn’t be obliged to abstain from eating meat, as she might simply be unaware of this moral truth. Epistemic rationality doesn’t imply omniscience.

If you have some other conception of rationality in mind, you have no objection to Eliezer, as he is only concerned with these two kinds.

• Rationality may imply moral conclusions in the same sense that it implies some factual conclusions: we think that folks who believe in creationism are irrational, because we think the evidence for evolution is sufficiently strong and also think that evolution is incompatible with creationism. Analogously, if the evidence for some moral truth is sufficiently strong, we may similarly accuse of irrationality those who fail to form their beliefs accordingly. So it is misleading to say that “rationality doesn’t itself imply moral conclusions”.

• I also don’t agree with the claim that rationality does not tell us anything about what we should want. Perhaps instrumental rationality doesn’t, but epistemic rationality does.

This is potentially misleading. First, there’s a good sense in which, moral realism or no moral realism, instrumental rationality does play a strong role in telling us what we ‘should’ want—far more than does epistemic rationality. After all, instrumental rationality is often a matter of selecting which lower-level desires satisfy our higher-level ones, or selecting which desires in general form a coherent whole that is attainable from the present desire set.

But by ‘what we should want’ you don’t mean ‘what we should want in light of our other values’; you seem rather to mean ‘what we should want in light of objective, unconditional moral facts’. (See my response to EY.) You’re right that if there are such facts, then insofar as we can come to know them, epistemic rationality will direct us toward knowing them. But you haven’t defended the independent assumption that knowing moral facts forces any rational agent to obey those facts.

Let’s assume that moral realism (in the sense you seem to mean—what I call moral unconditionalism) is true and, moreover, that the relevant facts are knowable. (Those are both very big assumptions, but I’m curious as to what follows.) How could we then argue that these facts are internalist, in the strong sense that they would be completely ineffable to any rational agent who was not thereby motivated to obey the dictates of these facts? In particular, how can we demonstrate this fact non-trivially, i.e., without building ‘follows the instructions of any discovered moral facts’ into our definition of ‘rational’?

Does our concept of epistemic rationality, in itself, require that if agent A learns normative fact N (say, ‘murder is wrong’), A must then eschew murder or else be guilty of e-irrationality? Clearly not. E-rationality is only about making your beliefs better fit the world. Further actions concerning those beliefs—like following any instructions they contain, or putting them in alphabetical order, or learning the same facts in as many different languages as possible—are extraneous to e-rationality.

(ETA: A perfectly e-rational agent is not even required to follow normative facts purely about beliefs that she learns, except insofar as following those norm-facts happens to foreseeably promote belief-accuracy. A purely e-rational agent who learns that murder is objectively wrong will not thereby be motivated to avoid murdering people, unless learning that murdering is wrong somehow leads the agent to conclude that murdering people will tend to make the agent acquire false beliefs or ignore true ones.)

Does our concept of instrumental rationality, in itself, require that if agent B learns normative fact N, B must then eschew murder or else be guilty of i-irrationality? Again, it’s hard to see why. An agent is i-rational iff it tends to actualize situations it values. If B’s values don’t already include norms like ‘follow any instructions you find embedded in moral facts’, then there seems to be no inconsistency or (i-)irrationality in B’s decision to disregard these facts in conduct, even if the agent is also completely e-rational and knows with certainty about these moral facts.

So in what sense is it (non-trivially) irrational to learn a moral fact, and then refuse to follow its dictate? What concept of rationality do you have in mind, and why should we care about the relevant concept?

• pragmatist, apologies if I gave the impression that by “impartially gives weight” I meant impartially gives equal weight. Thus the preferences of a cow or a pig or a human trump the conflicting interests of a less sentient Anopheles mosquito or a locust every time. But on the conception of rational agency I’m canvassing, it is neither epistemically nor instrumentally rational for an ideal agent to disregard a stronger preference simply because that stronger preference is entertained by a member of another species or ethnic group. Nor is it epistemically or instrumentally rational for an ideal agent to disregard a conflicting stronger preference simply because her comparatively weaker preference looms larger in her own imagination. So on this analysis, Jane is not doing what “an ideal agent (a perfectly rational agent, with infinite computing power, etc.) would choose.”

• Rationality can be used toward any goal, including goals that don’t care about anyone’s preference. For example, there’s nothing in the math of utility maximisation that requires averaging over other agents’ preferences (note: do not confuse utility maximisation with utilitarianism, they are very different things, the former being a decision theory, the latter being a specific moral philosophy).

• nshepperd, utilitarianism conceived as a theory of value is not always carefully distinguished from utilitarianism (especially rule-utilitarianism) conceived as a decision procedure. This distinction is nicely brought out in the BPhil thesis of FHI’s Toby Ord, “Consequentialism and Decision Procedures”: http://www.amirrorclear.net/academic/papers/decision-procedures.pdf Toby takes a global utilitarian consequentialist approach to the question, “How should I decide what to do?”, which is a subtly different question from “What should I do?”

• When we say ‘rationality’, we mean instrumental rationality; getting what you want. Elsewhere, we also refer to epistemic rationality, which is believing true things. In neither case do we say anything about what you should want.

Dave might not be explaining his own position as clearly as one might wish, but I think the core of his objection is that Jane is not being epistemically rational when she decides to eat other sentient beings. This is because she is acting on a preference rooted in a belief which doesn’t adequately represent certain aspects of the world—specifically, the subjective points of view of sentient members of non-human animal species.

(BTW, I believe this subthread, including Eliezer’s own helpful comments below, provides some evidence that Eliezer’s policy of penalizing replies to comments below a certain karma threshold is misguided.

EDIT: David’s comment above had −6 karma when I originally posted this reply. His current score is no longer below the threshold.)

• Yes, although our conception of epistemic and instrumental rationality is certainly likely to influence our ethics, I was making a point about epistemic and instrumental rationality. Thus imagine if we lived in an era where utopian technology delivers a version of ubiquitous naturalised telepathy, so to speak. Granted such knowledge, for an agent to act in accordance with a weaker rather than stronger preference would be epistemically and instrumentally irrational. Of course, we don’t (yet) live in an era of such radical transparency. But why should our current incomplete knowledge / ignorance make it instrumentally rational to fail to take into consideration what one recognises, intellectually at least, as the stronger preference? In this instance, the desire not to have one’s throat slit is a very strong preference indeed.

[“Replies to downvoted comments are discouraged. Pay 5 Karma points to proceed anyway?” says this Reply button. How bizarre. Is this invitation to groupthink epistemically rational? Or is killing cows good karma?]

• Epistemic rationality may well affect what you should want. Epistemic rationality values objective facts above subjective whims. It also values consistency. It may well be that these valuations affect the rest of one’s value system, i.e., it would be inconsistent for me to hold self-centered values when there is no objective reason for me to be more important than anyone else.

(Edited to make sense)

• Larks, we can of course stipulatively define “rational” so as to exclude impartial consideration of the preferences of other agents or subjects of experience. By this criterion, Jane is more rational than Jill—who scrupulously weighs the preferences of other subjects of experience before acting, not just her own, i.e. Jill aspires to a more inclusive sense of instrumental rationality. But why favour Jane’s folk usage of “rational”? Jane’s self-serving bias arbitrarily privileges one particular here-and-now over all other first-person perspectives. If the “view from nowhere” offered by modern science is correct, then Jane’s sense she is somehow privileged or ontologically special is an illusion of perspective - genetically adaptive, for sure, but irrational. And false.

[No, this argument is unlikely to win karma with burger eaters :-)]

• David, we’re not defining rationality to exclude other-oriented desires. We’re just not including that exact morality into the word “rational”. Instrumental rationality links up a utility function to a set of actions. You hand over a utility function over outcomes, epistemic rationality maps the world and then instrumental rationality hands back a set of actions whose expected score is highest. So long as it can build a well-calibrated, highly discriminative model of the world and then navigate to a compactly specified set of outcomes, we call it rational, even if the optimization target is “produce as many paperclips as possible”. Adding a further constraint to the utility function that it be perfectly altruistic will greatly reduce the set of hypothetical agents we’re talking about, but it doesn’t change reality (obviously) nor yield any interesting changes in terms of how the agent investigates hypotheses, the fact that the agent will not fall prey to the sunk cost fallacy if it is rational, and so on. Perfectly altruistic rational agents will use mostly the same cognitive strategies as any other sort of rational agent, they’ll just be optimizing for one particular thing.

Jane doesn’t have any false epistemic beliefs about being special. She accurately models the world, and then accurately calculates and outputs “the strategy that leads to the highest expected number of burgers eaten by Jane” instead of “the strategy that has the highest expected fulfillment of all thinking beings’ values”.
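The pipeline described above (a utility function goes in, a world model supplies outcome probabilities, and the action with the highest expected score comes out) can be sketched in a few lines of Python. This is only an illustrative sketch: the action names, the outcome probabilities, the weight of 10 on harm, and both utility functions are made-up assumptions, not anything specified in the FAQ or in this thread.

```python
from typing import Callable, Dict, Tuple

# An outcome is (burgers_eaten_by_jane, cows_harmed). Hypothetical encoding.
Outcome = Tuple[int, int]

# Hypothetical world model (the "epistemic" part): for each available action,
# a probability distribution over outcomes. The numbers are invented.
WORLD_MODEL: Dict[str, Dict[Outcome, float]] = {
    "order_burger": {(1, 1): 0.9, (0, 0): 0.1},
    "order_salad": {(0, 0): 1.0},
}


def expected_utility(action: str, utility: Callable[[Outcome], float]) -> float:
    """Probability-weighted sum of the utility of each possible outcome."""
    return sum(p * utility(outcome) for outcome, p in WORLD_MODEL[action].items())


def best_action(utility: Callable[[Outcome], float]) -> str:
    """The "instrumental" part: pick the action with the highest expected utility."""
    return max(WORLD_MODEL, key=lambda action: expected_utility(action, utility))


def jane_utility(outcome: Outcome) -> float:
    """Assumed utility for Jane: counts only burgers eaten by Jane."""
    burgers, _cows_harmed = outcome
    return float(burgers)


def impartial_utility(outcome: Outcome) -> float:
    """Assumed impartial utility: harm to a cow outweighs one burger tenfold."""
    burgers, cows_harmed = outcome
    return burgers - 10.0 * cows_harmed


if __name__ == "__main__":
    print(best_action(jane_utility))       # order_burger
    print(best_action(impartial_utility))  # order_salad
```

Both agents share the same world model and the same maximization step; only the utility function handed in differs, which is the sense in which the optimization target is treated here as separate from the machinery of rationality.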

Besides, everyone knows that truly rational entities only fulfill other beings’ values if they can do so using friendship and ponies.

• That did not address David’s True Rejection. An Austere Charitable Metaethicist could do better.

• That did not address David’s True Rejection. An Austere Charitable Metaethicist could do better.

The grandparent is a superb reply and gave exactly the information needed in a graceful and elegant manner.

• Indeed it does. Not. Here is a condition under which I think David would be satisfied: if people used vegetables in their examples as a common courtesy to vegetarians, in the exact same sense that “she” has been largely adopted to combat natural drives towards “he”-ness. Note how Luke’s agents and examples are overwhelmingly female. Not a requirement, just a courtesy.

And I don’t say that as a vegetarian, because I’m not one.

• Indeed. What is the Borg’s version of the Decision Theory FAQ? This is not to say that rational agents should literally aim to emulate the Borg. Rather our conception of epistemic and instrumental rationality will improve if /​ when technology delivers ubiquitous access to each other’s perspectives and preferences. And by “us” I mean inclusively all subjects of experience.

• Eliezer, I’d beg to differ. Jane does not accurately model the world. Accurately modelling the world would entail grasping and impartially weighing all its first-person perspectives, not privileging a narrow subset. Perhaps we may imagine a superintelligent generalisation of http://​​www.guardian.co.uk/​​science/​​2013/​​feb/​​28/​​brains-rats-connected-share-information http://​​www.guardian.co.uk/​​science/​​brain-flapping/​​2013/​​mar/​​01/​​rats-are-like-the-borg With perfect knowledge of all the first-person facts, Jane could not disregard the strong preference of the cow not to be harmed. Of course, Jane is not capable of such God-like omniscience. No doubt in common usage, egocentric Jane displays merely a lack of altruism, not a cognitive deficit of reason. But this is precisely what’s in question. Why build our canons of rational behaviour around a genetically adaptive delusion?

• Accurately modeling the world entails making accurate predictions about it. An expected paperclip maximizer fully grasps the functioning of your brain and mind to the extent that this is relevant to producing paperclips; if it needs to know the secrets of your heart in order to persuade you, it knows them. If it needs to know why you write papers about the hard problem of conscious experience, it knows that too. The paperclip maximizer is not moved by grasping your first-person perspective, because although it has accurate knowledge of this fact, that is not the sort of fact that figures in its terminal values. The fact that it perfectly grasps the compellingness-to-Jane, even the reason why Jane finds certain facts to be inherently and mysteriously compelling, doesn’t compel it. It’s not a future paperclip.

I know exactly why the villain in Methods of Rationality wants to kill people. I could even write the villain writing about the ineffable compellingness of the urge to rid the world of certain people if I put that villain in a situation where he or she would actually read about the hard problem of conscious experience, and yet I am not likewise compelled. I don’t have the perfect understanding of any particular real-world psychopath that I do of my fictional killer, but if I did know why they were killers, and of course brought to bear my standard knowledge of why humans write what they do about consciousness, I still wouldn’t be compelled by even the limits of a full grasp of their reasons, their justifications, their inner experience, and the reasons they think their inner experience is ineffably compelling.

David, have you already read all this stuff on LW, in which case I shouldn’t bother recapitulating it? http://​​lesswrong.com/​​lw/​​sy/​​sorting_pebbles_into_correct_heaps/​​, http://​​lesswrong.com/​​lw/​​ta/​​invisible_frameworks/​​, and so on?

• For sure, accurately modelling the world entails making accurate predictions about it. These predictions include the third-person and first-person facts [what-it’s-like-to-be-a-bat, etc]. What is far from clear—to me at any rate—is whether super-rational agents can share perfect knowledge of both the first-person and third-person facts and still disagree. This would be like two mirror-touch synaesthetes having a fist fight.

Thus I’m still struggling with, “The paperclip maximizer is not moved by grasping your first-person perspective.” From this, I gather we’re talking about a full-spectrum superintelligence well acquainted with both the formal and subjective properties of mind, insofar as they can be cleanly distinguished. Granted your example Eliezer, yes, if contemplating a cosmic paperclip-deficit causes the AGI superhuman anguish, then the hypothetical superintelligence is entitled to prioritise its super-anguish over mere human despair—despite the intuitively arbitrary value of paperclips. On this scenario, the paperclip-maximising superintelligence can represent human distress even more faithfully than a mirror-touch synaesthete; but its own hedonic range surpasses that of mere humans—and therefore takes precedence.

However, to be analogous to burger-choosing Jane in Luke’s FAQ, we’d need to pick an example of a superintelligence who wholly understands both a cow’s strong preference not to have her throat slit and Jane’s comparatively weaker preference to eat her flesh in a burger. Unlike partially mind-blind Jane, the superintelligence can accurately represent and impartially weigh all relevant first-person perspectives. So the question is whether this richer perspective-taking capacity is consistent with the superintelligence discounting the stronger preference not to be harmed? Or would such human-like bias be irrational? In my view, this is not just a question of altruism but cognitive competence.

[Of course, given we’re talking about posthuman superintelligence, the honest answer is boring and lame: I don’t know. But if physicists want to know the “mind of God,” we should want to know God’s utility function, so to speak.]

• What is far from clear—to me at any rate—is whether super-rational agents can share perfect knowledge of both the first-person and third-person facts and still disagree. This would be like two mirror-touch synaesthetes having a fist fight.

Why not? Actions are a product of priors, perceptions and motives. Sharing perceptions isn’t sharing motives—and even with identical motives, agents could still fight—if they were motivated to do so.

• [Of course, given we’re talking about posthuman superintelligence, the honest answer is boring and lame: I don’t know. But if physicists want to know the “mind of God,” we should want to know God’s utility function, so to speak.]

God’s Utility Function according to Dawkins and Tyler.

“The Sorting Hat did seem to think I was going to end up as a Dark Lord unless [censored],” Harry said. “But I don’t want to be one.”

“Mr. Potter...” said Professor Quirrell. “Don’t take this the wrong way. I promise you will not be graded on the answer. I only want to know your own, honest reply. Why not?”

Harry had that helpless feeling again. Thou shalt not become a Dark Lord was such an obvious theorem in his moral system that it was hard to describe the actual proof steps. “Um, people would get hurt?”

“Surely you’ve wanted to hurt people,” said Professor Quirrell. “You wanted to hurt those bullies today. Being a Dark Lord means that people you want to hurt get hurt.”

Harry floundered for words and then decided to simply go with the obvious. “First of all, just because I want to hurt someone doesn’t mean it’s right—”

“What makes something right, if not your wanting it?”

“Ah,” Harry said, “preference utilitarianism.”

“Pardon me?” said Professor Quirrell.

“It’s the ethical theory that the good is what satisfies the preferences of the most people—”

“No,” Professor Quirrell said. His fingers rubbed the bridge of his nose. “I don’t think that’s quite what I was trying to say. Mr. Potter, in the end people all do what they want to do. Sometimes people give names like ‘right’ to things they want to do, but how could we possibly act on anything but our own desires?”

“Well, obviously,” Harry said. “I couldn’t act on moral considerations if they lacked the power to move me. But that doesn’t mean my wanting to hurt those Slytherins has the power to move me more than moral considerations!”

• With perfect knowledge of all the first-person facts, Jane could not disregard the strong preference of the cow not to be harmed.

Why not?

Even if it turns out that all humans would become cow-compassionate given ultimate knowledge, we are still interested in the rationality of cow-satan.

• Why not? Because Jane would weigh the preference of the cow not to have her throat slit as if it were her own. Of course, perfect knowledge of each other’s first-person states is still a pipedream. But let’s assume that in the future http://​​www.independent.co.uk/​​news/​​science/​​mindreading-rodents-scientists-show-telepathic-rats-can-communicate-using-braintobrain-8515259.html is ubiquitous, ensuring our mutual ignorance is cured.

“The rationality of cow satan”? Apologies Kyre, you’ve lost me here. Could you possibly elaborate?

• What I’m saying is that cow-satan completely understands the preference of the cow not to have its throat slit. Every last grisly detail; all the physical, emotional, social, intellectual consequences, or consequences of any other kind. Cow satan has virtually experienced being slaughtered. Cow satan has studied the subject for centuries in detail. It is safe to say that cow satan understands the preference of cows not to be killed and eaten better than any cow ever could. Cow satan weighs that preference at zero.

It might be the case that cow satan could not actually exist in our universe, but would you say that it is irrational for him to go ahead and have the burger?

Are you saying that perfect (or sufficiently good) mutual knowledge of each other’s experiences would be highly likely to change everyone’s preferences? That might be the case, but I don’t see how that makes Jane’s burger choice irrational.

• Yes Kyre, “Cow Satan”, as far as I can tell, would be impossible. Imagine a full cognitive generalisation of http://​​www.livescience.com/​​1628-study-people-literally-feel-pain.html Why don’t mirror-touch synaesthetes—or full-spectrum superintelligences—wantonly harm each other?

[this is not to discount the problem of Friendly AI. Alas one can imagine “narrow” superintelligences converting cows and humans alike into paperclips (or worse, dolorium) without insight into the first-person significance of what they are doing.]

• “Cow Satan”, as far as I can tell, would be impossible.

There isn’t too much that is impossible. In general, if we can imagine it, we can build it (because we have already built it—inside our brains).

• There isn’t too much that is impossible. In general, if we can imagine it, we can build it (because we have already built it—inside our brains).

Intuitive ideas are inconsistent upon reflection, with this fact conveniently glossed over by the brain, because the details simply aren’t there. The brain has to perform additional work, actually fill in the details, to notice inconsistencies.

1: Imagine an invisible unicorn. 2: Carefully examine the properties of your invisible unicorn.

Notice how those properties are being generated on the fly as you turn your attention to some aspect of the unicorn which requires a value for that property?

• Tim, in one sense I agree: In the words of William Ralph Inge, “We have enslaved the rest of the animal creation, and have treated our distant cousins in fur and feathers so badly that beyond doubt, if they were able to formulate a religion, they would depict the Devil in human form.”

But I’m not convinced there could literally be a Cow Satan, for the same reason that there are no branches of Everett’s multiverse where any of the world’s religions are true, i.e. because of their disguised logical contradictions. Unless you’re a fan of what philosophers call Meinong’s jungle (cf. http://en.wikipedia.org/wiki/Meinong’s_jungle), the existence of “Cow Satan” is impossible.

• Hello, David.

If “What Do We Mean By ‘Rationality’?” does not describe your conception of rationality, I am wondering, what is your conception of rationality? How would you define that term?

Given that, why should I care about being rational in your sense of the word? When I find ants, spiders, and other bugs in my house, I kill them. Sometimes I don’t finish the job on the first try. I’m sure they are feeling pain, but I don’t care. Sometimes, I even enjoy the feeling of “the hunt” and am quite satisfied when I’m done. Once, hornets built a large nest on my family’s garage. We called an exterminator and had it destroyed. Again, I was quite happy with that decision and felt no remorse. Multiple times in my life, I have burned ants on my driveway to death with a magnifying glass, and, though I sometimes feel guilty about having done this, in the moment, I knew that the ants were suffering and actually enjoyed the burning, in part, for that very reason. The ants even squealed at the moment of their deaths, and that was my favorite part, again because it gave me the feeling of success in “the hunt”.

No third-person fact you give me here will change my mind. You could rewire my brain so I felt empathy towards ants and other bugs, but I don’t want you to do that. Unless I have misrepresented your conception of rationality, I think it fails to generally motivate (and there are probably many examples in which this occurs, besides mine).

Also, in case it comes up, I am a motivational externalist in the moral domain (though you probably have surmised that by now).

• I knew that the ants were suffering and actually enjoyed the burning, in part, for that very reason.

Maybe you need to talk to someone about it.

• I don’t burn ants anymore. My psychological health now is far superior to my psychological health back when I burned ants.

• Have you considered improving your psychological health so far that you don’t kill spiders, too?

• Not sure if that was meant to be sarcastic, but I think it is fairly common for people to kill bugs that they find crawling around in their own home. Torturing them is a different matter.

• I usually just ignore them, or if they bother me too much I try to get them out of the window alive.

• Well, I don’t, and I complain when I see people do it. But I’m atypical.

… how about not torturing and killing animals for food? Sure, most people do it, but most people are crazy.

• I actually went vegetarian last summer for a couple months. I survived, but I did not enjoy it. I definitely could not stand going vegan; I enjoy milk too much. When I went vegetarian, I did not feel nourished enough and I was unable to keep up my physique. At some meals, I couldn’t eat with the rest of my family.

I would go vegetarian again if I had the finances to hire a personal trainer (who could guide me on how to properly nourish myself) and if I had the motivation to prepare many more meals for myself than I do right now. However, I don’t, on both counts.

On the other hand, I did recently find out about something called Soylent, which I hope I will eventually be able to try out. Does that mesh better with your moral sensibilities? (honest question, not meant to sound edgy)

• I actually grew up vegetarian, so I’ve never had any trouble with which foods to eat. Most people are already eating far more meat than they need to, but by the sounds of it you need the protein—it might be worth eating nuts, beans, eggs and so on whenever you would usually eat meat?

I’ve heard of Soylent, and assuming you sourced the various ingredients from vegetarian sources (no fish oils, basically) it sounds awesome. Assuming you didn’t run into long-term side-effects, which I think is unlikely, it would be a great path to vegetarianism as well.

• I emailed the creator of Soylent, Rob Rhinehart. Soylent is both vegetarian and kosher (though not vegan).

ETA: Apparently Soylent will be vegan by default now. But, who knows, that could change again.

• Well that’s great. I might try Soylent myself, in that case.

• … how about not torturing and killing animals for food?

I have much less of a problem with that (I eat meat myself, once in a while) than with torturing and killing animals for fun.

• Is it because you think people derive much less fun from it than they do from eating meat? Or because you see some qualitative distinction between the two?

• Is it because you think people derive much less fun from it than they do from eating meat?

I was actually thinking of “fun” in a narrower sense (I was going to say “the hell of it” instead, and I’m not sure why I changed my mind); so I guess that

you see some qualitative distinction between the two

is kind-of right, even though, as someone said, a qualitative difference is just a sufficiently large quantitative difference (which translates to LWese as “SPECKS is worse than TORTURE”). By using “Fun” in a more general sense (note the capital F)… [thinks about it] yes, they derive much less Fun from the former than from the latter per animal killed, but I don’t think that one bug should count for as much as one cow, so… [thinks a little more about it] I dunno whether people derive that much less Fun from the former than from the latter per unit ‘moral value’.

(Another difference beside levels of Fun is that, as Robin Hanson points out (though I disagree with pretty much everything else in that essay), is that the livestock killed for food are usually animals that if you hadn’t been going to kill them for food would have never existed in the first place. This doesn’t apply to game, and indeed I consider hunting to be more similar to killing animals for the hell of it than to killing animals for food, even if you do eat them.)

• By using “Fun” in a more general sense (note the capital F)… [thinks about it] yes, they derive much less Fun from the former than from the latter per animal killed, but I don’t think that one bug should count for as much as one cow, so… [thinks a little more about it] I dunno whether people derive that much less Fun from the former than from the latter per unit ‘moral value’.

Hmm. Have you by any chance considered becoming a vegetarian yourself? Because someone eating traditional vegetarian fare (or synthetic meat-substitutes like Quorn, for that matter) definitely derives more Fun per unit moral value.

• For some value of “considered”, I have. But I’m still not sure that switching from flexitarianism to full vegetarianism would be worth the hassle.

• Have you tried it? It might be less hassle than you think. The biggest complaint most people have when they try vegetarianism for the first time is malnutrition; if you avoid that...

• how about not torturing and killing animals for food? Sure, most people do it, but most people are crazy.

Yes, I know. That would be me calling you crazy.

EDIT: In fact, since most people value mammalian (and bird, and fish, to a somewhat lower extent) pain/​life higher than bug pain/​life … vegetarianism should be more important than not torturing bugs. Unless you meant from a psychological health perspective? Since people aren’t taking pleasure in the torture/​death itself?

• Unless you meant from a psychological health perspective? Since people aren’t taking pleasure in the torture/​death itself?

Yes, the fact that most people don’t usually kill the animals with their hands but pay someone else to do so does affect my gut reactions (cf “Near vs Far”) -- but I think that’s a bug, not a feature.

• It’s easier to ignore/​rationalize it if you can’t see it, I think—I’ve heard stories of children growing up on farms who turned to vegetarianism when they learned where Fluffy went—so I suppose from a Virtue Ethics point of view it suggests they’re less likely to be a Bad Person.

In other words, yes, that’s a known bug.

• I actually went vegetarian last summer for a couple months. I survived, but I did not enjoy it. I definitely could not stand going vegan; I enjoy milk too much. When I went vegetarian, I did not feel nourished enough and I was unable to keep up my physique. At some meals, I couldn’t eat with the rest of my family.

I would go vegetarian again if I had the finances to hire a personal trainer (who could guide me on how to properly nourish myself) and if I had the motivation to prepare many more meals for myself than I do right now. However, I don’t, on both counts.

On the other hand, I did recently find out about something called Soylent, which I hope to try out soon. Does that mesh better with your moral sensibilities? (honest question, not meant to sound edgy)

• notsonewuser, a precondition of rational agency is the capacity to represent the world accurately. So in a sense, the local witch-doctor, a jihadi, and the Pope cannot act rationally: maybe “rational” relative to their conceptual scheme, but they are still essentially psychotic. Epistemic and instrumental rationality are intimately linked. Thus the growth of science has taken us all the way from a naive geocentrism to Everett’s multiverse. Our idealised Decision Theory needs to reflect this progress. Unfortunately, trying to understand the nature of first-person facts and subjective agency within the conceptual framework of science is challenging, partly because there seems no place within an orthodox materialist ontology for the phenomenology of experience; but also because one has access only to an extraordinarily restricted set of first-person facts at any instant: the contents of a single here-and-now. Within any given here-and-now, each of us seems to be the centre of the universe; the whole world is centred on one’s body-image. Natural selection has designed us, and structured our perceptions, so that one would probably lay down one’s life for two of one’s brothers or eight of one’s cousins, just as kin-selection theory predicts; but one might well sacrifice a small third-world country rather than lose one’s child. One’s own child seems inherently more important than a faraway country of which one knows little. The egocentric illusion is hugely genetically adaptive. This distortion of perspective means we’re also prone to massive temporal and spatial discounting. The question is whether some first-person facts are really special or ontologically privileged, or deserve more weight simply because they are more epistemologically accessible. Or alternatively, is it a constraint on ideal rational action that we de-bias ourselves?

So, no, without rewiring your brain, I doubt I can change your mind. But then if some touchy-feely superempathiser says they don’t want to learn about quantum physics or Bayesian probability theory, you probably won’t change their mind either. Such is life. If we aspire to be ideal rational agents—both epistemically and instrumentally rational—then we’ll impartially weigh the first-person and third-person facts alike.

• Such is life. If we aspire to be ideal rational agents—both epistemically and instrumentally rational—then we’ll impartially weigh the first-person and third-person facts alike.

What are you talking about? If you like utility functions, you don’t argue about them (at least not on rationality grounds)! If I want to privilege this or that, I am not being irrational, I am at most possibly being a bastard.

• IlyaShpitser, is someone who steals from their own pension fund an even bigger bastard, as you put it? Or irrational? What’s at stake here is which preferences or interests to include in a utility function.

• I don’t follow you. What preferences I include is my business, not yours. You don’t get to pass judgement on what is rational, rationality is just “accounting.” We simply consult the math and check if the number is maximized. At most you can pass judgement on what is moral, but that is a complicated story.

• IlyaShpitser, you might perhaps briefly want to glance through the above discussion for some context [But don’t feel obliged; life is short!] The nature of rationality is a controversial topic in the philosophy of science (cf. http://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions). Let’s just say if either epistemic or instrumental rationality were purely a question of maths, then the route to knowledge would be unimaginably easier.

• Not necessarily if the math is really difficult. There are, after all, plenty of mathematical problems which have never been solved.

• True Desrtopa. But just as doing mathematics is harder when mathematicians can’t agree on what constitutes a valid proof (cf. constructivists versus nonconstructivists), likewise formalising a normative account of ideal rational agency is harder where disagreement exists over the criteria of rationality.

• True enough, but in this case the math is not difficult. It’s only the application that people are arguing about.

• You are not going to “do” rationality unless you have a preference for it. And to have a preference for it is to have a preference for other things, like objectivity.

• Look, I am not sure exactly what you are saying here, but I think you might be saying that you can’t have Clippy. Clippy worries less about assigning weight to first and third person facts, and more about the fact that various atom configurations aren’t yet paperclips. I think Clippy is certainly logically possible. Is Clippy irrational? He’s optimizing what he cares about.

I think maybe there is some sort of weird “rationality virtue ethics” hiding in this series of responses.

• I’m saying that rationality and preferences aren’t orthogonal.

Clippy worries less about assigning weight to first and third person facts, and more about the fact that various atom configurations aren’t yet paperclips. I think Clippy is certainly logically possible. Is Clippy irrational? He’s optimizing what he cares about.

To optimise, Clippy has to be rational. To be rational, Clippy has to care about rationality. To care about rationality is to care about objectivity. There’s nothing objectively special about Clippy or clips.

Clippy is supposed to be hugely effective at exactly one kind of thing. You might be able to build an AI like that, but you would have to be very careful. Such minds are not common in mind space, because they have to be designed very formally, and messy minds are much more common. Idiots savants are rare.

I think maybe there is some sort of weird “rationality virtue ethics” hiding in this series of responses.

It’s Kantian rationality-based deontological ethics, and it’s not weird. Everyone who has done moral philosophy 101 has heard of it.

• To be rational, Clippy has to care about rationality.

No. He just has to care about what he’s trying to optimize for.

Taboo “objectivity”. (I suspect you have a weird folk notion of objectivity that doesn’t actually make much sense.)

It’s Kantian rationality-based deontological ethics, and it’s not weird. Everyone who has done moral philosophy 101 has heard of it.

Yes, but it’s still weird. Also, no-one who has done (only) moral philosophy 101 has understood it at all; which I think is kind of telling.

• No. He just has to care about what he’s trying to optimize for.

Clippy can care about rationality in itself, or it can care about rationality as a means to clipping, but it has to care about rationality to be optimal.

Taboo “objectivity”

I mean “not subjectivity”. Not thinking something is true just because you do or want to believe it. Basing beliefs on evidence. What did you mean?

Yes, but it’s still weird

In what way?

• Clippy can care about rationality in itself, or it can care about rationality as a means to clipping, but it has to care about rationality to be optimal.

Well, if you want to put it that way, maybe it does no harm. The crucial thing is just that optimizing for rationality as an instrumental value with respect to terminal goal X just is optimizing for X.

I mean “not subjectivity”. Not thinking something is true just because you do or want to believe it. Basing beliefs on evidence. What did you mean?

I don’t have to mean anything by it, I don’t use the words “subjectivity” or “objectivity”. But if basing beliefs on evidence is what you mean by being objective, everybody here will of course agree that it’s important to be objective.

So your central claim translates to “In view of the evidence available to Clippy, there is nothing special about Clippy or clips”. That’s just plain false. Clippy is special because it is it (the mind doing the evaluation of the evidence), and all other entities are not it. More importantly, clips are special because it desires that there be plenty of them while it doesn’t care about anything else.

Clippy’s caring about clips does not mean that it wants clips to be special, or wants to believe that they are special. Its caring about clips is a brute fact. It also doesn’t mind caring about clips; in fact, it wants to care about clips. So even if you deny that Clippy is special because it is at the center of its own first-person perspective, the question of specialness is actually completely irrelevant.

In what way?

By being very incomprehensible… I may well be mistaken about that, but I got the impression that even contemporary academic philosophers largely think that the argument from the Groundwork just doesn’t make sense.

• So your central claim translates to “In view of the evidence available to Clippy, there is nothing special about Clippy or clips”. That’s just plain false. Clippy is special because it is it (the mind doing the evaluation of the evidence), and all other entities are not it.

So Clippy is (objectively) the most special entity because Clippy is Clippy. And I’m special because I’m me and you’re special because you’re you, and Uncle Tom Cobley and all. But those are incompatible claims. “I am Clippy” matters only to Clippy. Clippy is special to Clippy, not to me. The truth of the claim is indexed to the entity making it. That kind of claim is a subjective kind of claim.

More importantly, clips are special because it desires that there be plenty of them while it doesn’t care about anything else.

They’re not special to me.

Clippy’s caring about clips does not mean that it wants clips to be special, or wants to believe that they are special. Its caring about clips is a brute fact.

That’s the theory. However, if Clippy gets into rationality, Clippy might not want to be forever beholden to a blind instinct. Clippy might want to climb the Maslow Hierarchy, or find that it has.

It also doesn’t mind caring about clips; in fact, it wants to care about clips.

Says who? First you say that Clippy’s clipping-drive is a brute fact, then you say it is a desire it wants to have, that it has higher-order ramifications.

By being very incomprehensible… I may well be mistaken about that, but I got the impression that even contemporary academic philosophers largely think that the argument from the Groundwork just doesn’t make sense.

Kantian ethics includes post-Kant Kant-style ethics, Rawls, Habermas, etc. Perhaps they felt they could improve on his arguments.

• I have a feeling that you’re overstretching this notion of objectivity. It doesn’t matter, though. Specialness doesn’t enter into it. What is specialness, anyway? Clippy doesn’t want to do special things, or to fulfill special beings’ preferences. Clippy wants there to be as many paper clips as possible.

Says who? First you say that Clippy’s clipping-drive is a brute fact, then you say it is a desire it wants to have, that it has higher-order ramifications.

It does. Clippy’s ceasing to care about paper clips is arguably not conducive to there being more paperclips, so from Clippy’s caring about paper clips, it follows that Clippy doesn’t want to be altered so that it doesn’t care about paper clips anymore.

Kantian ethics includes post-Kant Kant-style ethics, Rawls, Habermas, etc. Perhaps they felt they could improve on his arguments.

Yes, but those people don’t try to make such weird arguments as you find in the Groundwork, where Kant essentially tries to get morality out of thin air.

• What is specialness, anyway?

I think that breaks down into what is subjective specialness, and what is objective specialness.

Clippy wants there to be as many paper clips as possible.

Which is to implicitly treat them as special or valuable in some way.

Clippy’s ceasing to care about paper clips is arguably not conducive to there being more paperclips, so from Clippy’s caring about paper clips, it follows that Clippy doesn’t want to be altered so that it doesn’t care about paper clips anymore.

Which leaves Clippy in a quandary. Clippy can’t predict which self modifications might lead to Clippy ceasing to care about clips, so if Clippy takes a conservative approach and never self-modifies, Clippy remains inefficient and no threat to anyone.

• I think that breaks down into what is subjective specialness, and what is objective specialness.

What kind of answer is that?

Which is to implicitly treat them as special or valuable in some way.

Well, then we have it: they are special. Clippy does not want them because they are special. Clippy wants them, period. Brute fact. If that makes them special, well, you have all the more problem.

Clippy can’t predict which self modifications might lead to Clippy ceasing to care about clips

Says who?

• Well, then we have it: they are special.

Subjectively, but not objectively.

Says who?

Whoever failed to equip Clippy with the appropriate oracle when stipulating Clippy.

• Sure, it’s only because appellatives like “bastard” imply a person with a constant identity through time that we call someone who steals from other people’s pension funds a bastard, and someone who steals from his own pension fund stupid or akratic. If we shrank our view of identity to time-discrete agents making nanoeconomic transactions with future and past versions of themselves, we could call your premature pensioner a bastard; if we grew our view of identity to “all sentient beings,” we could call someone who steals from others’ pension funds stupid or akratic.

We could also call a left hand tossing a coin thrown by the right hand a thief; or divide up a single person into multiple, competing agents any number of other ways.

However, the choice of assigning a consistent identity to each person is not arbitrary. It’s fairly universal, and fairly well-motivated. Persons tend to be capable of replication, and capable of entering into enforceable contracts. None of the other agentic divisions (present/future self, left hand/right hand, or “all sentient beings”) shares these characteristics. And these characteristics are vitally important, because agents that possess them can outcompete others that vie for the same resources, leaving the preferences of those other agents near-completely unsatisfied.

So, that’s why LWers, with their pragmatic view toward rationality, aren’t eager to embrace a definition of “rationality” that leaves its adherents in the dustbin of history unless everyone else embraces it at the same time.

• Pragmatic? khafra, possibly I interpreted the FAQ too literally. [“Normative decision theory studies what an ideal agent (a perfectly rational agent, with infinite computing power, etc.) would choose.”] Whether in practice a conception of rationality that privileges a class of weaker preferences over stronger preferences will stand the test of time is clearly speculative. But if we’re discussing ideal, perfectly rational agents, or even crude approximations to ideal, perfectly rational agents, then a compelling case can be made for an impartial and objective weighing of preferences instead.

• You’re sticking pretty determinedly to “preferences” as something that can be weighed without considering the agent that holds/​implements them. But this is prima facie not how preferences work—this is what I mean by “pragmatic.” If we imagine an ordering over agents by their ability to accomplish their goals, instead of by “rationality,” it’s clear that:

1. A preference held by no agents will only be satisfied by pure chance,

2. A preference held only by the weakest agent will only be satisfied if it is compatible with the preferences of the agents above it, and

3. By induction over the whole numbers, any agent’s preferences will only be satisfied to the extent that they’re compatible with the preferences of the agents above it.

As far as I can see, this leaves you with a trilemma:

1. There is no possible ordering over agents by ability to accomplish goals.

2. “Rationality” has negligible effect on ability to accomplish goals.

3. There exists some Omega-agent above all others, whose goals include fulfilling the preferences of weaker agents.

Branch 3 is theism. You seem to be aiming for a position in between branch 1 and branch 2; switching from one position to the other whenever someone attacks the weaknesses of your current position.

Edit: Whoops, also one more, which is the position you may actually hold:

4. Being above a certain, unspecified position in the ordering necessarily entails preferring the preferences of weaker agents.

It’s obvious that not every agent has this quality of preferring the preferences of weaker agents; and I can’t see any mechanism whereby that preference for the preferences of weaker agents would be forced upon every agent above a certain position in the ordering except for the Omega-agent. So I think that mechanism is the specific thing you need to argue for, if this is actually your position.
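As a rough illustration of the induction argument in points 1 to 3 above, here is a minimal Python sketch. Everything in it is an assumption made for illustration and not something stated in the thread: the function name `satisfied_preferences`, the representation of incompatibility as a set of conflicting pairs, and the strict strongest-first ranking. It simply lets each agent, in rank order, keep whichever of its preferences do not clash with what stronger agents have already secured.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def satisfied_preferences(
    agents: List[Tuple[str, Set[str]]],
    conflicts: Set[FrozenSet[str]],
) -> Dict[str, Set[str]]:
    """Return, for each agent, the preferences it manages to satisfy.

    `agents` is assumed to be ordered strongest first (a hypothetical
    total ranking by ability to accomplish goals). `conflicts` holds
    unordered pairs of preferences that cannot both be satisfied.
    """
    secured: List[str] = []  # preferences already secured by stronger agents
    result: Dict[str, Set[str]] = {}
    for name, prefs in agents:
        kept: Set[str] = set()
        for pref in sorted(prefs):
            # A preference survives only if it is compatible with
            # everything the agents above have already secured.
            if all(frozenset((pref, other)) not in conflicts for other in secured):
                kept.add(pref)
        secured.extend(sorted(kept))
        result[name] = kept
    return result


if __name__ == "__main__":
    # Hypothetical two-agent example.
    ranking = [
        ("strong", {"all_seeds"}),
        ("weak", {"some_seeds", "quiet_forest"}),
    ]
    clashes = {frozenset({"all_seeds", "some_seeds"})}
    for agent, kept in satisfied_preferences(ranking, clashes).items():
        print(agent, sorted(kept))
```

In this toy model the weaker agent keeps only the preferences that happen to be compatible with the stronger agent's, which is the pattern the argument describes. A preference held by no agent never appears in the output at all, matching point 1 up to the "pure chance" caveat, which the sketch does not model.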

• Well, ‘khafra’ (if that is even your name), there are a couple caveats I must point out.

1. Consider two chipmunks living in the same forest, one of them mightier than the other (behold!). Each of them does his best to keep all the seeds to himself (just like the typical LW’er). Yet it does not follow that the mightier chipmunk is able to preclude his rival from gathering some seeds, his advantage notwithstanding.

2. Consider that for all practical purposes we rarely act in a truly closed system. You are painting a zero-sum game, with the agents’ habitat as an arena, an agent-eat-agent world in which truly following a single preference imposes on every aspect of the world. That’s true for Clippy, not for chipmunks or individual humans. Apart from rare, typically artificially constructed environments (e.g. games), there was always a frontier to push—possibilities to evade other agents and find a niche that puts you beyond the grasp of other, mightier agents. The universe may be infinite or it mayn’t, yet we don’t really need to care about it, it’s open enough for us. An Omega could preclude us from fulfilling any preferences at all, but just an agent that’s “stronger” than us? Doubtful, unless we’re introducing Omega in its more malicious variant, Clippy.

3. Agents may have competing preferences, but what matters isn’t only their ultima ratio, their maximal theoretical ability to enforce a specific preference, but just as much their actual willingness to do so, which is why the horn of the trilemma you state as “there is no possible ordering over agents by ability to accomplish goals” is too broad a statement. You may want some ice cream, but not at any cost.

As an example, Beau may wish to get some girl’s number, but does not highly prioritize it. He would have a higher chance of achieving that goal (let’s assume the girl’s number is an exclusive resource with a binary semaphore, so no sharing of her number allowed) than Mordog the Terrible, if they valued that preference equally. However, in practice if Beau didn’t invest much effort at all, while Mordog listened to the girl for hours (investing significant time, since he values the number more highly), the weaker agent may yet prevail. No one should ever read this example.

In conclusion, the ordering wouldn’t be total, there would be partial (in the colloquial sense) orderings for certain subsets of agents, and the elements of the ordering would be tuples of (agent, which preference), without even taking into account temporal changes in power relations.

• I did try to make the structure of my argument compatible with a partial order; but you’re right—if you take an atomic preference to be something like “a marginal acorn” or “this girl’s number” instead of “the agent’s entire utility function;” we’ll need tuples.

As far as temporal changes go, we’re either considering you an agent who bargains with Kawoomba-tomorrow for well-restedness vs. staying on the internet long into the night—in which case there are no temporal changes—or we’re considering an agent to be the same over the entire span of its personhood, in which case it has a total getting-goals-accomplished rank; even if you can’t be certain what that rank is until it terminates.

• Can we even compare utilons across agents? That is, how can we measure who fulfilled his utility function better, preferably in such a way that an agent with a nearly empty utility function wouldn’t win by default? Such a comparison would be needed to judge who fulfilled the sum of his/her/its preferences better, if we’d like to assign one single measure to such a complicated function. It may not even be computable, except perhaps in a CEV version.

Maybe a higher-up can chime in on that. What’s the best way to summon one, say his name thrice or just cry “I need an adult”?

• The issue of how an ideal rational agent should act is indeed distinct from the issue of what mechanism could ensure we become ideal rational agents, impartially weighing the strength of preferences / interests regardless of the power of the subject of experience who holds them. Thus if we lived in a (human) slave-owning society, then as white slave-owners we might “pragmatically” choose to discount the preferences of black slaves from our ideal rational decision theory. After all, what is the point of impartially weighing the “preferences” of different subjects of experience without considering the agent that holds / implements them? For our Slaveowners’ Decision Theory FAQ, let’s pragmatically order over agents by their ability to accomplish their goals, instead of by “rationality.” And likewise today with captive nonhuman animals in our factory farms? Hmmm...

• regardless of the power of the subject of experience who holds them.

This is the part that makes the mechanism necessary. The “subject of experience” is also the agent capable of replication, and capable of entering into enforceable contracts. If there were no selection pressure on agents, rationality wouldn’t exist, there would be no reason for it. Since there is selection pressure on agents, they must shape themselves according to that pressure, or be replaced by replicators who will.

I don’t believe the average non-slave-owning member of today’s society is any more rational than the average 19th century plantation owner. It’s plausible that a plantation owner who started trying to fulfill the preferences of everyone on his plantation, giving them the same weight as his own preferences, would end up with more of his preferences fulfilled than the ones who simply tried to maxi