# Reflexive decision theory is an unsolved problem

By “reflexive decision theory”, hereafter RDT, I mean any decision theory that can incorporate information about one’s own future decisions into the process of making those decisions. RDT is not itself a decision theory, but a class of decision theories, or a property of decision theories. Some say it is an empty class. FDT (to the extent that it has been worked out — this is not something I have kept up with) is an RDT.

The use of information of this sort is what distinguishes Newcomb’s problem, Parfit’s Hitchhiker, the Smoking Lesion, and many other problems from “ordinary” decision problems which treat the decision-maker as existing outside the world that he is making decisions about.

There is currently no generally accepted RDT. Unfortunately, that does not stop people insisting that every decision theory but their favoured one is wrong or crazy. There is even a significant literature (which I have never seen cited on LW, but I will do so here) saying that reflexive decision theory itself is an impossibility.

## Reflexive issues in logic

We have known that reflexiveness is a problem for logic ever since Russell said to Frege, “What about the set of sets that aren’t members of themselves?” (There is also the liar paradox going back to the ancient Greeks, but it remained a mere curiosity until people started formalising logic in the 19th century. *Calculemus, nam veritas in calculo est.*)

In set theory, this is a solved problem, solved by the limited comprehension axiom. Since Gödel, we also have ways of making theories talk about themselves, and there are all manner of theorems about the limits of how well they can introspect: Gödel’s incompleteness theorem, Löb’s theorem, etc.

Compared with that, reflexive decision theory has hardly even started.

## Those who know not, and know not that they know not

Many think they have solutions, but they disagree with each other, and keep on disagreeing. So we have the situation where CDT-ers say “but the boxes already contain what they contain!”, and everyone with an RDT replies “then you’ll predictably lose!”, and both point with scorn at EDT and say “you think you can change reality by managing the news!” The words “flagrantly, confidently, egregiously wrong” get bandied about, at least by one person. Everyone thinks everyone else is crazy. There is also a curious process by which an XDT’er, for any value of X, responds to counterexamples to X by modifying XDT and claiming it’s still XDT, to the point of people ending up saying that CDT and EDT are the same. Now *that’s* crazy.

## Those who know not, and know that they know not

Some people know that they do not have a solution. Andy Egan, in “Some Counterexamples to Causal Decision Theory” (2007, Philosophical Review), shoots down both CDT and EDT, but only calls for a better theory, without any suggestions for finding it.

## Those who reject the very idea of an RDT

Some deny the possibility of any such theory, such as Marion Ledwig (“The No Probabilities for Acts-Principle”), who formulates the principle thus: “Any adequate quantitative decision model must not explicitly or implicitly contain any subjective probabilities for acts.” This rejects the very idea of reflexive decision theory. It also implies that one-boxing is wrong for Newcomb’s problem, and Ledwig explicitly says that it is.
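For concreteness, the stakes in the dispute can be sketched numerically. Below is a minimal sketch of the evidential expected-value calculation that makes one-boxing look attractive (and whose relevance the CDT-er denies); the $1,000,000/$1,000 payoffs are the standard ones, and the predictor accuracies are illustrative assumptions:

```python
# Newcomb's problem under the standard stakes: the opaque box holds
# $1,000,000 iff the predictor foresaw one-boxing; the transparent
# box always holds $1,000.
def expected_value(one_box: bool, accuracy: float) -> float:
    """Evidential expected value, given the predictor's accuracy."""
    if one_box:
        # Predictor correct with probability `accuracy`: box was filled.
        return accuracy * 1_000_000
    # Two-boxing: the big box was filled only if the predictor erred.
    return (1 - accuracy) * 1_000_000 + 1_000

for acc in (0.9, 0.99):
    print(acc, expected_value(True, acc), expected_value(False, acc))
```

Even a merely 90%-accurate predictor makes one-boxing dominate on this calculation; the disagreement is over whether the calculation is the right one, not over the arithmetic.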

For the original statement of the principle, Ledwig cites Spohn (1999, “Strategic Rationality”, and 1983, “Eine Theorie der Kausalität”). My German is not good enough to analyse what he says in the latter reference, but in the former he says that the rational strategy in one-shot Prisoners’ Dilemma is to defect, and in one-shot Newcomb is to two-box. Ledwig and Spohn trace the idea back to Savage’s 1954 “The Foundations of Statistics”. Savage’s whole framework, however, in common with the other “classical” theories such as Jeffrey-Bolker and VNM, does not have room in it for any sort of reflexiveness, ruling it out implicitly rather than considering the idea and explicitly rejecting it. There is more in Spohn’s 1977 “Where Luce and Krantz Do Really Generalize Savage’s Decision Model”, where Spohn says:

“[P]robabilities for acts play no role in decision making. For, what only matters in a decision situation is how much the decision maker likes the various acts available to him, and relevant to this, in turn, is what he believes to result from the various acts and how much he likes these results. At no place does there enter any subjective probability for an act.”

There is also Itzhak Gilboa “Can free choice be known?”. He says, “[W]e are generally happier with a model in which one cannot be said to have beliefs (let alone knowledge) of one’s own choice while making this choice”, and looks for a way to resolve reflexive paradoxes by ruling out reflexiveness.

These people all defect in PD and two-box in Newcomb. The project of RDT is to do better.

(ETA: Thanks to Sylvester Kollin (see comments) for drawing my attention to a later paper by Spohn in which he has converted to one-boxing within a causal decision theory.)

## Reflexiveness in the real world

People often make decisions that take into account the states of mind of the people they are interacting with, including other people’s assessments of one’s own state of mind. This is an essential part of many games (such as Diplomacy) and of many real-world interactions (such as diplomacy). A theory satisfying “no foreknowledge of oneself” (my formulation of “no probabilities for acts”) cannot handle these. (Of course one can have foreknowledge of oneself; the principle only excludes this information from input into one’s decisions.)

The principle “know thyself” is as old as the Liar paradox.

Just as there have been contests of bots playing iterated prisoners’ dilemma, so there have been contests where these bots are granted access to each others’ source code. Surely we need a decision theory that can deal with the reasoning processes used by such bots.

The elephant looming behind the efforts of Eliezer and his colleagues to formulate FDT is AI. It might be interesting at some point to have a tournament of bots whose aim is to get other bots to “let them out of the box”.

These practical problems need solutions. The no foreknowledge principle rejects any attempt to think about them; thinking about them therefore requires rejecting the principle. That is the aim of RDT. I do not have an RDT, but I do think that duelling intuition pumps is a technique whose usefulness for this problem has been exhausted. It is no longer enough to construct counterexamples to everyone else, for they can as easily do the same to your own theory. Some general principle is needed that will be as decisive for this problem as limited comprehension is for building a consistent set theory.

## A parable

Far away and long ago, each morning the monks at a certain monastery would go out on their alms rounds. To fund the upkeep of the monastery buildings, once every month each monk would buy a single gold coin out of the small coins that he had received, and deposit it in a chest through a slot in the lid.

This system worked well for many years.

One month, a monk thought to himself, “What if I drop in a copper coin instead? No-one will know I did it.”

That month, when the chest was opened, it contained nothing but copper coins.

This should say 2007. Spohn argues for one-boxing in “Reversing 30 years of discussion: why causal decision theorists should one-box”.

Fixed.

Thanks for the reference. I hope this doesn’t turn out to be a case of changing CDT and calling the result CDT.

ETA: 8 pages in, I’m impressed by the clarity, and it looks like it leads to something reasonably classified as a CDT.

Seconded. This is an extremely impressive paper. It seems like Spohn had most of the insights that motivated and led to the development of logical/functional decision theories, years before Less Wrong existed. I’m astounded that I’ve never heard of him before now.

Having searched for “Spohn” on LW, I see that Spohn was already mentioned a few times here. In particular:

11 years ago, in lukeprog’s post Eliezer’s Sequences and Mainstream Academia (also see bits of this Wei Dai comment here, and of this long critical comment thread here):

And the comments here on the MIRI paper “Cheating Death in Damascus” briefly mention Spohn.

Finally, a post from a few months ago also mentions Spohn:

It probably doesn’t help that he’s a German philosopher; language barriers are a thing.

On the other hand, his research interests seem to have lots of overlap with LW content:

In looking through Spohn’s oeuvre, here are a couple of papers that I think will be of interest on LessWrong, although I have not read past the very beginnings.

“Dependency Equilibria” (2007). “Its philosophical motive is to rationalize cooperation in the one shot prisoners’ dilemma.”

“The Epistemology and Auto-Epistemology of Temporal Self-Location and Forgetfulness” (2017). It relates to the Sleeping Beauty problem.

More stuff:

He co-initiated the framework program “New Frameworks of Rationality” (German Wikipedia, German-only website) which seems to have been active in 2012~2018. Their lists of publications and conferences are in English, as are most of the books in this list.

(Incidentally, there’s apparently a book called Von Rang und Namen: Philosophical Essays in Honour of Wolfgang Spohn.)

And since 2020, he leads this project on Reflexive Decision & Game Theory (longer outline). The site doesn’t list any results of this project, but presumably some of Spohn’s papers since 2020 are related to this topic.

Saying that Set Theory “solved the problem” by introducing restricted Comprehension is maybe a stretch.

Restricted Comprehension prevents the question from even being asked. So, it “solves” the problem by removing the object from the domain of discourse.

The Incompleteness Theorems are Meta-Theorems talking about Proper Theorems.

I’m not sure Set Theory has really solved the self-reference problem in any real sense besides avoiding it (which may be the best solution possible).

The closest might be the Recursion Theorems, which allow functions to “build themselves” by referencing earlier versions of themselves. But that isn’t proper self-reference.

My issue with the practicality of any kind of real-world computation of self-reference is that I believe it would require infinite time/energy/space: each time you “update” your current self, you change, and so would need to update again, etc. You could approximate an RDT decision tree, but not apply it. The exception being “fixed points”: decisions which stay constant when we apply the RDT decision algorithm.
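The fixed-point idea in the last paragraph can be sketched as naive iteration: feed a provisional decision back into the decision rule until it stops changing, or give up after a bounded number of steps. The `update` rules here are made-up stand-ins, not any actual RDT:

```python
# Repeatedly feed a provisional decision back into a self-referential
# decision rule; stop if it stabilises (a fixed point), else give up.
def find_fixed_point(update, initial, max_steps=100):
    decision = initial
    for _ in range(max_steps):
        revised = update(decision)
        if revised == decision:      # self-consistent: a fixed point
            return decision
        decision = revised
    return None                      # no fixed point found in time

# A rule whose only self-consistent choice is "one-box":
print(find_fixed_point(lambda d: "one-box", "two-box"))  # one-box

# A rule that flips forever has no fixed point:
flip = lambda d: "two-box" if d == "one-box" else "one-box"
print(find_fixed_point(flip, "one-box"))  # None
```

The second rule illustrates the commenter’s worry: a self-referential update need not converge at all, so only the fixed-point cases are straightforwardly computable.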

I was painting with a broad brush and omitting talk of alternatives to Limited Comprehension, because my impression is that ZF (with or without C) is nowadays the standard background for doing set theory. Is there any other in use, not known to be straightforwardly equivalent to ZF? (ETA: I also elided all the history of how the subject developed leading up to ZF. For example, in *Principia Mathematica* Russell & Whitehead used some sort of hierarchy of types to avoid the inconsistency. That approach dropped by the wayside, although type systems are back in use in the various current projects to formalise all of mathematics.)

Any answer to Russell’s question to Frege must exclude something. There may be other ways of avoiding the inconsistencies, such as the thesis that Adele Lopez linked, but one has to do something about them.

As far as I know for now, all of standard Mathematics is done within ZF + Some Degree of Choice. So it makes sense to restrict discussion to ZF (with C or without).

My comment was a minor nitpick, on the phrasing “in set theory, this is a solved problem”. For me, solved implies that an apparent paradox has been shown under additional scrutiny to not be a paradox. For example, the study of convergent series (in particular the geometric series) solves Zeno’s Paradox of Motion.

In Set Theory, Restricted Comprehension just restricts us from asking the question, “Consider this Set, with Y property” It’s quite a bit different than solving a paradox in my book. Although, it does remove the paradoxical object from our discourse. It’s really more that Axiomatic Set Theory avoids the paradox, rather than solve it.

I want to emphasize that this is a minor nitpick. It actually (I believe) serves to strengthen your overall point that RDT is an unsolved problem. I’m just adding that, as far as I can tell, it’s safe to say this component of RDT (self-reference) isn’t really adequately addressed in Standard Logic. If we allow self-reference, we don’t always produce paradoxes; x = x is hardly in any way self-evidently paradoxical. But sometimes we do, such as in Russell’s famous case.

The fact that we don’t have a good rule system (in standard logic, to my knowledge) for predicting when self-reference produces a paradox indicates it’s still something of an open problem. This may be radical, but I’m basically claiming that Restricted Comprehension isn’t a particularly amazing solution for the self-reference problem; it’s something of a throwing-the-baby-out-with-the-bathwater kind of solution. Although, to its merit, that ZF hasn’t produced any contradictions in all these years of study is an incredible feat.

Your point about having to sacrifice something to solve Russell’s question is well taken. I think it may be correct: the removal of something may be the best kind of solution possible. In that sense, Restricted Comprehension may have “solved” the problem, as it may be the only kind of solution we can hope for.

Adele Lopez’s answer was excellent, and I haven’t had a chance to digest the referenced thesis, but it does seem to follow your proposed principle: to answer Russell’s question we need to omit things.

There’s some interesting research using “exotic” logical systems where unrestricted comprehension can be done consistently (this thesis includes a survey as well as some interesting remarks about how this relates to computability). This can only happen at the expense of things typically taken for granted in logic, of course. Still, it might be a better solution for reasoning about self-reference than the classical set theory system.

My anecdote of a recent experience of decision-making makes an interesting contrast with all this verbal discussion of decision theory.

Here is another:

If we go by analogy with Gödel, Turing, and basically anything else involving self-reflection, one has to be pessimistic and place a higher probability on a proper fully consistent RDT simply being impossible, which could actually have some rather dramatic consequences downstream for e.g. AI alignment.

Doing the research necessary to have something more than a vaguely arrived-at probability — that is, a wild guess — would have more dramatic consequences.

Anyway, the precedent of formalising arithmetic and set theory is grounds for optimism: a lot of self-reflection is consistent in those theories.

Well, obviously so. But that sounds more like a PhD program than a LW comment. My point was, there seems to be a trend, and the trend is “self reflection allows for self contradiction and inconsistency”. I imagine the general thrust of a more formal argument could be imagining that there is a “correct” decision theory, imagining a Turing machine implementing such a theory, proving that this machine is in itself Turing-complete (as in, it is always possible to input a decision problem that maps to any arbitrary algorithm) and then it would follow that it being reflexive would require it to be able to solve the halting problem.

I hadn’t put this name to things and I’m happy to have some way of mentally tagging it. It has seemed to me that many sources of ‘irrationality’ in people’s decision theory relate to resistance against negative reflexive effects. E.g. why resist information streams from certain sources? Because being seen to update on certain forms of information creates an incentive to manipulate or goodhart that data stream.

It’s worth repeating Spohn’s arguments from “Where Luce and Krantz Do Really Generalize Savage’s Decision Model”.

And yet it seems that Spohn no longer believes this.

His solution seems to rely on the ability to precommit to a future action, such that the future action can be treated like an ordinary outcome:

If people can just “make decisions early”, then one-boxing is, of course, the rational thing to do from the point of view of CDT. It effectively means you are no longer deciding anything when you are standing in front of the two boxes; you are just slavishly one-boxing as if under hypnotic suggestion, or as if being somehow forced to one-box by your earlier self. Then the “decision” or “act” here can be assigned a probability, because it is assumed there is nothing left to decide: it’s effectively just a consequence of the real decision that was made much earlier, consistent with the view that an action in a decision situation may not be assigned a probability.

The real problem with the precommitment route is that it assumes the possibility of “precommitment”. Yet in reality, if you “commit” early to some action, and you are later faced with the situation where the action has to be executed, you are still left with the question of whether or not you should “follow through” with your commitment. Which just means your precommitment wasn’t real. You can’t make decisions in advance, and you can’t simply force your later self to do things. The actual decision always has to be made in the present, and the supposed “precommitment” of your past self is nothing more than a suggestion.

(The impossibility of precommitment was illustrated in Kavka’s toxin puzzle.)

The toxin puzzle is also referenced extensively in that aforementioned Spohn paper on one-boxing, and his paper is a response to the toxin puzzle as much as it is to two-boxing.

Spohn shows that you can draw causal graphs such that CDT can get rewards in both cases, though only under the assumption that true precommitment is possible. But Spohn doesn’t give arguments for the possibility of precommitment, as far as I can tell.

Isn’t the possibility and, moreover, computability of precommitment just trivially true?

If you have a program `DT(data)`, determining a decision according to a particular decision theory in the circumstances specified by `data`, then you can easily construct a program `PDT(data)`, determining the decision for the same decision theory but with precommitment. The only thing that is required is an if-statement and a memory object, which can be implemented via a dictionary.

Yes, but I was talking about humans. An AI might have a precommitment ability.
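The commenter’s construction can be sketched directly; the stand-in `DT` rule here is an arbitrary assumption, chosen only for illustration:

```python
# Wrap a decision procedure DT in a "precommitting" version PDT that
# records its first answer for each situation and thereafter repeats it,
# regardless of what DT would say if consulted again.
memory = {}  # the dictionary-backed memory object

def DT(data):
    # Stand-in decision theory; any deterministic rule would do here.
    return "one-box" if data == "newcomb" else "defect"

def PDT(data):
    if data not in memory:       # first encounter: decide and commit
        memory[data] = DT(data)
    return memory[data]          # later encounters: honour the commitment

print(PDT("newcomb"))  # one-box
print(PDT("newcomb"))  # one-box, now read from memory, not recomputed
```

The point of the wrapper is that once `PDT` has answered for a situation, later calls repeat the recorded answer rather than re-running `DT`, which is what makes the commitment binding for a program in a way the subsequent comments dispute for humans.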

This also seems trivially true to me. I’ve successfully precommitted multiple times in my life and I bet you have as well.

What you are probably talking about is the fact that occasionally humans fail at precommitments. But isn’t that an isolated demand for rigor? Humans occasionally fail at following any decision theory, or fail at being rational in general. It doesn’t make all the decision theories, and rationality itself, incoherent concepts which we thus can’t talk about.

Actually, when I think about it, isn’t deciding what decision theory to follow, itself a precommitment?

I often do things because I earlier decided to, overruling whatever feelings I may have in the moment. So from a psychological point of view, precommitment is possible. Why did I pause at Alderford? To let my fatigue clear sufficiently to let the determination to do 100 miles overcome it.

Kavka’s toxin puzzle only works if the intention-detecting machine works, and the argument against rationally drinking the toxin when the time comes could equally well be read as an argument against the possibility of such an intention-detecting machine. Its existence, after all, presupposes that the future decision can be determined at midnight, while the argument against drinking presupposes that it cannot be. An inconsistent thought experiment proves nothing. This example is playing much the same role in decision theory as Russell’s question to Frege did for set theory. It’s pointing to an inconsistency in intuitions around the subject.

Excluding reflectiveness is too strong a restriction, akin to excluding all forms of comprehension axiom from set theory. A precisely formulated limitation is needed that will rule out the intention-detecting machine while allowing the sorts of self-knowledge that people observably use.

But clearly you still made your final decision between 10 and 40 miles only when you were at Alderford, not hours before that. Our past selves can’t simply force us to do certain things; the memory of a past “commitment” is only one factor that may influence our present decision making, but it doesn’t replace a decision. Otherwise, whenever we “decide” to definitely do an unpleasant task tomorrow rather than today (“I’ll do the dishes tomorrow, I swear!”), we would then in fact always follow through with it tomorrow, which isn’t at all the case. (The Kavka/Newcomb cases are even worse than this, because there it isn’t just irrational akrasia preventing us from executing past “commitments”, but instrumental rationality itself, at least if we believe that CDT captures instrumental rationality.)

A more general remark, somewhat related to reflexivity (reflectivity?): In the “Where Luce and Krantz” paper, Spohn also criticizes Jeffrey for allowing the assignment of probabilities to acts, because for Jeffrey, everything (acts, outcomes, states) is a proposition, and any boolean combination of propositions is a proposition. In his framework, any proposition can be assigned a probability and a utility. But I’m pretty sure Jeffrey’s theory doesn’t strictly require that act probabilities are defined. Moreover, even if they are defined, it doesn’t require them for decisions. That is, for an outcome O and an action A, to calculate the utility U(A) he only requires probabilities of the form P(O|A), which we can treat as a basic probability instead of, frequentist style, a mere abbreviation for the ratio formula P(O∧A)/P(A). So P(A) and P(O∧A) can be undefined. In his theory U(A) = P(O|A)U(O∧A) + P(¬O|A)U(¬O∧A) is a theorem. I’m planning a post explaining Jeffrey’s theory because I think it is way underappreciated. It’s a general theory of utility, rather than just a decision theory which is restricted to “acts” and “outcomes”. To be fair, I don’t know whether that would really help much with elucidating reflectivity. The lesson would probably be something like “according to Jeffrey’s theory you can have prior probabilities for present acts but you should ignore them when making decisions”. The interesting part is that his theory can’t be simply dismissed, because others aren’t as general and thus are not a full replacement.

Maybe the first question is then what form of “self-knowledge” people do, in fact, observably use. I think we treat memories of past “commitments”/intentions more like non-binding recommendations from a close friend (our “past self”), which we may very well just ignore. Maybe there is an ideal rule of rationality that we *should* always adhere to our past commitments, at least if we learn no new information. But I’d say “should” implies “can”, so by contraposition, “not can” implies “not should”. Which would mean that if precommitment is not possible for an agent, it’s not required by rationality.

That is the question at issue.
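For what it’s worth, the Jeffrey identity quoted in the comment above can be checked numerically with made-up values (all numbers here are illustrative assumptions, not anything from Jeffrey):

```python
# Jeffrey's desirability identity: U(A) = P(O|A)*U(O&A) + P(not-O|A)*U(not-O&A).
p_o_given_a = 0.7     # P(O|A), treated as a basic probability per the comment
u_o_and_a = 10.0      # U(O and A)
u_noto_and_a = -2.0   # U(not-O and A)

# Note: only the conditional probability P(O|A) enters the calculation;
# P(A) itself is never needed, which is the comment's point.
u_a = p_o_given_a * u_o_and_a + (1 - p_o_given_a) * u_noto_and_a
print(u_a)  # approximately 6.4
```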