A Reaction to Wolfgang Schwarz’s “On Functional Decision Theory”

So I finished reading On Functional Decision Theory by Wolfgang Schwarz. In this critique of FDT, Schwarz makes several claims that I find to be either unfair criticism of FDT or just plain wrong, and I think it’s interesting to discuss them. Let’s go through them one by one. (Note that this post will not make much sense if you aren’t familiar with FDT, which is why I linked the paper by Yudkowsky and Soares.)

Schwarz first defines three problems:

Blackmail. Donald has committed an indiscretion. Stormy has found out and considers blackmailing Donald. If Donald refuses and blows Stormy’s gaff, she is revealed as a blackmailer and his indiscretion becomes public; both suffer. It is better for Donald to pay hush money to Stormy. Knowing this, it is in Stormy’s interest to blackmail Donald. If Donald were irrational, he would blow Stormy’s gaff even though that would hurt him more than paying the hush money; knowing this, Stormy would not blackmail Donald. So Donald would be better off if he were (known to be) irrational.

Prisoner’s Dilemma with a Twin. Twinky and her clone have been arrested. If they both confess, each gets a 5-year prison sentence. If both remain silent, they can’t be convicted and only get a 1-year sentence for obstructing justice. If one confesses and the other remains silent, the one who confesses is set free and the other gets a 10-year sentence. Neither cares about what happens to the other. Here, confessing is the dominant act and the unique Nash equilibrium. So if Twinky and her clone are rational, they’ll each spend 5 years in prison. If they were irrational and remained silent, they would get away with 1 year.

Newcomb’s Problem with Transparent Boxes. A demon invites people to an experiment. Participants are placed in front of two transparent boxes. The box on the left contains a thousand dollars. The box on the right contains either a million or nothing. The participants can choose between taking both boxes (two-boxing) and taking just the box on the right (one-boxing). If the demon has predicted that a participant one-boxes, she put a million dollars into the box on the right. If she has predicted that a participant two-boxes, she put nothing into the box. The demon is very good at predicting, and the participants know this. Each participant is only interested in getting as much money as possible. Here, the rational choice is to take both boxes, because you are then guaranteed to get $1000 more than if you one-box. But almost all of those who irrationally take just one box end up with a million dollars, while most of those who rationally take both boxes leave with $1000.

Blackmail is a bit vaguely defined here, but the question is whether or not Donald should pay if he actually gets blackmailed—given that he prefers paying to blowing Stormy’s gaff and of course prefers not being blackmailed above all. Aside from this, I disagree with the definitions of rational and irrational Schwarz uses here, but that’s partly the point of this whole discussion.

Schwarz goes on to say that Causal Decision Theory (CDT) will pay in Blackmail, confess in Prisoner’s Dilemma with a Twin, and two-box in Newcomb’s Problem with Transparent Boxes. FDT will refuse to pay, remain silent, and one-box, respectively. So far we agree.

However, Schwarz also claims “there’s an obvious sense in which CDT agents fare better than FDT agents in the cases we’ve considered”. On Blackmail, he says: “You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay!” (Apparently the hush money is $1.) It may seem this way, because given that Donald is already blackmailed, paying is better than not paying, and FDT recommends not paying while CDT pays. But this is beside the point, since FDT agents never end up in this scenario anyway: the problem statement specifies that Stormy knows an FDT agent wouldn’t pay, so she wouldn’t blackmail such an agent in the first place. Schwarz acknowledges this point later on, but doesn’t seem to realize it undermines his earlier claim that CDT does better “in an obvious sense”.
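To make that concrete, here is a minimal sketch of the policy-level comparison. The hush money of $1 comes from Schwarz’s quote; the cost of exposure and the prediction accuracy are my own illustrative assumptions (the problem statement effectively assumes Stormy predicts perfectly).

```python
# Minimal sketch (illustrative numbers, not from Schwarz): Stormy blackmails
# only those she predicts will pay. We compare the expected loss of a
# disposition, not just the loss conditional on already being blackmailed.

HUSH_MONEY = 1           # from Schwarz's quote
EXPOSURE_COST = 1_000    # my illustrative cost of the indiscretion going public

def expected_loss(pays_when_blackmailed: bool, prediction_accuracy: float = 1.0) -> float:
    """Expected loss (lower is better), given Stormy blackmails iff she predicts payment."""
    p_blackmailed = prediction_accuracy if pays_when_blackmailed else 1 - prediction_accuracy
    loss_if_blackmailed = HUSH_MONEY if pays_when_blackmailed else EXPOSURE_COST
    return p_blackmailed * loss_if_blackmailed

print(expected_loss(True))   # "pay" disposition: blackmailed for sure -> 1.0
print(expected_loss(False))  # "refuse" disposition: never blackmailed -> 0.0
```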

Surprisingly, Schwarz doesn’t analyze CDT’s and FDT’s answers to Prisoner’s Dilemma with a Twin (beyond just giving them). It’s worth noting that FDT clearly does better than CDT here: the FDT agent and her twin each get away with 1 year in prison, while the CDT agent and her twin each get 5. This is because the agent and her twin are clones, and therefore have the same decision theory and reach the same decision on this problem. FDT recognizes this; CDT doesn’t. I am baffled that Schwarz calls FDT’s recommendation on this problem “insane”, as it is easily the right answer.
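A tiny sketch of that comparison, using the prison terms from Schwarz’s problem statement; the only assumption is how I encode the two decision procedures.

```python
# Prison years from Schwarz's statement, indexed by (Twinky's action, her clone's action).
YEARS = {
    ("confess", "confess"): (5, 5),
    ("confess", "silent"):  (0, 10),
    ("silent",  "confess"): (10, 0),
    ("silent",  "silent"):  (1, 1),
}

# CDT reasons action by action, holding the clone's action fixed: confessing
# dominates (0 < 1 and 5 < 10), so both clones confess and get 5 years each.
cdt_outcome = YEARS[("confess", "confess")]

# FDT notes that the clone runs the same decision procedure, so only the
# diagonal outcomes are attainable; it picks the best of those: 1 year each.
fdt_outcome = min((YEARS[(a, a)] for a in ("confess", "silent")), key=lambda pair: pair[0])

print("CDT pair:", cdt_outcome)  # (5, 5)
print("FDT pair:", fdt_outcome)  # (1, 1)
```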

Newcomb’s Problem with Transparent Boxes is interesting. Given the specified scenario, two-boxing outperforms one-boxing, but this is again beside the point. Two-boxing results in a logically impossible scenario (given perfect prediction), since the demon would then have predicted you two-box and put nothing in the right box. Given less-than-perfect (but still good) prediction, the scenario is still very unlikely: it’s one two-boxers almost never end up in. It’s the one-boxers who get the million. Schwarz again acknowledges this point, and again he doesn’t seem to realize it means CDT doesn’t do better in an obvious sense.
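Here is a quick sketch of that frequency argument. The assumptions are mine: the demon predicts each participant’s disposition with accuracy p, and participants simply act on their disposition.

```python
# Sketch with my simplifying assumptions: the demon predicts each participant's
# disposition with accuracy p, and each participant acts on that disposition.

def expected_winnings(disposition: str, p: float = 0.99) -> float:
    if disposition == "one-box":
        # The right box is filled iff the demon predicted one-boxing (probability p).
        return p * 1_000_000
    # Two-boxers always get the $1000; the right box is filled only on a misprediction.
    return 1_000 + (1 - p) * 1_000_000

print(expected_winnings("one-box"))  # 990000.0
print(expected_winnings("two-box"))  # 11000.0
```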

Edit: Vladimir Nesov left a comment which made me realize my above analysis of Newcomb’s Problem with Transparent Boxes is a reaction to the formulation in Yudkowsky and Soares’ paper rather than the formulation by Schwarz. Since Schwarz left his formulation relatively unspecified, I’ll leave the above analysis as it is. However, note that in the Yudkowsky and Soares formulation it is assumed the demon filled the right box if and only if she predicted the participant would leave the left box behind upon seeing two full boxes. The question, then, is what to do upon seeing two full boxes.

Schwarz then makes a broader point:

So there’s an obvious sense in which CDT agents fare better than FDT agents in the cases we’ve considered. But there’s also a sense in which FDT agents fare better. Here we don’t just compare the utilities scored in particular decision problems, but also the fact that FDT agents might face other kinds of decision problems than CDT agents. For example, FDT agents who are known as FDT agents have a lower chance of getting blackmailed and thus of facing a choice between submitting and not submitting. I agree that it makes sense to take these effects into account, at least as long as they are consequences of the agent’s own decision-making dispositions. In effect, we would then ask what decision rule should be chosen by an engineer who wants to build an agent scoring the most utility across its lifetime. Even then, however, there is no guarantee that FDT would come out better. What if someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options? In such an environment, the engineer would be wise not to build an FDT agent.

I largely agree. I care about FDT from the perspective of building the right decision theory for an A(S)I, in which case it is indeed about something like scoring the most utility across a lifetime. The part of the quote about FDT agents being worse off if someone directly punishes “agents who use FDT” is moot, though: what if someone decides to punish agents for using CDT?

Schwarz continues with an interesting decision problem:

Procreation. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there’s a significant probability that I wouldn’t exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate. (Note that this (incrementally) confirms the hypothesis that my father used FDT in the same choice situation, for I know that he reached the decision to procreate.)

He says:

In Procreation, FDT agents have a much worse life than CDT agents.

True, but things are a bit more complicated than this. An FDT agent facing Procreation recognizes the subjunctive dependence she and her father have on FDT, and, realizing she wants to have been born, procreates. A CDT agent with an FDT father doesn’t have this subjunctive dependence (and wouldn’t use it if she did) and doesn’t procreate, gaining more utils than the FDT agent. But note that the FDT agent is facing a different problem than the CDT agent: she faces one where her father has the same decision theory she does; the CDT agent doesn’t. What if we put the FDT agent in a modified Procreation problem, one where her father is a CDT agent? Correctly realizing she can make a decision other than her father’s, she doesn’t procreate. Obviously, in this modified scenario, the CDT agent doesn’t procreate either—even though, through subjunctive dependence with her CDT father, her decision is the exact same as her father’s. So here the CDT agent does worse: her father wouldn’t have procreated either, and she isn’t even born. This gives us two scenarios: one where the FDT agent procreates and lives miserably while the CDT agent lives happily, and one where the FDT agent lives happily while the CDT agent doesn’t live at all. FDT is, again, the better decision theory.

It seems, then, we can construct a more useful version of Procreation, called Procreation*:

Procreation*. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and I know he followed the same decision theory I do. If my decision theory were to recommend not procreating, there’s a significant probability that I wouldn’t exist. I prefer a miserable life to no life at all, but obviously I prefer a happy life to a miserable one. Should I procreate?

FDT agents procreate and live miserably—CDT agents don’t procreate and, well, don’t exist, since their fathers didn’t procreate either.
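A toy calculation of Procreation*, with utility numbers that are my own illustrative choices (only their ordering matters):

```python
# Toy numbers (mine, not from the post); only the ordering matters.
U_HAPPY, U_MISERABLE, U_NONE = 10, 1, 0

def outcome_if_theory_recommends(procreate: bool) -> int:
    """Utility the daughter ends up with if her (and hence her father's) theory recommends this."""
    if not procreate:
        return U_NONE       # her father followed the same recommendation, so she was never born
    return U_MISERABLE      # she exists, procreates, and lives miserably

# FDT picks the recommendation whose overall outcome is best: procreate (1 > 0).
fdt_choice = max([True, False], key=outcome_if_theory_recommends)
print(fdt_choice, outcome_if_theory_recommends(fdt_choice))    # True 1

# CDT holds her existence fixed and compares U_HAPPY (10) with U_MISERABLE (1),
# so it recommends not procreating; but followers of such a theory, her father
# included, never procreate, which is the U_NONE outcome described above.
print(outcome_if_theory_recommends(False))                     # 0
```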

Schwarz goes on:

All that said, I agree that there’s an apparent advantage of the “irrational” choice in cases like Blackmail or Prisoner’s Dilemma with a Twin, and that this raises an important issue. The examples are artificial, but structurally similar cases arguably come up a lot, and they have come up a lot in our evolutionary history. Shouldn’t evolution have favoured the “irrational” choices?

Not necessarily. There is another way to design agents who refuse to submit to blackmail and who cooperate in Prisoner Dilemmas. The trick is to tweak the agents’ utility function. If Twinky cares about her clone’s prison sentence as much as about her own, remaining silent becomes the dominant option in Prisoner’s Dilemma with a Twin. If Donald develops a strong sense of pride and would rather take Stormy down with him than submit to her blackmail, refusing to pay becomes the rational choice in Blackmail.

“The trick is to tweak the agents’ utility function”? No. I mean, sure, Twinky, it’s good to care about others. I do, and so does almost everybody. But this completely misses the point. In the above problems, the utility function is specified. Tweaking it gives a new problem. If Twinky indeed cares about her clone’s prison years as much as she does about her own, the payoff matrix becomes totally different. I realize that’s Schwarz’s point, because that’s what gives a new dominant option—but it doesn’t solve the actual problem. You solve a decision problem by taking one of the allowed actions, not by changing the problem itself. Deep Blue didn’t define the opening position as a winning position in order to beat Kasparov. All Schwarz does here is define new problems that CDT does solve correctly. That’s fine, but it doesn’t address the issue that CDT still fails the original problems.
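To illustrate, here is a small sketch. The payoff matrix is from Schwarz’s problem statement; the “care” weight encodes his proposed tweak. It shows that changing the utility function changes which action comes out best, i.e. it defines a different game rather than solving the original one.

```python
# Prison years from Schwarz's statement, indexed by (Twinky's action, clone's action).
YEARS = {
    ("confess", "confess"): (5, 5),
    ("confess", "silent"):  (0, 10),
    ("silent",  "confess"): (10, 0),
    ("silent",  "silent"):  (1, 1),
}

def utility(own: int, other: int, care: float) -> float:
    """Twinky's utility: minus her own years, minus `care` times her clone's years."""
    return -(own + care * other)

for care in (0, 1):  # original selfish Twinky vs. Schwarz's tweaked Twinky
    print(f"care = {care}")
    for clone_action in ("confess", "silent"):
        utilities = {a: utility(*YEARS[(a, clone_action)], care) for a in ("confess", "silent")}
        print(f"  clone plays {clone_action}: {utilities}")

# care = 0: confessing is strictly better whatever the clone does (the original dilemma).
# care = 1: remaining silent is never worse and sometimes better -- a different game,
#           which is exactly why the tweak doesn't solve the original problem.
```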

Schwarz continues:

FDT agents rarely find themselves in Blackmail scenarios. Neither do CDT agents with a vengeful streak. If I wanted to design a successful agent for a world like ours, I would build a CDT agent who cares what happens to others.

Of course, me too (a vengeful streak, though? That’s not caring about others). So would Yudkowsky and Soares. But don’t you think a successful agent should have a decision theory that can at least solve the basic cases, like Newcomb’s Problem with or without transparent boxes? Also note how Schwarz is making ad hoc adjustments for each problem: Twinky has to care about her clone’s prison time, while Donald has to have a sense of pride or a vengeful streak.

He adds:

My CDT agent would still two-box in Newcomb’s Problem with Transparent Boxes (or in the original Newcomb Problem). But this kind of situation practically never arises in worlds like ours.

But if we can set up a scenario that breaks your decision theory even when we do allow modifying utility functions, that points to a serious flaw in the theory. Would you trust it enough to build it into an Artificial Superintelligence?

Schwarz goes on to list a number of questions and unclarities he has about Yudkowsky and Soares’ paper, which I don’t find relevant for the purpose of this post. So this is where I conclude my post: FDT is still standing, and not only that: it is better than CDT.