The Epistemic Prisoner’s Dilemma

Let us say you are a doctor, and you are dealing with a malaria epidemic in your village. You are faced with two problems. First, you have no access to the drugs needed for treatment. Second, you are one of two doctors in the village, and the two of you cannot agree on the nature of the disease itself. You, having carefully tested many patients, being a highly skilled, well-educated diagnostician, have proven to yourself that the disease in question is malaria. Of this you are >99% certain. Yet your colleague, the blinkered fool, insists that you are dealing with an outbreak of bird flu, and to this he assigns >99% certainty.

Well, it need hardly be said that someone here is failing at rationality. Rational agents do not have common knowledge of disagreements etc. But… what can we say? We’re human, and it happens.

So, let’s say that one day, Dr. House calls you both into his office and tells you that he knows, with certainty, which disease is afflicting the villagers. As confident as you both are in your own diagnoses, you are even more confident in House’s abilities. House, however, will not tell you his diagnosis until you’ve played a game with him. He’s going to put you in one room and your colleague in another. He’s going to offer you a choice between 5,000 units of malaria medication, and 10,000 units of bird-flu medication. At the same time, he’s going to offer your colleague a choice between 5,000 units of bird-flu meds, and 10,000 units of malaria meds.

(Let us assume that keeping a malaria patient alive and healthy takes the same number of units of malaria meds as keeping a bird flu patient alive and healthy takes bird flu meds).

You know the disease in question is malaria. The bird-flu drugs are literally worthless to you, and the malaria drugs will save lives. You might worry that your colleague would be upset with you for making this decision, but you also know House is going to tell him that it was actually malaria before he sees you. Far from being angry, he’ll embrace you, and thank you for doing the right thing, despite his blindness.

So you take the 5,000 units of malaria medication, your colleague takes the 5,000 units of bird-flu meds (reasoning in precisely the same way), and you have 5,000 units of useful drugs with which to fight the outbreak.

Had you each taken that which you supposed to be worthless, you’d be guaranteed 10,000 units. I don’t think you can claim to have acted rationally.

Now obviously you should be able to do even better than that. You should be able to take one another’s estimates into account, share evidence, revise your estimates, reach a probability you both agree on, and, if the odds exceed 2:1 in one direction or the other, jointly take 15,000 units of whatever you expect to be effective, and otherwise get 10,000 units of each. I’m not giving out any excuses for failing to take this path.
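The 2:1 threshold can be checked with a little arithmetic. Below is a minimal sketch, assuming the two doctors have already agreed on a shared probability p that the disease is malaria and that they care only about the expected number of useful units. The function names are hypothetical, not part of the original scenario.

```python
from fractions import Fraction

def expected_useful_units(p_malaria, malaria_units, birdflu_units):
    """Expected number of units that match the true disease."""
    return p_malaria * malaria_units + (1 - p_malaria) * birdflu_units

def best_joint_choice(p_malaria):
    """Compare the three joint outcomes open to doctors who agree on p."""
    options = {
        # One doctor takes 5k malaria, the other takes 10k malaria.
        "15k malaria": expected_useful_units(p_malaria, 15_000, 0),
        # One doctor takes 10k bird flu, the other takes 5k bird flu.
        "15k bird flu": expected_useful_units(p_malaria, 0, 15_000),
        # Each doctor takes the 10k package.
        "10k of each": expected_useful_units(p_malaria, 10_000, 10_000),
    }
    return max(options, key=options.get), options

for p in (Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)):
    choice, _ = best_joint_choice(p)
    print(p, choice)
```

Taking 15,000 malaria units beats 10,000 of each exactly when 15,000p > 10,000, that is, when p > 2/3, which is odds of 2:1.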

But still, both choosing the 5,000 units strictly loses. If you can agree on nothing else, you should at least agree that cooperating is better than defecting.

Thus I propose that the epistemic prisoner’s dilemma, though it has unique features (the agents differ epistemically, not preferentially) should be treated by rational agents (or agents so boundedly rational that they can still have persistent disagreements) in the same way as the vanilla prisoner’s dilemma. What say you?

• Another example:

Pretend we have two people, a Republican and a Democrat, who can each donate to three charities: The Republican party, the Democratic Party, and a non-political charity.

Both people’s utility is increasing in the amount of resources that their party and the non-political charity gets. And, as you would expect, the Republican is made worse off the more the Democratic party gets and the Democrat is made worse off the more the Republican party gets.

The two people would benefit from an agreement in which they each agreed to give a higher percentage of their charitable dollars to the non-political charity than they would have absent this agreement.

To the best of my knowledge, such agreements are never made.

• But similar agreements are made in politics. A close (but not isomorphic) example would be vote-trading and especially vote-pairing.

Perhaps it’s just that no one has suggested it before or gone through with it. Dozens of people have suggested prediction registries or prediction markets, for example, but the number of working & large examples can be counted on one hand.

• Yes, this should be treated the same as PD. What are you driving at?

To my mind there’s not much point to PD except this: if someone else carelessly controls a variable you care about, and the problem has been carefully construed to prohibit you from manipulating their decision… well, it sucks to be you.

• It doesn’t matter how the preference order of the players came to be as it is. Cooperation is a Pareto improvement according to that preference order, and in this case the only one available, so go for it.

• Seems pretty similar to the “true prisoner’s dilemma”.

If both of you can self-bind (i.e. self-modify to act in a specific way in one specific future situation) and prove you’ve done so, you can make a mutual agreement to each take the 10k units.

If you can provably self-commit but the other doctor can’t, make a one-sided commitment along the following lines: “I’ll take the 10k units, and afterwards will trade them to you for 10k units of the other medicine. If you don’t want to trade for the medicine I got, I will destroy it completely, even if by that point I know that this will condemn 10k people to die horribly.”.

Obviously, if the other doctor can provably self-bind and you can’t, suggest that they commit along the same lines.

If one of you is willing to self-bind on this and can prove what code their mind is implemented with, that should (given arbitrarily large computing power) be enough to prove that they have done so.

Alternatively, if you like to cheat: Take the more expensive package of medicine, and if it turns out you took the set for the wrong disease, trade it for a larger amount of the other medicine on the market.

• Assuming the 99% likelihood I assign to the disease being malaria doesn’t change, if I can’t communicate with my colleague I obviously take the 5000 units of malaria meds. If I can communicate, I’ll do my best to convince my colleague to cooperate so he takes the 10 000 units of malaria meds, and then I take the other 5000 units of malaria meds.

Either I save 5000 people or I save 15 000 people with 99% likelihood (instead of saving 0 or 10 000 people), which is similar to avoiding 5 years or 10 years in prison (instead of avoiding 0 years or 9.5 years). So yeah, it is similar to the prisoner’s dilemma.

• It seems like you assume implicitly that there’s an equal probability of the other doctor defecting: (0 + 10,000)/​2 < (5,000 + 15,000)/​2. That makes sense in the original prisoner’s dilemma, but given that you can communicate, why assume this?

• It doesn’t make a difference. I’m better off defecting no matter what the other doctor does. Like I said, I’ll try to convince him to cooperate and then I’ll break our agreement. If I succeed, good for me; if I fail, at least I’ll have saved 5000 people.

That’s only if there’s a single iteration of this dilemma, of course. If I have reason to believe there will be three iterations and if I’m pretty sure I managed to convince the other doctor, I should cooperate (10000 * 3 > 15 000 + 5000 + 5000).

• What if you’re wrong?

• What if I’m wrong? Well, what if my house gets hit by a meteor today, and I get seriously wounded? Should I then regret not having left my house today?

I could wish I had left, but regretting my decision would be silly. We can only ever make decisions with the information that’s available to us at the moment. Right now I have every reason to believe my house will not get hit by a meteor, and I feel like staying at home, so that’s the best decision. Likewise, in the OP’s scenario I have every reason to believe the disease is malaria, so getting my hands on as much malaria medication as I can is the best decision. That’s all there is to it.

• But in this case, someone with a degree of astronomical knowledge comparable to yours, acting in good faith, has come up to you and has said “I’m 99% confident that a meteor will hit your house today. You should leave.” Why not investigate his claim before dismissing it?

• The original post specifies that even taking account of the other doctor’s opinion, we’re still 99% sure. This seems pretty unlikely, unless we know that the other doctor is really very rationally deficient, but it’s the scenario we’re discussing.

• Out of curiosity, do you cooperate or defect against an unfriendly superintelligence in the regular prisoner’s dilemma?

• I’m one of the human beings that Eliezer has so much trouble imagining: While I’m not (entirely) selfish myself, I have no trouble acting as if I were completely selfish for the purpose of playing in the vanilla prisoner’s dilemma. Consequently, it’s of no relevance to me that the other agent is an unfriendly superintelligence, rather than a friendly human being. I defect in both cases.

• How many people are there in the village? If there are fewer than 5000, it makes no difference how many doses of the correct drug we have, so long as we have at least 5000. Otherwise...

• (And if there are between 5000 and 10,000 people, the two doctors can just agree to take each the 10,000 doses of the other drug, so that whichever the true disease is, we have 10,000 doses of the drug for it. This only fully resembles the Prisoner’s Dilemma if there are more than 10,000 people.)
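A small sketch of this point, under the assumption (not stated in the post) that one unit treats exactly one patient, so extra doses beyond the village population save nobody. The payoffs are taken from the malaria doctor's point of view, and the function names are hypothetical.

```python
def lives_saved(useful_units, villagers):
    """Doses beyond the number of villagers are wasted."""
    return min(useful_units, villagers)

def capped_payoffs(villagers):
    """Useful units by (my move, his move), capped by village size."""
    raw = {
        ("C", "C"): 10_000,
        ("C", "D"): 0,
        ("D", "C"): 15_000,
        ("D", "D"): 5_000,
    }
    return {moves: lives_saved(units, villagers) for moves, units in raw.items()}

def temptation_exists(villagers):
    """True if defecting against a cooperator still beats mutual cooperation."""
    payoffs = capped_payoffs(villagers)
    return payoffs[("D", "C")] > payoffs[("C", "C")]

for villagers in (4_000, 8_000, 12_000):
    print(villagers, capped_payoffs(villagers), temptation_exists(villagers))
```

With 10,000 villagers or fewer, mutual cooperation already saves everyone, so the temptation payoff disappears.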

• Well, it need hardly be said that someone here is failing at rationality. Rational agents do not have common knowledge of disagreements etc.

That’s not strictly true. They could both be completely rational but each assume the other is irrational or dishonest.

• I need a clarification. In the PD with the superintelligence, you are not in a position to negotiate because values cannot be negotiated, right? However, epistemology can be negotiated, right? You can point out the various symptoms that make it sure that it is malaria. He can point out the various symptoms that make it sure it is bird flu. You can agree to tests and counter-tests.

There are no tests for wants and desires. But we can agree about ways to find out, can’t we?

• Building the typical payoff matrix:

C,C − 10k, 10k (after trade)
C,D − 0k, 15k (assuming I give him what I consider useless)
D,C − 15k, 0k (and vice versa)
D,D − 5k, 5k

I am not tracking how this is any different than the standard model except you can communicate beforehand. Does it have to do with the >99% certainty of value?
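One way to test the question is to check the matrix against the usual textbook conditions for a prisoner's dilemma: temptation > reward > punishment > sucker's payoff, plus 2 × reward > temptation + sucker. A minimal sketch, assuming each doctor scores outcomes only in units he himself considers useful (the names below are illustrative):

```python
# Units each doctor considers useful, indexed by (my move, his move).
# "C" = take the 10k units I believe are worthless,
# "D" = take the 5k units I believe in.
PAYOFFS = {
    ("C", "C"): (10_000, 10_000),
    ("C", "D"): (0, 15_000),
    ("D", "C"): (15_000, 0),
    ("D", "D"): (5_000, 5_000),
}

def is_prisoners_dilemma(payoffs):
    """Check the standard ordering from the first player's point of view."""
    reward = payoffs[("C", "C")][0]
    sucker = payoffs[("C", "D")][0]
    temptation = payoffs[("D", "C")][0]
    punishment = payoffs[("D", "D")][0]
    ordering = temptation > reward > punishment > sucker
    # Taking turns exploiting each other must not beat steady cooperation.
    no_alternation_gain = 2 * reward > temptation + sucker
    return ordering and no_alternation_gain

print(is_prisoners_dilemma(PAYOFFS))
```

On these subjective payoffs the structure matches the standard model; what differs is that the two columns are measured in beliefs about the world rather than in differing preferences.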

(Note) Thinking and typing here, so no guarantee of value. If someone sees a misstep though, please point it out. I am still learning this.

So… that would be >99% certainty of 15k, 15k on cooperation, <1% certainty of 15k, 15k. Actually, no, that would be 15k, 0k if I was right and 0k, 15k if he was right. So ignore that, here is a new matrix.

C,C − >99% 10k, 0k; <1% 0k, 10k
C,D − >99% 0k, 0k; <1% 0k, 15k
D,C − >99% 15k, 0k; <1% 0k, 0k
D,D − >99% 5k, 0k; <1% 0k, 5k

I suppose, since we are assuming trading it would make it simpler to just consider the rewards as pooled afterwards. It really makes no difference which doctor ends up with the medicine.

C,C − 100% 10k
C,D − >99% 0k; <1% 15k
D,C − >99% 15k; <1% 0k
D,D − >99% 5k; <1% 5k

Taking out the failed diagnoses:

C,C − 100% 10k
C,D − 99% 15k
D,D − 100% 5k

Yeah, that is not a prisoner’s dilemma. Especially if you can convince the other doctor to cooperate. Assuming probability P for the other doctor cooperating (this is the part where I may need help):

C: P 10k + (1 - P) 99% 15k + (1 - P) 5k

And that is as far as I can go. I assume that there is some way to map which choice is better for which values of P.
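One possible way to finish the calculation is sketched below. It is not the commenter's own formula: it assumes I hold the malaria diagnosis with 99% confidence, that the medicine is pooled, and that the other doctor cooperates with the same probability P whichever move I make. The function name and parameters are hypothetical.

```python
def expected_useful_units(my_move, p_other_cooperates, confidence=0.99):
    """Expected units matching the true disease, pooled across both doctors."""
    p, c = p_other_cooperates, confidence
    if my_move == "C":  # I take 10k bird-flu units
        # He cooperates: 10k malaria + 10k bird flu, so 10k useful either way.
        # He defects: 15k bird flu in total, useful only if I am wrong.
        return p * 10_000 + (1 - p) * (1 - c) * 15_000
    if my_move == "D":  # I take 5k malaria units
        # He cooperates: 15k malaria in total, useful only if I am right.
        # He defects: 5k of each, so 5k useful either way.
        return p * c * 15_000 + (1 - p) * 5_000
    raise ValueError("my_move must be 'C' or 'D'")

for tenths in range(11):
    p = tenths / 10
    print(p, expected_useful_units("C", p), expected_useful_units("D", p))
```

Under these assumptions defecting comes out ahead for every value of P, which is the dominance structure of a prisoner's dilemma. Cooperation only pays if my choice changes how likely he is to cooperate.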

• When I first read down this, I decided to assume it’s equally likely that he’s right and you’re the fool, thus assigning ~50% probability either way. Taking the 10,000 bird flu meds became a no-brainer.

Another way to look at it is that you consider the malaria meds valuable and the bird flu meds worthless, and vice versa for him. By Ricardo’s law, it’s thus best to take the bird flu medication, and trade with the other doctor before seeing Dr. House, and you can pretend as if you were simply offered 5,000 malaria meds versus 10,000 malaria meds.

This does appear to meet the entire definition of the Prisoner’s Dilemma, but, for some reason, I have much less trouble imagining myself cooperating than in most versions.

• Yes. Given the set-up, it’s a standard prisoners’ dilemma (well, at least if you add the proviso that it’s guaranteed to be one-shot). Iff you can ensure that the other doctor will co-operate iff you co-operate, then co-operate. If not, defect.

(Or, more accurately, a probabilistic version of this. E.g. assuming you are risk neutral, co-operate iff the other doctor has at least a 50% greater chance of co-operating if you co-operate than if you defect.)
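The probabilistic rule can be sketched as follows, reading "50% greater chance" as a gap of 50 percentage points. The sketch assumes risk neutrality, treats my own diagnosis as certain, and uses the 10k / 0k / 15k / 5k payoffs from the thread; the function name is hypothetical.

```python
def should_cooperate(p_coop_if_i_cooperate, p_coop_if_i_defect):
    """Risk-neutral comparison of expected useful units for each move."""
    # If I cooperate I get 10k when he cooperates and nothing otherwise.
    eu_cooperate = 10_000 * p_coop_if_i_cooperate
    # If I defect I get 15k when he cooperates and 5k when he defects.
    eu_defect = 15_000 * p_coop_if_i_defect + 5_000 * (1 - p_coop_if_i_defect)
    return eu_cooperate >= eu_defect

print(should_cooperate(1.0, 0.0))    # he mirrors my choice perfectly
print(should_cooperate(0.9, 0.5))    # gap of 40 points
print(should_cooperate(0.75, 0.25))  # gap of exactly 50 points
```

Algebraically, 10,000·q_c ≥ 5,000 + 10,000·q_d reduces to q_c − q_d ≥ 0.5, where q_c and q_d are his chances of co-operating given that I co-operate or defect.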

• I would condition my response on how trustworthy my colleague and I had been on previous diagnoses (and on how consistent that record was with the confidence we placed in those diagnoses). If we had both been equally good, I should trust his judgement as much as mine; then I would pick 10,000 of the wrong medicine and trust him to follow the same reasoning.

• I think you have to take the 5k. The only way it doesn’t leave everyone better off and save lives is if you don’t actually believe your prior of >99%, in which case update your prior. I don’t see how what he does in another room matters. Any reputational effects are overwhelmed by the ability to save thousands of lives.

However, I also don’t see how you can cooperate in a true one-time prisoner’s dilemma without some form of cheating. The true PD presumes that I don’t care at all about the other side of the matrix, so assuming there isn’t some hidden reason to prefer co-operation—there are no reputational effects personally or generally, no one can read or reconstruct my mind, etc—why not just cover the other side’s payoffs? The payoff looks a lot like this: C → X, D → X+1, where X is unknown.

Also, as a fun side observation, this sounds suspiciously like a test designed to figure out which of us actually thinks we’re >99% once we take into account the other opinion and which of us is only >99% before we take into account the other opinion. Dr. House might be thinking that if we order 15k of either medicine that one is right often enough that his work here is done. I’d have to assign that p>0.01, as it’s actually a less evil option than taking him at face value. But I’m presuming that’s not the case and we can trust the scenario as written.

• I think you have to take the 5k. The only way it doesn’t leave everyone better off and save lives is if you don’t actually believe your prior of >99%

I’m not sure how much work and what kind of work the following color does:

You, having carefully tested many patients, being a highly skilled, well-educated diagnostician [is 99% confident one way.] Yet your colleague, the blinkered fool, [is 99% certain the other way]. Well, it need hardly be said that someone here is failing at rationality. [...] You should be able to take one another’s estimates into account, share evidence, revise your estimates, reach a probability you both agree on.

I’m not sure what the assumptions are, here. If I have been using Testing of Patients while my colleague has been practicing Blinkered Folly, by which I mean forming truth-uncorrelated beliefs, taking his estimate into account shouldn’t change my beliefs since they’re not truth-correlated. He has no useful evidence, and he is impervious to mine.

But let’s say we’re playing a game of prisoner’s dilemma with payouts in two currencies. I can add either 5k malarian dollars or 10k birdfluian dollars to a pot which will be evenly shared by both (since we share the value of healing the sick), while my colleague has the reverse choice. The expected utilities if I’m right and he’s wrong are 0.99 × 5k + 0.01 × 0k = 4950 and 0.01 × 10k + 0.99 × 0k = 100. I maximize by defecting. By a similar calculation, my colleague maximizes by cooperating but is too foolishly blinkered to see this.

I guess your point is that if we have commitments available, I can do better than my 5k and his 0k malarian dollars by humoring his delusions and agreeing on $10,000m + $10,000b, which he wants because $10,000b is “better” than the 5k he could get by defecting.

So: sometimes you can act real stupid in ways that bribe crazy people to act even less stupid, thereby increasing social utility compared to the alternative. Or, said another way, people act “rationally” from the perspective of their wrong beliefs—the payoff matrix in terms of malarian dollars reflects the utility of the patients and the rational doctor whereas the payoff matrix in terms of birdfluian dollars reflects the utility of the blinkered doctor, where “utility” means “behavior-predicting abstraction” in the case of the blinkered doctor.

If it is common knowledge between the two doctors that both practice Epistemic Rationality and Intellectual Virtue but collect disjoint bodies of evidence (I think this goes against the stated assumptions), one doctor’s confidence is evidence of their conclusion to the other doctor, and they should both snap to 50% confidence once they learn about the other doctor’s (previous) certainty. In that scenario it’s a straightforward game of prisoner’s dilemma, the exact same as before except with half the payoffs.

The closest I have come to studying decision theory for irrational agents is reinforcement learning in Markov Decision Processes. Maybe you can argue in favor of epsilon-greedy exploration, on the argument that getting the medicine will not be the last thing the doctors will experience, but then we’re veering into a discussion about how to make and select maps rather than a discussion of how to select a route given the map the article author has drawn.

• Also, as a fun side observation, this sounds suspiciously like a test designed to figure out which of us actually thinks we’re >99% once we take into account the other opinion and which of us is only >99% before we take into account the other opinion. Dr. House might be thinking that if we order 15k of either medicine that one is right often enough that his work here is done. I’d have to assign that p>0.01, as it’s actually a less evil option than taking him at face value. But I’m presuming that’s not the case and we can trust the scenario as written.

I wasn’t going there, but I like the thought =)

• I am having some difficulty imagining that I am 99% sure of something, but I cannot either convince a person to outright agree with me or accept that he is uncertain and therefore should make the choice that would help more if it is right, but I could convince that same person to cooperate in the prisoner’s dilemma. However, if I did find myself in that situation, I would cooperate.

• I’m tipping my hand here, but...

Do you think you could convince a young-earth creationist to cooperate in the prisoner’s dilemma?

• Good point. I probably could. I expect that the young-earth creationist has a huge bias that does not have to interfere with reasoning about the prisoner’s dilemma.

So, suppose Omega finds a young-earth creationist and an atheist, and plays the following game with them. They will each be taken to a separate room, where the atheist will choose between each of them receiving $10000 if the earth is less than 1 million years old or each receiving $5000 if the earth is more than 1 million years old, and the young-earth creationist will have a similar choice with the payoffs reversed. Now, with the prisoner’s dilemma tied to the young-earth creationist’s bias, would I, in the role of the atheist, still be able to convince him to cooperate? I don’t know. I am not sure how much the need to believe that the earth is around 5000 years old would interfere with recognizing that it is in his interest to choose the payoff for the earth being over a million years old. But still, if he seemed able to accept it, I would cooperate.

(Edit: Fixed a typo reversing the payoffs.)

• You’ve pretty much written tomorrow’s post for me, though I was going to throw in some existential risk to make things more fun.

• Now obviously you should be able to do even better than that. You should be able to take one another’s estimates into account, share evidence, revise your estimates, reach a probability you both agree on

Hold on. I assume that, in accordance with a normal Prisoner’s Dilemma, the two players aren’t allowed to discuss stuff. So this is out.

Edit: And lo, MBlume added the following to his post:

I’m starting with them together, and free to speak briefly, for reasons which will be clear soon enough.

In that case, they can simply agree to take 10,000 of each. This is obvious.

Edit again: This seems inconsistent with my later view. Hmm.

• “Well, it need hardly be said that someone here is failing at rationality.”

No. The given data does not require that either of the two individuals have failed to be rational.

• Voted down for bald assertion with no argument.

MBlume is relying on Aumann’s (dis)agreement theorem, which is generally assumed knowledge around these parts. If you don’t think it (or any of its generalizations) applies here, please say why.

• Aumann’s agreement theorem—I don’t claim to know all its generalizations—assumes that the two parties have the same priors, and that each knows the other’s “information partition” (i.e., what states of the world the other can distinguish). It also assumes that their knowledge of one another’s posteriors is “common knowledge” in a technical sense. It also assumes that both parties are perfect Bayesians and that this too is “common knowledge”. I see no reason to assume that any of these is true, given MBlume’s description of the situation. (In particular, the assumption regarding what each knows about the other seems to me, from Aumann’s description, to imply more detailed knowledge of one another’s cognitive faculties than any human has of any other’s.)

Clearly someone is failing (very broadly understood) at something, since at least one of the two doctors assigns 99% probability to something untrue. But, e.g., the following is perfectly consistent with the scenario as described (although unlikely):

Both doctors are superlatively intelligent, skillful and well informed (about medicine). Both have done the same, entirely sensible, tests; one has been the victim of extreme bad luck and got evidence at the 99.5% level for a wrong diagnosis. Both have also been victims of further mischance, and each has (unknown to the other) got evidence at the 99.5% level that the other is horrendously incompetent even though that is not actually true. (We can agree, I hope, that all this is possible, albeit very unlikely?)

Now each considers the evidence. A, before learning B’s verdict:
P(malaria & B incompetent) = 0.995^2 ~= 0.99
P(malaria & B competent) = 0.995 . 0.005 ~= 0.005
P(bird flu & B incompetent) = 0.995 . 0.005 ~= 0.005
P(bird flu & B competent) = 0.005 . 0.005 = 0.000025

Now for a Bayesian update based on B’s opinion. Some kinda-plausible figures:
P(B gets given result | malaria & B incompetent) = 0.1
P(B gets given result | malaria & B competent) = 0.001
P(B gets given result | bird flu & B incompetent) = 0.1
P(B gets given result | bird flu & B competent) = 0.99

So the new odds are roughly 0.099 : 0.000005 : 0.0005 : 0.000025, giving a probability of about 99.5% for the “malaria & B incompetent” option.
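The arithmetic can be reproduced with a short sketch. All the figures come from the comment itself; only the variable and function names are introduced here for illustration.

```python
# A's prior over (disease, B's competence), before learning B's verdict.
prior = {
    ("malaria", "B incompetent"): 0.995 * 0.995,
    ("malaria", "B competent"): 0.995 * 0.005,
    ("bird flu", "B incompetent"): 0.005 * 0.995,
    ("bird flu", "B competent"): 0.005 * 0.005,
}

# P(B reports "bird flu, 99% sure" | hypothesis), the "kinda-plausible" figures.
likelihood = {
    ("malaria", "B incompetent"): 0.1,
    ("malaria", "B competent"): 0.001,
    ("bird flu", "B incompetent"): 0.1,
    ("bird flu", "B competent"): 0.99,
}

def posterior(prior, likelihood):
    """Bayes' rule: multiply prior by likelihood, then normalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}

for hypothesis, probability in posterior(prior, likelihood).items():
    print(hypothesis, round(probability, 5))
```

The normalized weight on "malaria & B incompetent" comes out just under 99.5%, in line with the figure quoted above.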

B goes through an exactly parallel calculation, favouring “bird flu & A incompetent”.

Both doctors have been unlucky, but neither has been irrational. Those who think Aumann’s theorem requires one of them to have been irrational given the data: please explain what’s impossible in the above scenario.

• If each scientist has gotten evidence at the 99.5% level that the other is horrendously incompetent, then they should have no problem convincing each other that the other is incompetent with said evidence. (Unless one of them has additional evidence to defend their competency, in which case they will agree on how the additional evidence should change the assessment.) The idea being that with sufficient communication, they have exactly the same information and thus must make the same conclusions.

On the other hand, the requirement of the same priors is interesting. Mightn’t this be how they could rationally come to different conclusions?

• Aumann’s theorem itself doesn’t say anything about “with sufficient communication”; that’s just one possible way for them to make the relevant stuff “common knowledge”. (Also, remember that the thing about Aumann’s theorem is that the two parties are supposed not to have to share their actual evidence with one another—only their posterior probabilities. And, indeed, only their posterior probabilities for the single event whose probability they are to end up agreeing about.)

The scenario described in the original post here doesn’t say anything about there being “sufficient communication” either.

It seems to me that Aumann’s theorem is one of those (Goedel’s incompleteness theorem is notoriously one, to a much greater extent) where “everyone knows” a simple one-sentence version of it, which sounds exciting and dramatic and fraught with conclusions directly relevant to daily life, and which also happens to be quite different from the actual theorem.

But maybe some of those generalizations of Aumann’s theorem really do amount to saying that rational people can’t agree to disagree. If someone reading this is familiar with a presentation of some such generalization that actually provides details and a proof, I’d be very interested to know.

(For instance, is Hanson’s paper on “savvy Bayesian wannabes” an example? Brief skimming suggests that it still involves technical assumptions that might amount to drastic unrealism about how much the two parties know about one another’s cognitive faculties, and that its conclusion isn’t all that strong in any case—it basically seems to say that if A and B agree to disagree in the sense Hanson defines then they are also agreeing to disagree about how well they think, which doesn’t seem very startling to me even if it turns out to be true without heavy technical conditions.)

• Thank you for the clarification: Aumann’s theorem does not assume that the people have the same information. They just know each other’s posteriors. After reading the original paper, I understand that the consensus comes about iteratively in the following way: they know each other’s conclusions (posteriors). If they have different conclusions, then they must infer that the other has different information, and they modify their posteriors based on this different, unknown information to some extent. They then recompare their posteriors. If they’re still different, they conclude that the other’s evidence must have been stronger than they estimated, and they recalculate. So without actually sharing the information, they deduce the net result of the information by mutually comparing the posteriors.

• In Aumann’s original paper, the statement of the theorem doesn’t involve any assumption that the two parties have performed any sort of iterative procedure. In informal explanations of why the result makes sense, such iterative procedures are usually described. I think this illustrates the point that the innocuous-sounding description of what the two parties are supposed to know (“their posteriors are common knowledge”) conceals more than meets the eye: to get a situation where anything like it is true, you need to assume that they’ve been through some higher-quality information exchange procedure.

• The proof looks at an ordering of posteriors p1, p2, etc, that result from subsequent levels of knowledge of knowledge of the other’s posteriors. However, these are shown to be equal, so in a sense all of the iterations happen in some way simultaneously. -- Actually, I looked at it again and I’m not so sure this is true. It’s how I understand it.

• To be fair, I did neglect to state specifically that we have common knowledge of our probability estimates

• “Well, it need hardly be said that someone here is failing at rationality.”

Right, using the agreement theorem, someone is failing at rationality. Either it is him or it is me. I must conclude that it is him. (If you would like an argument for this, I can provide it, but will skip for now as I suspect it is uncontroversial.)

Given that I have already concluded that my colleague is irrational, I cannot trust him to make a rational decision regarding the choice of drugs. Thus I just need to make the choice that will save at least 5,000 lives. (Note: I cannot predict which decision he will make, since irrational reasoning can lead to either choice. But if he chooses the malaria drugs, then all for the better.)

After we both choose the 5000 of each drug,

I don’t think you can claim to have acted rationally.

What is the “you” here? If the “you” is plural and refers to both me and my colleague, it is expected that we did not act rationally since we already knew we weren’t both rational.

However, I acted rationally, given the information that my colleague would not.

By the way, what is the interpretation “around these parts” of Aumann’s disagreement theorem taken together with the fact that apparently rational people have different solutions to these kinds of dilemmas? Is the idea that eventually, we’ll reach a consensus?

• Given that I have already concluded that my colleague is irrational, I cannot trust him to make a rational decision regarding the choice of drugs. Thus I just need to make the choice that will save at least 5,000 lives. (Note: I cannot predict which decision he will make, since irrational reasoning can lead to either choice. But if he chooses the malaria drugs, then all for the better.)

Is it a common belief that someone who has acted irrationally with regards to X is unable to act rationally with regards to Y? I am not challenging, just pinging for more information because this came as a surprise.

• I generally assume that the people who read my comments are capable of detecting obvious errors.

Look at the requirements for those theorems to apply, and then look at the conditions MBlume set out.

• Whether or not your assumption is true, your comment added no information. If people are capable of detecting obvious errors, then they would already have done so; if not, then you haven’t helped.

Not only does this style of comment prevent others from learning from you, it also prevents others from actually engaging with your point, so that you might learn from them. Assuming that you have nothing to learn from others is, in general, a poor strategy.

(All of that assumes that you’re not just bluffing and trying to hide the fact that you have no idea what you’re talking about.)