Resolving the Dr Evil Problem

Deep in Dr Evil’s impregnable fortress (paraphrased):

Dr Evil is just about to complete his evil plan of destroying the Earth, when he receives a message from the Philosophy Defence Force on Mars. They have created a clone in the exact same subjective situation Dr Evil now occupies; the clone believes he is Dr Evil and is currently in a recreation of the fortress. If the clone of Dr Evil tries to destroy the Earth, they will torture him, otherwise they will treat him well. Dr Evil wants to destroy the Earth, but he cares much, much more about avoiding torture, and he is now uncertain about whether he should surrender or not. Should Dr Evil surrender?

The paper then concludes:

I conclude that Dr. Evil ought to surrender. I am not entirely comfortable with that conclusion. For if INDIFFERENCE is right, then Dr. Evil could have protected himself against the PDF’s plan by (in advance) installing hundreds of brains in vats in his battlestation—each brain in a subjective state matching his own, and each subject to torture if it should ever surrender.

This article will address two areas of this problem:

  • Firstly, it will argue that Dr Evil should surrender, but by a different path: one that focuses on what it means to “know” something and how this is a leaky abstraction

  • Secondly, it will argue that hundreds of brains in a vat would indeed secure him against this kind of blackmail

I’ll note that this problem is closely related to The AI that Boxes You. Several people noted there that you could avoid blackmail by pre-committing to reboot the AI if it tried to threaten you, although my interest is in the rational behaviour of an agent who has failed to pre-commit.

Why Dr Evil Should Surrender

I think there’s a framing effect in the question that is quite misleading. Regardless of whether we say, “You are Dr Evil. What do you do?” or “What should Dr Evil do?”, we are assuming that you, or a third-party Dr Evil, know that you are Dr Evil. However, that’s an assumption that needs to be questioned.

If you knew that you were Dr Evil, then learning that a clone has been created and placed in a situation that appears similar wouldn’t change what you should do, provided you don’t care about the clone. However, strictly speaking, you can’t ever actually know that you are Dr Evil. All you can know is that you have memories of being Dr Evil and that you appear to be in Dr Evil’s fortress. (Technically we could doubt this too; maybe Dr Evil doesn’t actually exist. But we don’t have to go that far to prove our point.)

Before this event, you would have rated the probability of your being Dr Evil as very high, as you had no reason to believe that you might be a clone or in a simulation. After you receive the message, you have to rate the chance of your being a clone much higher, and this breaks the leaky abstraction that normally allows you to say that you know you are Dr Evil.
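To make the update concrete, here is a rough sketch of the arithmetic. It assumes the INDIFFERENCE principle from the quoted paper (equal credence across subjectively identical copies), and the utility numbers are hypothetical values I have picked purely for illustration; only their ordering matters.

```python
from fractions import Fraction


def credence_original(num_clones: int) -> Fraction:
    """Credence that you are the original, assuming equal weight
    on each of the (num_clones + 1) indistinguishable subjects."""
    return Fraction(1, num_clones + 1)


def expected_utility(p_original, u_if_original, u_if_clone):
    """Weight each outcome by the credence of being in that situation."""
    return p_original * u_if_original + (1 - p_original) * u_if_clone


# Hypothetical utilities (assumptions, not from the original problem).
U_DESTROY = 10        # the original gets to destroy the Earth
U_SURRENDER = 0       # the original gives up the plan
U_TORTURE = -1000     # the clone is tortured for trying to destroy the Earth
U_TREATED_WELL = 0    # the clone surrenders and is treated well

p = credence_original(num_clones=1)  # one clone, so credence 1/2
eu_destroy = expected_utility(p, U_DESTROY, U_TORTURE)
eu_surrender = expected_utility(p, U_SURRENDER, U_TREATED_WELL)

print(p, eu_destroy, eu_surrender)
```

With these numbers, trying to destroy the Earth has a far lower expected utility than surrendering, which matches the conclusion that Dr Evil should surrender once he can no longer treat "I am Dr Evil" as known.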

If we did say that you knew you were Dr Evil, then on receiving the message, you would somehow have to magically come to un-know something without anyone erasing your memories or otherwise interfering with your brain. However, since you only ever knew that you have memories of being Dr Evil, you haven’t lost information. You’ve actually gained it, so nothing magical is happening at all.

Why Clones Protect Against the Attack

The idea of creating clones to protect yourself against these kinds of attacks seems weird. However, I will argue that this strategy is actually effective.

I’ll first note that the idea of setting up a punishment to prevent yourself from giving in to blackmail isn’t unusual at all. It’s well known that if you can credibly pre-commit, there’s no incentive to blackmail you. If you have a device that will torture you if you surrender, then you have no incentive to surrender unless the expected harm from not surrendering exceeds the torture.
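The decision rule in the last sentence can be written out as a tiny sketch. The function name and the harm figures are hypothetical; harms are expressed as positive magnitudes.

```python
def should_surrender(expected_harm_if_refuse: float,
                     device_torture_harm: float = 0.0) -> bool:
    """Surrender only if refusing is expected to hurt more than
    the self-imposed punishment for surrendering."""
    return expected_harm_if_refuse > device_torture_harm


# Hypothetical numbers: the blackmailer threatens 50 units of harm.
print(should_surrender(50))          # no device installed
print(should_surrender(50, 1000))    # device punishes surrender with 1000
print(should_surrender(5000, 1000))  # threat large enough to outweigh the device
```

Without the device, any threat is enough to make surrender attractive. With it, the blackmailer has to threaten more than the device inflicts, which is why a credible device removes the incentive to make ordinary threats.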

Perhaps, then, the issue is that you would be protecting yourself by intentionally messing up your beliefs about what is true? There’s no reason why this shouldn’t be effective. If you can self-modify to disbelieve any blackmail threat, no-one can blackmail you. One way to do this would be to self-modify to believe you are in a simulation that will end just before you are tortured. Alternatively, you could protect yourself by self-modifying to believe that you would be tortured worse if you accepted the blackmail. Creating the clones is just another way of achieving this, though it is less effective, as you only believe there is some probability that you will be tortured if you surrender.
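As a rough illustration of why the vats help, the sketch below compares the two options once there are many vat brains. It assumes INDIFFERENCE again, assumes a vat brain that refuses to surrender simply gets a neutral outcome, and uses hypothetical utility numbers of my own choosing.

```python
from fractions import Fraction


def expected_utilities(num_vat_brains, u_destroy=10, u_torture=-1000, u_neutral=0):
    """Return (EU of refusing, EU of surrendering) with equal credence over
    the original, the one PDF clone, and num_vat_brains vat brains."""
    total = num_vat_brains + 2  # original + PDF clone + vat brains
    p_original = Fraction(1, total)
    p_pdf_clone = Fraction(1, total)
    p_vat_brain = Fraction(num_vat_brains, total)

    # Refusing: the original destroys the Earth, the PDF clone is tortured,
    # and a vat brain is left alone (assumption).
    eu_refuse = (p_original * u_destroy
                 + p_pdf_clone * u_torture
                 + p_vat_brain * u_neutral)

    # Surrendering: the original and the PDF clone are fine,
    # but a vat brain is tortured for surrendering.
    eu_surrender = (p_original * u_neutral
                    + p_pdf_clone * u_neutral
                    + p_vat_brain * u_torture)
    return eu_refuse, eu_surrender


for n in (0, 1, 200):
    refuse, surrender = expected_utilities(n)
    print(n, refuse, surrender, "refuse" if refuse > surrender else "surrender")
```

With no vat brains, surrendering comes out ahead; with hundreds, nearly all of your credence sits on being a vat brain that is punished for surrendering, so refusing wins. The protection is only probabilistic, which is the sense in which it is weaker than a belief that you will certainly be tortured for giving in.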