Resolving the Dr Evil Problem

Deep in Dr Evil’s impregnable fortress (paraphrased):

Dr Evil is just about to complete his evil plan of destroying the Earth when he receives a message from the Philosophy Defence Force on Mars. They have created a clone in the exact same subjective situation Dr Evil now occupies; he believes he is Dr Evil and is currently in a recreation of the fortress. If the clone of Dr Evil tries to destroy the Earth, they will torture him; otherwise they will treat him well. Dr Evil wants to destroy the Earth, but he would much, much prefer to avoid being tortured, and he is now uncertain whether he should surrender. Should Dr Evil surrender?

The paper then concludes:

I conclude that Dr. Evil ought to surrender. I am not entirely comfortable with that conclusion. For if INDIFFERENCE is right, then Dr. Evil could have protected himself against the PDF’s plan by (in advance) installing hundreds of brains in vats in his battlestation—each brain in a subjective state matching his own, and each subject to torture if it should ever surrender.

This article will address two areas of this problem:

  • Firstly, it will argue that Dr Evil should surrender, though via a different path, focusing on what it means to “know” something and how this is a leaky abstraction

  • Secondly, it will argue that hundreds of brains in vats would indeed secure him against this kind of blackmail

I’ll note that this problem is closely related to The AI that Boxes You. Several people noted there that you could avoid blackmail by pre-committing to reboot the AI if it tried to threaten you, although my interest is in the rational behaviour of an agent who has failed to pre-commit.

Why Dr Evil Should Surrender

I think there’s a framing effect in the question that is quite misleading. Regardless of whether we say, “You are Dr Evil. What do you do?” or “What should Dr Evil do?”, we are assuming that you or a third-party Dr Evil knows that they are Dr Evil. However, that’s an assumption that needs to be questioned.

If you knew that you were Dr Evil, then knowing that a clone has been created and placed in a situation that appears similar wouldn’t change what you should do, assuming you don’t care about the clone. However, strictly speaking, you can’t ever actually know that you are Dr Evil. All you can know is that you have memories of being Dr Evil and that you appear to be in Dr Evil’s fortress (technically we could doubt this too: maybe Dr Evil doesn’t actually exist, but we don’t have to go that far to prove our point).

Before this event, you would have placed the probability of being Dr Evil very high, as you had no reason to believe that you might be a clone or in a simulation. After you receive the message, you have to rate the chance of being a clone much higher, and this breaks the leaky abstraction that normally allows you to say that you know you are Dr Evil.
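This update can be sketched numerically. Assuming an equal-weight indifference principle over subjectively identical observers (my gloss on the paper’s INDIFFERENCE, with illustrative numbers only):

```python
def credence_original(n_duplicates):
    """Credence that you are the original, given n subjectively identical
    duplicates, under an equal-weight indifference principle."""
    return 1 / (n_duplicates + 1)

# Before the message: no reason to suspect any duplicates exist.
print(credence_original(0))  # 1.0 -- you would say you "know" you are Dr Evil
# After the message: one clone shares your exact subjective state.
print(credence_original(1))  # 0.5 -- the "knowledge" abstraction has leaked
```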

If we did say that you knew you were Dr Evil, then on receiving the message you would somehow have to magically come to un-know something, without anyone erasing your memories or otherwise interfering with your brain. However, since you only ever knew that you have memories of being Dr Evil, you haven’t lost information. You’ve actually gained it, so nothing magical is happening at all.

Why Clones Protect Against the Attack

The idea of creating clones to protect yourself against these kinds of attacks seems weird. However, I will argue that this strategy is actually effective.

I’ll first note that the idea of setting up a punishment to prevent yourself from giving in to blackmail isn’t unusual at all. It’s well known that if you can credibly pre-commit, there’s no incentive to blackmail you. If you have a device that will torture you if you surrender, then you have no incentive to surrender unless the expected harm from not surrendering exceeds the torture.
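As a toy sketch of that incentive structure (the utilities are made up purely for illustration: losing the plan costs −10, torture costs −100):

```python
def best_action(u_surrender=-10, u_resist=-100, device_installed=False,
                u_device_torture=-100):
    """Pick the higher-utility action; a pre-commitment device adds its
    torture penalty to surrendering. Illustrative utilities only."""
    if device_installed:
        u_surrender += u_device_torture
    return "surrender" if u_surrender > u_resist else "resist"

# Without the device, giving up the plan (-10) beats torture (-100).
print(best_action(device_installed=False))  # surrender
# With the device, surrendering also triggers torture, so resisting wins.
print(best_action(device_installed=True))   # resist
```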

Perhaps the issue, then, is that you are protecting yourself by intentionally messing up your beliefs about what is true? There’s no reason why this shouldn’t be effective. If you can self-modify to disbelieve any blackmail threat, no-one can blackmail you. One way to do this would be to self-modify to believe you are in a simulation that will end just before you are tortured. Alternatively, you could protect yourself by self-modifying to believe that you would be tortured worse if you accepted the blackmail. Creating the clones is just another way of achieving this, though a less effective one, as you only believe there is some probability that you will be tortured if you surrender.
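On the same illustrative utilities (losing the plan costs −10, torture −100), the clone defence works by shifting credence: with n vat-brains each tortured on surrender, indifference gives you credence n/(n+1) of being one of them, so the expected cost of surrendering climbs toward the full torture cost without ever reaching it:

```python
def expected_utility_of_surrender(n_brains, u_give_up_plan=-10, u_torture=-100):
    """Expected utility of surrendering when n vat-brains, each tortured if
    it surrenders, share your subjective state. Illustrative utilities."""
    p_vat = n_brains / (n_brains + 1)  # indifference over n + 1 candidates
    return p_vat * u_torture + (1 - p_vat) * u_give_up_plan

# No brains installed: surrendering just costs the plan.
print(expected_utility_of_surrender(0))    # -10.0
# Hundreds of brains: surrendering is almost certainly torture.
print(expected_utility_of_surrender(300))  # about -99.7
```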