Resolving the Dr Evil Problem

Deep in Dr Evil’s impregnable fortress (paraphrased):

Dr Evil is just about to complete his evil plan of destroying the Earth when he receives a message from the Philosophy Defence Force on Mars. They have created a clone in the exact same subjective situation Dr Evil now occupies: he believes he is Dr Evil and is currently in a recreation of the fortress. If the clone of Dr Evil tries to destroy the Earth, they will torture him; otherwise they will treat him well. Dr Evil wants to destroy the Earth, but he would prefer to avoid being tortured much, much more, and he is now uncertain about whether he should surrender or not. Should Dr Evil surrender?

The paper then concludes:

I conclude that Dr. Evil ought to surrender. I am not entirely comfortable with that conclusion. For if INDIFFERENCE is right, then Dr. Evil could have protected himself against the PDF’s plan by (in advance) installing hundreds of brains in vats in his battlestation—each brain in a subjective state matching his own, and each subject to torture if it should ever surrender.

This article will address two areas of this problem:

  • Firstly, it will argue that Dr Evil should surrender, though via a different path, focusing in particular on what it means to “know” and how this is a leaky abstraction

  • Secondly, it will argue that hundreds of brains in vats would indeed secure him against this kind of blackmail

I’ll note that this problem is closely related to The AI that Boxes You. Several people noted there that you could avoid blackmail by pre-committing to reboot the AI if it tried to threaten you, although my interest is in the rational behaviour of an agent who has failed to pre-commit.

Why Dr Evil Should Surrender

I think there’s a framing effect in the question that is quite misleading. Regardless of whether we say, “You are Dr Evil. What do you do?” or “What should Dr Evil do?”, we are assuming that you or the third-party Dr Evil knows that they are Dr Evil. However, that’s an assumption that needs to be questioned.

If you knew that you were Dr Evil, then knowing that a clone has been created and placed in a situation that appears similar wouldn’t change what you should do, assuming you don’t care about the clone. However, strictly speaking, you can’t ever actually know that you are Dr Evil. All you can know is that you have memories of being Dr Evil and that you appear to be in Dr Evil’s fortress (technically we could doubt this too; maybe Dr Evil doesn’t actually exist, but we don’t have to go that far to prove our point).

Before this event, you would have placed the probability of you being Dr Evil really high, as you had no reason to believe that you might be a clone or in a simulation. After you receive the message, you have to rate the chance of being a clone much higher, and this breaks the leaky abstraction that normally allows you to say that you know you are Dr Evil.
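To make the update concrete, here is a minimal sketch in Python. It assumes the paper’s INDIFFERENCE principle, under which credence is split evenly among all subjectively indistinguishable agents; the function name and copy counts are my own illustration, not from the original problem.

```python
def credence_you_are_the_original(num_copies: int) -> float:
    """Credence that you are the real Dr Evil, assuming INDIFFERENCE
    splits credence evenly over all subjectively identical agents
    (the original plus num_copies copies)."""
    return 1.0 / (1 + num_copies)

# Before the PDF's message you have no reason to posit any copies:
print(credence_you_are_the_original(0))  # 1.0
# After the message you must take the claimed clone seriously:
print(credence_you_are_the_original(1))  # 0.5
```

Receiving the message doesn’t erase anything; it just changes which hypotheses are live, which is all the “un-knowing” amounts to.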

If we did say that you knew you were Dr Evil, then on receiving the message you would somehow have to magically come to un-know something without someone erasing your memories or otherwise interfering with your brain. However, since you only ever knew that you have memories of being Dr Evil, you haven’t lost information. You’ve actually gained it, so nothing magical is happening at all.

Why Clones Protect Against the Attack

The idea of creating clones to protect yourself against these kinds of attacks seems weird. However, I will argue that this strategy is actually effective.

I’ll first note that the idea of setting up a punishment to prevent yourself giving in to blackmail isn’t unusual at all. It’s well known that if you can credibly pre-commit, there’s no incentive to blackmail you. If you have a device that will torture you if you surrender, then you have no incentive to surrender unless the expected harm from not surrendering exceeds the torture.

Perhaps, then, the issue is that you would be protecting yourself by intentionally messing up your beliefs about what is true? There’s no reason why this shouldn’t be effective. If you can self-modify to disbelieve any blackmail threat, no one can blackmail you. One way to do this would be to self-modify to believe you are in a simulation that will end just before you are tortured. Alternatively, you could protect yourself by self-modifying to believe that you would be tortured even worse if you accepted the blackmail. Creating the clones is just another way of achieving this, though it is less effective, as you only believe there is some probability that you will be tortured if you surrender.
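As a rough sketch of why the vat-brains work (my own numbers and framing, not the paper’s), suppose INDIFFERENCE splits credence evenly among the real Dr Evil, the PDF’s clone, and N vat-brains, where the vat-brains are tortured if they surrender and only the clone is tortured if it tries to destroy the Earth:

```python
def p_torture_if_surrender(n_vats: int) -> float:
    """Chance you are tortured for surrendering: you would have to be
    one of the n_vats vat-brains out of n_vats + 2 total copies
    (real Dr Evil + PDF clone + vat-brains)."""
    return n_vats / (n_vats + 2)

def p_torture_if_destroy(n_vats: int) -> float:
    """Chance you are tortured for trying to destroy the Earth:
    you would have to be the PDF's clone."""
    return 1 / (n_vats + 2)

for n in (0, 100):
    print(n, p_torture_if_surrender(n), p_torture_if_destroy(n))
```

With no vat-brains, surrendering is clearly safer (0 vs 0.5), which is the original problem. With a hundred of them, the odds reverse (roughly 0.98 vs 0.01), so the blackmail loses its force, though only probabilistically rather than with the certainty a torture device would provide.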