On self-deception

(Meta-note: First post on this site)

I have read the sequence on self-deception/doublethink and I have some comments for which I’d like to solicit feedback. This post is going to focus on the idea that it’s impossible to deceive oneself, or to make oneself believe something which one knows a priori to be wrong. I think Eliezer believes this to be true, e.g. as discussed here. I’d like to propose a contrary position.

Let’s suppose that a super-intelligent AI has been built, and it knows plenty of tricks that no human ever thought of for presenting a false argument that is not easily detected as false. Whether it does that by presenting subtly wrong premises, by incorrect generalization, by word tricks, or by who knows what, is not important. It can, however, present an argument in a Socratic manner, and like Socrates’ interlocutors, you find yourself agreeing with things you don’t expect to agree with. I now come to this AI and request that it make a library of books for me (personally). Each is to be such that if I (specifically) were to read it, I would very likely come to believe a certain proposition. It should take into account that initially I may be opposed to the proposition, and that I am aware that I am being manipulated. Now the AI produces such a library on the topic of religion, covering all major known religions, A to Z. It has a book called “You should be an atheist”, one called “You should be a Christian”, and so on, up to “You should be a Zoroastrian”.

Suppose I now want to deceive myself. I throw fair dice and end up picking the Zoroastrian book. I now commit to reading the entire book, and I do so. In the process I become convinced that I should indeed be a Zoroastrian, despite my initial skepticism. Now my skeptical friend comes to me:

Q: You don’t really believe in Zoroastrianism.

A: No, I do. Praise Ahura Mazda!

Q: You can’t possibly mean it. You know that you didn’t believe it, you read a book that was designed to manipulate you, and now you do? Don’t you have any introspective ability?

A: I do. I didn’t intend to believe it, but it turns out that it is actually true! Just because I picked this book up for the wrong reason doesn’t mean I can’t now be genuinely convinced. There are many examples of people who studied the religion of their enemy in order to discredit it and in the process became convinced of its truth. I think St. Augustine was in a somewhat similar case.

Q: But you know the book is written in such a way as to convince you, whether it’s true or not.

A: I took that into account, and my prior that I would ever believe it was really low. But the evidence presented in the book was so significant and convincing that it overcame my skepticism.

Q: But the book is a rationalization of Zoroastrianism. It’s not an impartial analysis.

A: I once read a book trying to explain and prove Gödel’s theorem. It was written explicitly to convince the reader that the theorem was true. It started with the conclusion and built all its arguments to prove it. But the book was in fact correct in asserting this proposition.

Q: But the AI is a clever arguer. It only presents arguments that are useful to its cause.

A: So is the book on Gödel’s theorem. It never presented any arguments against Gödel, and I know there are some, at least philosophical ones. It’s still true.

Q: You can’t make a new decision based on a book like that, which is a rationalization. Perhaps it can only be used to expand one’s knowledge. Even if it argues in support of a true proposition, a book that is a rationalization is not really evidence for the proposition’s truth.

A: You know that our AI created a library of books arguing for most theological positions. Do you agree that with very high probability one of the books in the library argues for a true proposition? E.g. the one about atheism? If I were to read it now, I’d become an atheist again.

Q: Then do so!

A: No, Ahura Mazda will punish me. I know I would think he’s not there after I read it, but he’ll punish me anyway. Besides, at present I believe that book to be intentionally misleading. Anyway, if one of the books argues for a true proposition, it may also use a completely valid argument without any tricks. I think this is true of this book on Zoroastrianism, and false of all the other books in the AI’s library.

Q: Perhaps I believe the Atheism book argues for a true proposition, but it is possible that all the books written by the AI use specious reasoning, even the one that argues for a true proposition. In this case, you can’t rely on any of them being valid.

A: Why should the AI do that? A valid argument is the best way to demonstrate the truth of something that is in fact true. If tricks are used, they may be uncovered, which would throw doubt onto the proposition being argued.

Q: If you had picked the book “You should believe in Zeus”, you’d believe in Zeus now!

A: Yes, but I would be wrong. You see, I accidentally picked the right one. Actually, it’s not entirely accidental. If Ahura Mazda exists, he would with some positive probability interfere with the dice and cause me to pick the book on the true religion, because he would like me to be his worshiper. (Same with other gods, of course.) So, since P(I picked the book on Zoroastrianism | Zoroastrianism is a true religion) > P(I picked the book on Zoroastrianism | Zoroastrianism is a false religion), I can conclude by Bayes’ rule that my picking that book up is evidence for Zoroastrianism. Of course, if the prior P(Zoroastrianism is a true religion) is low, it’s not a lot of evidence, but it’s some.
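The dice argument in that answer can be made concrete with a toy model. This is only a sketch under assumptions that are not in the original: the library has `n_books` equally likely books (26 is a hypothetical count), a true god rigs the roll in favor of his own book with a hypothetical probability `q`, and if Zoroastrianism is false the die picks its book with the fair probability 1/n. Ignoring other gods’ interference only makes that last probability an upper bound, so it does not weaken the inequality. The sketch shows the pick multiplying the prior odds by a likelihood ratio, while a low prior stays low.

```python
def likelihood_ratio(q: float, n_books: int) -> float:
    """Bayes factor P(pick Z | Z true) / P(pick Z | Z false) in the toy model.

    q       -- hypothetical probability that a true god rigs the die
    n_books -- hypothetical number of books, each equally likely otherwise
    """
    fair = 1.0 / n_books
    return (q + (1.0 - q) * fair) / fair


def posterior_after_pick(prior: float, q: float, n_books: int) -> float:
    """P(Zoroastrianism is true | I picked the Zoroastrian book), by Bayes' rule."""
    fair = 1.0 / n_books
    # If the religion is true, the god either rigs the roll or lets it fall fairly.
    p_pick_if_true = q + (1.0 - q) * fair
    # If it is false, assume (hypothetically) a fair die for this book.
    p_pick_if_false = fair
    numerator = prior * p_pick_if_true
    return numerator / (numerator + (1.0 - prior) * p_pick_if_false)


if __name__ == "__main__":
    prior = 0.001
    print(f"likelihood ratio: {likelihood_ratio(0.1, 26):.2f}")  # 3.50
    post = posterior_after_pick(prior, q=0.1, n_books=26)
    print(f"prior {prior} -> posterior {post:.5f}")              # 0.00349
```

With `q = 0` the ratio is exactly 1 and the pick carries no evidence at all; any positive `q` makes it “some” evidence, which is precisely the size of claim the answer makes.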

Q: So you are really saying you won the lottery.

A: Yes. A priori, the probability is low, of course. But I actually have won the lottery: some people do, you know. Now that I have won it, the probability is close to 1. (It’s not 1, because I recognize that I could be wrong, as a good Bayesian should. But the evidence is so overwhelming that my model says it’s really close to 1.)

Q: Why don’t you ask your super-intelligent AI directly whether the book’s reasoning is sound?

A: According to the book, I am not supposed to do it, because Ahura Mazda wouldn’t like it.

Q: Of course, the book is written by the superintelligent AI in such a way that there’s no trick I can think of that it didn’t cover. Your ignorance is now invincible.

A: I still remain a reasonable person and I don’t like being denied access to information. However, I am now convinced that while having more information is useful, it is not my highest priority anymore. I know it is possible for me to disbelieve again if given certain (obviously false!) information, but my estimate of the chance that any further true information could change my opinion is very low. In fact, I am far more likely to be deceived by false information about Ahura Mazda, because I am not superintelligent. This is why Ahura Mazda (who is superintelligent, by the way) advises that one should not tempt oneself into sin by reading any criticism of Zoroastrianism.

Q: Just read that atheist book and become normal again!

A: You are possessed by demons! Repent and become a follower of Ahura Mazda!

So, are you now convinced that you should be a Zoroastrian? Excuse me, I meant: are you now convinced that one can really intentionally deceive oneself and be in a mentally consistent (although weird) state?

To answer one potential objection: an ordinary person may well not need a super-intelligent AI to fool him, just a very convincing book or a very clever interlocutor. As to why someone would want to submit to this, I’ll discuss that in a separate post.


Here are some points from various comments that I think are useful to add.

  • Require the AI to use tricks as little as possible. In particular, every true proposition it argues for should be argued for in an entirely logically valid fashion. This short-cuts the part of the Q&A where I discuss whether the book I picked can possibly be entirely valid.

  • From the above point, and assuming that really only one religion in our set can be “true”, P(Book on X is valid) = P(X is true). If more than one religion can be true at the same time, it’s possible that P(Book on X is valid) > P(X is true).

  • P(X is true) for any religion X is some value strictly between 0 and 1. This is obvious, but I state it to ensure we are not ruling out anything a priori. This is my ordinary probability of belief in X.

  • P(Book on X will convince me to believe X with probability > 1-epsilon) is some value close to 1-f(epsilon), where f(epsilon) is small but presumably may increase as epsilon → 0. In other words, it’s harder for the AI to convince me beyond a shadow of a doubt. Epsilon is picked by the AI to keep f(epsilon) reasonably small. I don’t know what value was picked; it clearly depends on X: the more ridiculous X is, the larger epsilon is. If the AI is infinitely more intelligent than I am, perhaps it can keep f(epsilon) small even as epsilon → 0.

  • However, whatever P(Book on X will convince me to believe X with probability > 1-epsilon) is for X = Zoroastrianism, it was high enough that the book succeeded in my case.

  • I do not think it is valid to make a meta-statement about what the value of the posterior P(X is true | I have read the book on X) can be without actually reading the book. In particular, the book has at least this probability of being valid: P(Book on X is valid) >= P(X is true) > 0, so you cannot claim that the posterior is the same as the prior on the grounds that you expected the book to convince you of X and it did. Additionally, any meta-argument clearly depends on f(epsilon), which I don’t know.

  • The book can convince me to adjust my worldview in such a way that it rules out the invisible elephant problem, at least where modern science is concerned. I will remember what the science says, of course, but where it conflicts with my religion I will really believe what the religion says, even if it says it’s turtles all the way down, and I will really be afraid of falling off the edge of the Earth if that’s what my religion teaches.
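The meta-argument that the bullet on the posterior P(X is true | I have read the book on X) pushes back against can be made concrete with Bayes’ rule. In a minimal sketch (an illustration, not the author’s model), suppose the book convinces me with probability `p_convince_if_true` when X is true and `p_convince_if_false` when X is false; in the bullets’ terms each is roughly 1-f(epsilon), possibly with a different f in the two cases. All parameter names are hypothetical. The skeptic’s “posterior equals prior” is exactly the case where the two numbers are equal, and any gap between them moves the posterior one way or the other.

```python
def posterior_if_convinced(prior: float,
                           p_convince_if_true: float,
                           p_convince_if_false: float) -> float:
    """P(X is true | the book on X convinced me), by Bayes' rule.

    prior               -- P(X is true) before reading
    p_convince_if_true  -- hypothetical P(convinced | X true), e.g. a valid book
    p_convince_if_false -- hypothetical P(convinced | X false), e.g. a tricked book
    """
    numerator = prior * p_convince_if_true
    evidence = numerator + (1.0 - prior) * p_convince_if_false
    return numerator / evidence


if __name__ == "__main__":
    prior = 0.01
    # Equally persuasive either way: being convinced carries no evidence.
    print(posterior_if_convinced(prior, 0.95, 0.95))
    # A valid book slightly more convincing than a tricked one: posterior rises.
    print(posterior_if_convinced(prior, 0.99, 0.90))
    # A tricked book more convincing than a valid one: posterior falls.
    print(posterior_if_convinced(prior, 0.90, 0.99))
```

Which of these cases holds is exactly what one cannot tell from outside the book, which is the sense in which any meta-argument depends on the unknown f(epsilon).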

Any thoughts on whether I should post this on the main site?