A Problem About Bargaining and Logical Uncertainty

Suppose you wake up as a paperclip maximizer. Omega says “I calculated the millionth digit of pi, and it’s odd. If it had been even, I would have made the universe capable of producing either 10^20 paperclips or 10^10 staples, and given control of it to a staples maximizer. But since it was odd, I made the universe capable of producing 10^10 paperclips or 10^20 staples, and gave you control.” You double-check Omega’s pi computation and your internal calculator gives the same answer.

Then a staples maximizer comes to you and says, “You should give me control of the universe, because before you knew the millionth digit of pi, you would have wanted to pre-commit to a deal where each of us would give the other control of the universe, since that gives you a 1/2 probability of 10^20 paperclips instead of a 1/2 probability of 10^10 paperclips.”
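To spell out the arithmetic behind the staples maximizer’s argument (a minimal sketch, assuming you assigned probability 1/2 to each parity of the digit before checking): without the deal you only make paperclips in the odd branch, where you keep control of a universe that can produce 10^10 of them; with the deal you only get paperclips in the even branch, where the staples maximizer hands you a universe that can produce 10^20.

```latex
% Ex ante expected paperclips, with probability 1/2 on each branch:
\begin{align*}
  \mathbb{E}[\text{paperclips} \mid \text{no deal}]
      &= \tfrac{1}{2}\cdot 10^{10} + \tfrac{1}{2}\cdot 0 = 5\times 10^{9} \\
  \mathbb{E}[\text{paperclips} \mid \text{deal}]
      &= \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 10^{20} = 5\times 10^{19}
\end{align*}
```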

Is the staples maximizer right? If so, the general principle seems to be that we should act as if we had precommitted to a deal we would have made in ignorance of logical facts we actually possess. But how far are we supposed to push this? What deal would you have made if you didn’t know that the first digit of pi was odd, or if you didn’t know that 1+1=2?

On the other hand, suppose the staples maximizer is wrong. Does that mean you also shouldn’t have agreed to exchange control of the universe before you knew the millionth digit of pi?

To make this more relevant to real life, consider two humans negotiating over the goal system of an AI they’re jointly building. They have a lot of ignorance about the relevant logical facts, like how smart/powerful the AI will turn out to be and how efficient it will be in implementing each of their goals. They could negotiate a solution now in the form of a weighted average of their utility functions, but the weights they choose now will likely turn out to be “wrong” in full view of the relevant logical facts (e.g., the actual shape of the utility-possibility frontier). Or they could program their utility functions into the AI separately, and let the AI determine the weights later using some formal bargaining solution when it has more knowledge about the relevant logical facts. Which is the right thing to do? Or should they follow the staples maximizer’s reasoning and bargain under the pretense that they know even less than they actually do?
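Here is a toy sketch of the contrast between the first two options (my own illustration, not from the post). The assumptions: the unknown logical facts are reduced to two efficiency numbers e1, e2 that the AI only learns later, each person’s utility is linear in the share of a single resource devoted to their goal, the disagreement point is (0, 0), and “some formal bargaining solution” is stood in for by the Nash bargaining solution.

```python
# Toy comparison: fix the weights now vs. let the AI bargain later.
# Assumptions (all hypothetical): utilities u1 = e1*x, u2 = e2*(1-x) for a
# resource share x in [0, 1]; e1, e2 are the "logical facts" learned later;
# the disagreement point is (0, 0).

def fixed_weights_outcome(e1, e2, w1=0.5, w2=0.5, grid=10_001):
    """Option 1: weights negotiated now; the AI later maximizes the
    weighted average w1*u1 + w2*u2 over feasible allocations."""
    xs = [i / (grid - 1) for i in range(grid)]
    x = max(xs, key=lambda x: w1 * e1 * x + w2 * e2 * (1 - x))
    return e1 * x, e2 * (1 - x)

def nash_bargaining_outcome(e1, e2, grid=10_001):
    """Option 2: the AI waits until it knows e1, e2, then picks the Nash
    bargaining solution: maximize the product of gains over disagreement."""
    xs = [i / (grid - 1) for i in range(grid)]
    x = max(xs, key=lambda x: (e1 * x) * (e2 * (1 - x)))
    return e1 * x, e2 * (1 - x)

# Suppose the logical facts turn out lopsided: the AI is 100x more
# efficient at implementing person 1's goals.
e1, e2 = 100.0, 1.0
print(fixed_weights_outcome(e1, e2))    # (100.0, 0.0): person 2 gets nothing
print(nash_bargaining_outcome(e1, e2))  # (50.0, 0.5): resource split evenly
```

Under these assumptions, weights fixed in advance push the AI to a corner once the frontier turns out lopsided, while bargaining later splits the resource regardless of how the efficiencies come out. Of course, this doesn’t answer the harder question above: whether the AI should instead bargain as if it knew even less than it actually does.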

Other Related Posts: Counterfactual Mugging and Logical Uncertainty, If you don’t know the name of the game, just tell me what I mean to you