Naturalism versus unbounded (or unmaximisable) utility options

There are many paradoxes with unbounded utility functions. For instance, consider whether it’s rational to spend eternity in Hell:

Suppose that you die, and God offers you a deal. You can spend 1 day in Hell, and he will give you 2 days in Heaven, and then you will spend the rest of eternity in Purgatory (which is positioned exactly midway in utility between Heaven and Hell). You decide that it’s a good deal, and accept. At the end of your first day in Hell, God offers you the same deal: 1 extra day in Hell, and you will get 2 more days in Heaven. Again you accept. The same deal is offered at the end of the second day.

And the result is… that you spend eternity in Hell. There is never a rational moment to leave for Heaven—that decision is always dominated by the decision to stay in Hell.
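To see the trap numerically, here is a minimal sketch, assuming (purely for illustration) a utility of -1 per day in Hell, +1 per day in Heaven, and 0 per day in Purgatory, so that Purgatory sits exactly midway. Stopping after any finite number of days nets you finite utility, and accepting one more day always beats stopping now, yet the limit of always accepting is eternity in Hell:

```python
# Assumed utilities for this sketch: Hell = -1/day, Heaven = +1/day,
# Purgatory = 0/day, so Purgatory is exactly midway between Heaven and Hell.

def utility_if_stop_after(n_days_in_hell: int) -> int:
    """Total utility if you accept the deal for n days and then stop:
    n days in Hell (-1 each), 2n days in Heaven (+1 each), then Purgatory (0 forever)."""
    return -1 * n_days_in_hell + 2 * n_days_in_hell  # = n_days_in_hell

# Accepting one more day always dominates stopping now...
for n in range(5):
    assert utility_if_stop_after(n + 1) > utility_if_stop_after(n)

# ...but the limit of "always accept" is -1 utility per day, forever: -infinity,
# even though every finite stopping point was dominated by waiting one more day.
print([utility_if_stop_after(n) for n in range(5)])  # [0, 1, 2, 3, 4]
```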

Or consider a simpler paradox:

You’re immortal. Tell Omega any natural number, and he will give you that much utility. On top of that, he will give you any utility you may have lost in the decision process (such as the time wasted choosing and specifying your number). Then he departs. What number will you choose?

Again, there’s no good answer to this problem—any number you name, you could have got more by naming a higher one. And since Omega compensates you for extra effort, there’s never any reason not to name a higher number.

It seems that these are problems caused by unbounded utility. But that’s not the case, in fact! Consider:

You’re immortal. Tell Omega any real number r > 0, and he’ll give you 1-r utility. On top of that, he will give you any utility you may have lost in the decision process (such as the time wasted choosing and specifying your number). Then he departs. What number will you choose?

Again, there is no best answer—for any r, r/2 would have been better. So these problems arise not because of unbounded utility, but because of unbounded options. You have infinitely many options to choose from (sequentially in the Heaven and Hell problem, all at once in the other two) and the set of possible utilities from your choices does not possess a maximum—so there is no best choice.
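Put slightly more formally (a small sketch, with u denoting the utility received in each single-shot game), the achievable utilities have a supremum but no maximum:

```latex
% Natural-number game: unbounded utilities, no maximum.
\forall N \in \mathbb{N}:\quad u(N) = N \;<\; N+1 = u(N+1).

% Bounded game: utilities bounded above by 1, supremum never attained.
\forall r > 0:\quad u(r) = 1 - r \;<\; 1 - \tfrac{r}{2} = u\!\left(\tfrac{r}{2}\right),
\qquad \sup_{r>0} u(r) = 1 \text{ (not attained)}.
```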

What should you do? In the Heaven and Hell problem, you end up worse off if you make the locally dominant decision at each decision node—if you always choose to add an extra day in Hell, you’ll never get out of it. At some point (maybe at the very beginning), you’re going to have to give up an advantageous deal. In fact, since giving up once means you’ll never be offered the deal again, you’re going to have to give up arbitrarily much utility. Is there a way out of this conundrum?

Assume first that you’re a deterministic agent, and imagine that you’re sitting down for an hour to think about this (don’t worry, Satan can wait, he’s just warming up the pokers). Since you’re deterministic, and you know it, your whole future will be entirely determined by what you decide right now (in fact your life history is already determined, you just don’t know it yet—still, by the Markov property, your current decision also determines the future). Now, you don’t have to reach any grand decision now—you’re just deciding what you’ll do for the next hour or so. Some possible options are:

  • Ignore everything, sing songs to yourself.

  • Think about this some more, thinking of yourself as an algorithm.

  • Think about this some more, thinking of yourself as a collection of arguing agents.

  • Pick a number N, and accept all of God’s deals until day N.

  • Promise yourself you’ll reject all of God’s deals.

  • Accept God’s deal for today, hope something turns up.

  • Defer any decision until another hour has passed.

  • ...

There are many other options—in fact, there are precisely as many options as you’ve considered during that hour. And, crucially, you can put an estimated expected utility on each one. For instance, you might know yourself, and suspect that you’ll always do the same thing (you have no self-discipline where cake and Heaven are concerned), so any decision apart from immediately rejecting all of God’s deals will give you -∞ utility. Or maybe you know yourself, and have great self-discipline and perfect precommitments: if you pick a number N in the coming hour, you’ll stick to it. Thinking some more may have a certain expected utility—which may differ depending on where you direct your thoughts. And if you know that you can’t direct your thoughts—well, then they’ll all have the same expected utility.

But notice what’s happening here: you’ve reduced the expected utility calculation over infinitely many options to one over finitely many options—namely, all the interim decisions that you can consider in the course of an hour. Since you are deterministic, the infinitely many options don’t have an impact: whatever interim decision you follow will uniquely determine how much utility you actually get out of this. And given finitely many options, each with an expected utility, choosing one doesn’t give rise to any paradoxes.
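Here is a minimal sketch of that reduction. The option names and expected-utility numbers below are purely illustrative stand-ins for your self-knowledge, not part of the problem:

```python
# Hypothetical estimates of the expected utility of each interim decision,
# standing in for what you believe about your own willpower and follow-through.
NEG_INF = float("-inf")

interim_options = {
    "sing songs to yourself": NEG_INF,           # you'd drift and accept forever
    "think of yourself as an algorithm": 350.0,  # might find a better policy later
    "pick N = 1000 and stick to it": 1000.0,     # if your precommitments hold
    "promise to reject all deals": 0.0,          # safe, but gives up all the gains
    "accept today's deal and hope": NEG_INF,     # you suspect you'd never stop
    "defer for another hour": 349.9,             # thinking has a small cost
}

# The choice is now an ordinary argmax over finitely many options.
best_option = max(interim_options, key=interim_options.get)
print(best_option)  # -> "pick N = 1000 and stick to it"
```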

And note that you don’t need determinism—adding stochastic components to yourself doesn’t change anything, as you’re already using expected utility anyway. So all you need is an assumption of naturalism—that you’re subject to the laws of nature, that your decision will be the result of deterministic or stochastic processes. In other words, you don’t have ‘spooky’ free will that contradicts the laws of physics.

Of course, you might be wrong about your estimates—maybe you have more/less willpower than you initially thought. That doesn’t invalidate the model—at every hour, at every interim decision, you need to choose the option that will, in your estimation, ultimately result in the most utility (not just for the next few moments or days).

If we want to be more formal, we can say that you’re deciding on a decision policy—choosing, among the different agents that you could be, the one most likely to reach high expected utility. Here are some policies you could choose from (the challenge is to find a policy that gets you the most days in Hell/Heaven, without getting stuck and going on forever):

  • Decide to count the days, and reject God’s deal as soon as you lose count.

  • Fix a probability distribution over future days, and reject God’s deal with a certain probability each day.

  • Model yourself as a finite state machine. Figure out the Busy Beaver number of that finite state machine. Reject the deal when the number of days climbs close to that.

  • Realise that you probably can’t compute the Busy Beaver number for yourself, and instead use some very fast-growing function like the Ackermann function.

  • Use the Ackermann function to count down the days during which you formulate a policy; after that, implement it (see the sketch after this list).

  • Estimate that there is a non-zero probability of falling into a loop (which would give you -∞ utility), so reject God’s deal as soon as possible.

  • Estimate that there is a non-zero probability of accidentally telling God the wrong thing, so commit to accepting all of God’s deals (and count on accidents to rescue you from -∞ utility).
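Here is a minimal sketch of one of the Ackermann-style policies above, with assumed details (the specific arguments ackermann(3, 3) and the accept/reject loop are illustrative choices, not part of the original setup): precompute a huge but finite bound, accept the deal while your day count stays below it, and reject as soon as it is reached.

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(100_000)

@lru_cache(maxsize=None)
def ackermann(m: int, n: int) -> int:
    """Two-argument Ackermann function: grows faster than any primitive recursive function."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# Policy sketch: commit to a huge-but-finite number of days up front.
# ackermann(3, 3) = 61 keeps this example runnable; a real agent would pick
# the largest arguments it could actually compute and store.
BOUND = ackermann(3, 3)

def accept_todays_deal(days_endured: int) -> bool:
    """Accept another day in Hell only while the precommitted bound hasn't been hit."""
    return days_endured < BOUND

days = 0
while accept_todays_deal(days):
    days += 1  # one more day in Hell, two more banked in Heaven
print(f"Rejected the deal after {days} days (bound = {BOUND}).")
```

The point of the sketch is simply that the policy terminates by construction: the bound is astronomically large for bigger arguments, but it is a fixed finite number, so the agent cannot fall into the infinite loop that gives -∞ utility.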

But why spend a whole hour thinking about it? Surely the same applies for half an hour, a minute, a second, a microsecond? That’s entirely a convenience choice—if you think about things in one-second increments, then the interim decision “think some more” is nearly always going to be the dominant one.

The mention of the Busy Beaver number hints at a truth—given the limitations of your mind and decision abilities, there is one policy, among all the possible policies that you could implement, that gives you the most utility. Policies more complicated than that, you can’t implement (which generally means you’d hit a loop and get -∞ utility); simpler policies would give you less utility. Of course, you likely won’t find that policy, or anything close to it. It all really depends on how good your policy-finding policy is (and your policy-finding-policy-finding policy...).

That’s maybe the most important aspect of these problems: some agents are just better than others. Unlike finite cases, where any agent can simply list all the options, take their time, and choose the best one, here an agent with a better decision algorithm will outperform another. Even if they start with the same resources (memory capacity, cognitive shortcuts, etc.), one may be a lot better than another. If the agents don’t acquire more resources during their time in Hell, then their maximal possible utility is related to their Busy Beaver number—basically, the maximum number of days a finite-state agent can endure without falling into an infinite loop. Busy Beaver numbers are uncomputable (they grow faster than any computable function), so some agents, by pure chance, may be capable of acquiring much greater utility than others. And agents that start with more resources have a much larger theoretical maximum—not fair, but deal with it. Hence it’s not really an infinite-option scenario so much as an infinite-agent scenario, with each agent having a different maximal expected utility that they can extract from the setup.

It should be noted that God, or any being capable of hypercomputation, has real problems in these situations: such a being genuinely faces infinitely many options (not a finite set of options for choosing its future policy), and so has no solution available.

This is also related to AIXI, the theoretical maximally optimal agent: for any computable agent that approximates AIXI, there will be other agents that approximate it better (and hence get higher expected utility). Again, it’s not fair, but not unexpected either: smarter agents are smarter.

What to do?

This analysis doesn’t solve the vexing question of what to do—what is the right answer to these kinds of problems? That depends on what type of agent you are, but roughly: estimate the largest integer you are capable of computing (and storing), and endure for that many days. Certain probabilistic strategies may improve your performance further, but you have to put the effort into finding them.