Towards a New Decision Theory

It com­monly ac­knowl­edged here that cur­rent de­ci­sion the­o­ries have defi­cien­cies that show up in the form of var­i­ous para­doxes. Since there seems to be lit­tle hope that Eliezer will pub­lish his Time­less De­ci­sion The­ory any time soon, I de­cided to try to syn­the­size some of the ideas dis­cussed in this fo­rum, along with a few of my own, into a co­her­ent al­ter­na­tive that is hope­fully not so para­dox-prone.

I’ll start with a way of fram­ing the ques­tion. Put your­self in the place of an AI, or more speci­fi­cally, the de­ci­sion al­gorithm of an AI. You have ac­cess to your own source code S, plus a bit string X rep­re­sent­ing all of your mem­o­ries and sen­sory data. You have to choose an out­put string Y. That’s the de­ci­sion. The ques­tion is, how? (The an­swer isn’t “Run S,” be­cause what we want to know is what S should be in the first place.)

Let’s pro­ceed by ask­ing the ques­tion, “What are the con­se­quences of S, on in­put X, re­turn­ing Y as the out­put, in­stead of Z?” To be­gin with, we’ll con­sider just the con­se­quences of that choice in the realm of ab­stract com­pu­ta­tions (i.e. com­pu­ta­tions con­sid­ered as math­e­mat­i­cal ob­jects rather than as im­ple­mented in phys­i­cal sys­tems). The most im­me­di­ate con­se­quence is that any pro­gram that calls S as a sub­rou­tine with X as in­put, will re­ceive Y as out­put, in­stead of Z. What hap­pens next is a bit harder to tell, but sup­pos­ing that you know some­thing about a pro­gram P that call S as a sub­rou­tine, you can fur­ther de­duce the effects of choos­ing Y ver­sus Z by trac­ing the differ­ence be­tween the two choices in P’s sub­se­quent ex­e­cu­tion. We could call these the com­pu­ta­tional con­se­quences of Y. Sup­pose you have prefer­ences about the ex­e­cu­tion of a set of pro­grams, some of which call S as a sub­rou­tine, then you can satisfy your prefer­ences di­rectly by choos­ing the out­put of S so that those pro­grams will run the way you most pre­fer.

A more gen­eral class of con­se­quences might be called log­i­cal con­se­quences. Con­sider a pro­gram P’ that doesn’t call S, but a differ­ent sub­rou­tine S’ that’s log­i­cally equiv­a­lent to S. In other words, S’ always pro­duces the same out­put as S when given the same in­put. Due to the log­i­cal re­la­tion­ship be­tween S and S’, your choice of out­put for S must also af­fect the sub­se­quent ex­e­cu­tion of P’. Another ex­am­ple of a log­i­cal re­la­tion­ship is an S’ which always re­turns the first bit of the out­put of S when given the same in­put, or one that re­turns the same out­put as S on some sub­set of in­puts.

In gen­eral, you can’t be cer­tain about the con­se­quences of a choice, be­cause you’re not log­i­cally om­ni­scient. How to han­dle log­i­cal/​math­e­mat­i­cal un­cer­tainty is an open prob­lem, so for now we’ll just as­sume that you have ac­cess to a “math­e­mat­i­cal in­tu­ition sub­rou­tine” that some­how al­lows you to form be­liefs about the likely con­se­quences of your choices.

At this point, you might ask, “That’s well and good, but what if my prefer­ences ex­tend be­yond ab­stract com­pu­ta­tions? What about con­se­quences on the phys­i­cal uni­verse?” The an­swer is, we can view the phys­i­cal uni­verse as a pro­gram that runs S as a sub­rou­tine, or more gen­er­ally, view it as a math­e­mat­i­cal ob­ject which has S em­bed­ded within it. (From now on I’ll just re­fer to pro­grams for sim­plic­ity, with the un­der­stand­ing that the sub­se­quent dis­cus­sion can be gen­er­al­ized to non-com­putable uni­verses.) Your prefer­ences about the phys­i­cal uni­verse can be trans­lated into prefer­ences about such a pro­gram P and pro­grammed into the AI. The AI, upon re­ceiv­ing an in­put X, will look into P, de­ter­mine all the in­stances where it calls S with in­put X, and choose the out­put that op­ti­mizes its prefer­ences about the ex­e­cu­tion of P. If the prefer­ences were trans­lated faith­fully, the the AI’s de­ci­sion should also op­ti­mize your prefer­ences re­gard­ing the phys­i­cal uni­verse. This faith­ful trans­la­tion is a sec­ond ma­jor open prob­lem.

What if you have some un­cer­tainty about which pro­gram our uni­verse cor­re­sponds to? In that case, we have to spec­ify prefer­ences for the en­tire set of pro­grams that our uni­verse may cor­re­spond to. If your prefer­ences for what hap­pens in one such pro­gram is in­de­pen­dent of what hap­pens in an­other, then we can rep­re­sent them by a prob­a­bil­ity dis­tri­bu­tion on the set of pro­grams plus a util­ity func­tion on the ex­e­cu­tion of each in­di­vi­d­ual pro­gram. More gen­er­ally, we can always rep­re­sent your prefer­ences as a util­ity func­tion on vec­tors of the form <E1, E2, E3, …> where E1 is an ex­e­cu­tion his­tory of P1, E2 is an ex­e­cu­tion his­tory of P2, and so on.

Th­ese con­sid­er­a­tions lead to the fol­low­ing de­sign for the de­ci­sion al­gorithm S. S is coded with a vec­tor <P1, P2, P3, …> of pro­grams that it cares about, and a util­ity func­tion on vec­tors of the form <E1, E2, E3, …> that defines its prefer­ences on how those pro­grams should run. When it re­ceives an in­put X, it looks in­side the pro­grams P1, P2, P3, …, and uses its “math­e­mat­i­cal in­tu­ition” to form a prob­a­bil­ity dis­tri­bu­tion P_Y over the set of vec­tors <E1, E2, E3, …> for each choice of out­put string Y. Fi­nally, it out­puts a string Y* that max­i­mizes the ex­pected util­ity Sum P_Y(<E1, E2, E3, …>) U(<E1, E2, E3, …>). (This speci­fi­cally as­sumes that ex­pected util­ity max­i­miza­tion is the right way to deal with math­e­mat­i­cal un­cer­tainty. Con­sider it a tem­po­rary place­holder un­til that prob­lem is solved. Also, I’m de­scribing the al­gorithm as a brute force search for sim­plic­ity. In re­al­ity, you’d prob­a­bly want it to do some­thing clev­erer to find the op­ti­mal Y* more quickly.)

Ex­am­ple 1: Coun­ter­fac­tual Mugging

Note that Bayesian up­dat­ing is not done ex­plic­itly in this de­ci­sion the­ory. When the de­ci­sion al­gorithm re­ceives in­put X, it may de­ter­mine that a sub­set of pro­grams it has prefer­ences about never calls it with X and are also log­i­cally in­de­pen­dent of its out­put, and there­fore it can safely ig­nore them when com­put­ing the con­se­quences of a choice. There is no need to set the prob­a­bil­ities of those pro­grams to 0 and renor­mal­ize.

So, with that in mind, we can model Coun­ter­fac­tual Mug­ging by the fol­low­ing Python pro­gram:

def P(coin):
AI_bal­ance = 100
if coin == “heads”:
if S(“heads”) == “give $100”:
AI_bal­ance -= 100
if coin == “tails”:
if Omega_Pre­dict(S, “heads”) == “give $100″:
AI_bal­ance += 10000

The AI’s goal is to max­i­mize ex­pected util­ity = .5 * U(AI_bal­ance af­ter P(“heads”)) + .5 * U(AI_bal­ance af­ter P(“tails”)). As­sum­ing U(AI_bal­ance)=AI_bal­ance, it’s easy to de­ter­mine U(AI_bal­ance af­ter P(“heads”)) as a func­tion of S’s out­put. It equals 0 if S(“heads”) == “give $100”, and 100 oth­er­wise. To com­pute U(AI_bal­ance af­ter P(“tails”)), the AI needs to look in­side the Omega_Pre­dict func­tion (not shown here), and try to figure out how ac­cu­rate it is. As­sum­ing the math­e­mat­i­cal in­tu­ition mod­ule says that choos­ing “give $100” as the out­put for S(“heads”) makes it more likely (by a suffi­ciently large mar­gin) for Omega_Pre­dict(S, “heads”) to out­put “give $100”, then that choice max­i­mizes ex­pected util­ity.

Ex­am­ple 2: Re­turn of Bayes

This ex­am­ple is based on case 1 in Eliezer’s post Pri­ors as Math­e­mat­i­cal Ob­jects. An urn con­tains 5 red balls and 5 white balls. The AI is asked to pre­dict the prob­a­bil­ity of each ball be­ing red as it as drawn from the urn, its goal be­ing to max­i­mize the ex­pected log­a­r­ith­mic score of its pre­dic­tions. The main point of this ex­am­ple is that this de­ci­sion the­ory can re­pro­duce the effect of Bayesian rea­son­ing when the situ­a­tion calls for it. We can model the sce­nario us­ing prefer­ences on the fol­low­ing Python pro­gram:

def P(n):
urn = [‘red’, ‘red’, ‘red’, ‘red’, ‘red’, ‘white’, ‘white’, ‘white’, ‘white’, ‘white’]
his­tory = []
score = 0
while urn:
i = n%len(urn)
n = n/​len(urn)
ball = urn[i]
urn[i:i+1] = []
pre­dic­tion = S(his­tory)
if ball == ‘red’:
score += math.log(pre­dic­tion, 2)
score += math.log(1-pre­dic­tion, 2)
print (score, ball, pre­dic­tion)

Here is a print­out from a sam­ple run, us­ing n=1222222:

-1.0 red 0.5
-2.16992500144 red 0.444444444444
-2.84799690655 white 0.375
-3.65535182861 white 0.428571428571
-4.65535182861 red 0.5
-5.9772799235 red 0.4
-7.9772799235 red 0.25
-7.9772799235 white 0.0
-7.9772799235 white 0.0
-7.9772799235 white 0.0

S should use de­duc­tive rea­son­ing to con­clude that re­turn­ing (num­ber of red balls re­main­ing /​ to­tal balls re­main­ing) max­i­mizes the av­er­age score across the range of pos­si­ble in­puts to P, from n=1 to 10! (rep­re­sent­ing the pos­si­ble or­ders in which the balls are drawn), and do that. Alter­na­tively, S can ap­prox­i­mate the cor­rect pre­dic­tions us­ing brute force: gen­er­ate a ran­dom func­tion from his­to­ries to pre­dic­tions, and com­pute what the av­er­age score would be if it were to im­ple­ment that func­tion. Re­peat this a large num­ber of times and it is likely to find a func­tion that re­turns val­ues close to the op­ti­mum pre­dic­tions.

Ex­am­ple 3: Level IV Multiverse

In Teg­mark’s Level 4 Mul­ti­verse, all struc­tures that ex­ist math­e­mat­i­cally also ex­ist phys­i­cally. In this case, we’d need to pro­gram the AI with prefer­ences over all math­e­mat­i­cal struc­tures, per­haps rep­re­sented by an or­der­ing or util­ity func­tion over con­junc­tions of well-formed sen­tences in a for­mal set the­ory. The AI will then pro­ceed to “op­ti­mize” all of math­e­mat­ics, or at least the parts of math that (A) are log­i­cally de­pen­dent on its de­ci­sions and (B) it can rea­son or form in­tu­itions about.

I sug­gest that the Level 4 Mul­ti­verse should be con­sid­ered the de­fault set­ting for a gen­eral de­ci­sion the­ory, since we can­not rule out the pos­si­bil­ity that all math­e­mat­i­cal struc­tures do in­deed ex­ist phys­i­cally, or that we have di­rect prefer­ences on math­e­mat­i­cal struc­tures (in which case there is no need for them to ex­ist “phys­i­cally”). Clearly, ap­pli­ca­tion of de­ci­sion the­ory to the Level 4 Mul­ti­verse re­quires that the pre­vi­ously men­tioned open prob­lems be solved in their most gen­eral forms: how to han­dle log­i­cal un­cer­tainty in any math­e­mat­i­cal do­main, and how to map fuzzy hu­man prefer­ences to well-defined prefer­ences over the struc­tures of math­e­mat­i­cal ob­jects.

Added: For fur­ther in­for­ma­tion and ad­di­tional posts on this de­ci­sion the­ory idea, which came to be called “Up­date­less De­ci­sion The­ory”, please see its en­try in the LessWrong Wiki.