# AlexMennen comments on A utility-maximizing varient of AIXI

• I’m honestly not sure whether one should chase the path of U = 0 or not. This is clouded by the fact that the probabilistic nature of things will probably push you off that path eventually.

Making the assumption that there is a small probability that you will deviate from your current plan on each future move, and that these probabilities add up to a near guarantee that you eventually will deviate, has a more complicated effect on your planning than merely justifying supremum-chasing.

For instance, consider this modification to the toy example I gave earlier. Y:={a,b,c}, and if the first b comes before the first c, then the resulting utility is 1 − 1/n, where n is the index of the first b (all previous elements being a), as before. But we’ll change it so that the utility of outputting an infinite stream of a is 1. If there is a c in your action sequence and it comes before the first b, then the utility you get is −1000. In this situation, supremum-chasing works just fine if you completely trust your future self: you output a every time, and get a utility of 1, the best you could possibly do. But if you think that there is a small risk that you could end up outputting c at some point, then eventually it will be worthwhile to output b, since the gain you could get from continuing to output a gets arbitrarily small compared to the loss from accidentally outputting c.
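As a sanity check on this intuition, here is a small Python sketch. It assumes a simple error model not spelled out in the example: each of the first n − 1 a-outputs independently carries a small probability eps of accidentally emitting c instead. Under that assumption, the expected utility of committing to output the first b at step n is maximized at a finite n, rather than growing toward the supremum.

```python
# Expected utility of planning "output a for steps 1..n-1, then b at
# step n", when each a-output has an (assumed) independent probability
# eps of accidentally coming out as c (utility -1000).
# Note: planning "a forever" means an accident eventually occurs with
# probability 1, so that plan's expected utility tends to -1000.

def expected_utility(n, eps):
    survive = (1 - eps) ** (n - 1)  # probability no accidental c occurs first
    return survive * (1 - 1 / n) + (1 - survive) * (-1000)

eps = 1e-6  # hypothetical per-step accident probability
best_n = max(range(1, 10_000), key=lambda n: expected_utility(n, eps))
# The optimum is an interior, finite n (roughly 1 / sqrt(1001 * eps)):
# supremum-chasing stops being worthwhile once the marginal gain
# 1/n - 1/(n+1) drops below the marginal accident risk.
```

The choice of eps and the penalty −1000 pin down where the optimum lands, but the qualitative conclusion, that the agent should eventually output b, holds for any positive accident probability.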

There’s something else that is strange to me. If we are considering infinite interaction histories, then we’re looking at the entire binary tree at once. But this tree has uncountably many infinite paths! Almost all of the (infinite) paths are incomputable sequences. This means that any computable AI couldn’t even consider traversing them. And it also seems to have interesting things to say about the utility function. Does it only need to be defined over computable sequences? What if we have utility over incomputable sequences? These could be defined by second-order logic statements, but remain incomputable. This raises a lot of questions.
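The counting behind “almost all paths are incomputable” is standard cardinality bookkeeping, which can be spelled out as:

```latex
% Uncountably many infinite binary paths:
\[
  |\{0,1\}^{\omega}| \;=\; 2^{\aleph_0}
  \qquad \text{(uncountable, by Cantor's diagonal argument)}
\]
% ...but only countably many of them are computable, since each
% computable sequence is generated by some Turing machine:
\[
  |\{\, x \in \{0,1\}^{\omega} : x \text{ computable} \,\}|
  \;\le\; |\{\text{Turing machines}\}| \;=\; \aleph_0
\]
```

So the computable paths form a countable, measure-zero sliver of the full tree.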

I don’t really have answers to these questions. One thing you could do is replace the set of all policies (P) with the set of all computable policies, so that the agent would never output an uncomputable action sequence [Edit: oops, not true. You could consider only computable policies, but then end up at an uncomputable policy anyway by chasing the supremum].

• Making the assumption that...

Yeah, I was intentionally vague with “the probabilistic nature of things”. I am also thinking about how any AI will have logical uncertainty, uncertainty about the precision of its observations, et cetera, so that as it considers further points in the future, its distribution becomes flatter. And having a non-dualist framework would introduce uncertainty about the agent’s self, its utility function, its memory, …