Utility uncertainty vs. expected information gain

It is a relatively intuitive thought that if a Bayesian agent is uncertain about its utility function, it will act more conservatively until it has a better handle on what its true utility function is.

What follows might be deeply flawed in a way that I'm not aware of, but I'm going to point out a way in which I think this intuition is slightly off. For a Bayesian agent, a natural measure of uncertainty is the entropy of its distribution over utility functions (the distribution over which possible utility function it thinks is the true one). But no matter how uncertain a Bayesian agent is about which utility function is the true one, if the agent does not believe that any future observations will cause it to update its belief distribution, then it will just act as if it has a utility function equal to the Bayes' mixture over all the utility functions it considers plausible (weighted by its credence in each one).
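To make that concrete, here is a minimal sketch of what "acting on the Bayes' mixture" means, with notation I'm introducing here: $U_1, U_2, \dots$ are the candidate utility functions and $w_i$ is the agent's credence in $U_i$. The agent simply optimizes

$$U_{\text{mix}}(x) \;=\; \sum_i w_i \, U_i(x), \qquad \sum_i w_i = 1.$$

The entropy of the weights $w_i$ does not appear anywhere in this objective, so an agent that is nearly certain and an agent that is maximally uncertain will behave identically as long as their mixtures agree (and neither expects to update).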

It seems like what our intuition is grasping for is not uncertainty about the utility function, but expected information gain about the utility function. If the agent expects to gain information about the utility function, then (intuitively to me, at least) it will act more conservatively until it has a better handle on what its true utility function is.

Expected information gain (at time t) is naturally formalized as the expectation (with respect to current beliefs) of KL(posterior distribution at time t + m || posterior distribution at time t). Roughly, this is how poorly the agent expects its current beliefs to approximate its future beliefs (m timesteps from now).
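Writing that out, with notation I'm introducing here: let $h_t$ be the agent's interaction history up to time $t$ and $p(\cdot \mid h_t)$ its posterior over utility functions given that history. Then

$$\mathrm{EIG}_t(m) \;=\; \mathbb{E}\!\left[\, \mathrm{KL}\big( p(\cdot \mid h_{t+m}) \,\big\|\, p(\cdot \mid h_t) \big) \,\middle|\, h_t \right],$$

where the expectation is over the observations the agent might receive between $t$ and $t+m$, taken under its current beliefs.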

So if anyone has a safety idea to which utility uncertainty feels central, my guess is that a mental substitution from uncertainty to expected information gain would be helpful.

Unfortunately, on-policy expected information gain goes to 0 pretty fast (Theorem 5 here).