I came across this problem a while ago, made some notes, and never published them. Seems like a good time to write up the highlights.
A world model is a mapping from decisions to probability distributions over outcomes. A preference function is a total ordering over probability distributions over outcomes. Utility functions are the subset of preference functions that are conveniently linear (but we don’t need those linearity properties right now). A decision theory maps from (world model, preference function) pairs to decisions. The goodness of decision theories is a partial order: for decision theories A and B, A>=B if, for all preference functions Pref and world models World, Pref(World(A(World,Pref))) >= Pref(World(B(World,Pref))). This order has a supremum iff, for all world models, Pref(World(d)) has a supremum over decisions d.
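To make the definitions concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the original notes: the names (`argmax_theory`, `first_option_theory`, `at_least_as_good`), the representation of a preference function as a real-valued score on distributions, and the restriction to a finite list of (world model, preference) test cases. The real definition quantifies over all such pairs, so this can only refute A>=B, never establish it.

```python
from typing import Callable, Dict, Iterable, Tuple

Distribution = Dict[str, float]        # outcome -> probability
WorldModel = Dict[str, Distribution]   # decision -> distribution over outcomes
# Assumption: the total order on distributions is represented by a real score.
Preference = Callable[[Distribution], float]
DecisionTheory = Callable[[WorldModel, Preference], str]


def argmax_theory(world: WorldModel, pref: Preference) -> str:
    """Pick the decision whose outcome distribution scores highest."""
    return max(world, key=lambda d: pref(world[d]))


def first_option_theory(world: WorldModel, pref: Preference) -> str:
    """A deliberately poor theory: take whichever decision is listed first."""
    return next(iter(world))


def at_least_as_good(
    a: DecisionTheory,
    b: DecisionTheory,
    cases: Iterable[Tuple[WorldModel, Preference]],
) -> bool:
    """Check Pref(World(A(World,Pref))) >= Pref(World(B(World,Pref))) on each case."""
    return all(
        pref(world[a(world, pref)]) >= pref(world[b(world, pref)])
        for world, pref in cases
    )


def expected_value(dist: Distribution) -> float:
    """A linear (utility-style) preference with hypothetical payoffs."""
    payoff = {"win": 1.0, "lose": 0.0}
    return sum(p * payoff[outcome] for outcome, p in dist.items())


world: WorldModel = {
    "safe": {"win": 0.5, "lose": 0.5},
    "risky": {"win": 0.9, "lose": 0.1},
}
cases = [(world, expected_value)]

print(at_least_as_good(argmax_theory, first_option_theory, cases))
print(at_least_as_good(first_option_theory, argmax_theory, cases))
```

With a finite set of decisions, as here, an argmax always exists. The interesting cases in the rest of this post are the ones where the set of decisions is infinite and no maximum is attained.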
Where these orderings lack suprema, it is nevertheless possible to define limiting sequences of successively better options, and of successively better decision theories which generate them, such that for any single decision or decision theory the sequence must contain an element which is better. Finding such a sequence for a particular world model, and proving it correct, amounts to proving that there is no optimal decision for that world model.
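As a toy illustration of such a sequence, consider a hypothetical world model (my assumption, not taken from the notes) in which the decision is a natural number n and the deterministic outcome is scored 1 - 1/(n+1). Every decision is beaten by its successor, so no decision is optimal, even though the scores are bounded above by 1. A minimal sketch, with illustrative function names:

```python
from fractions import Fraction
from typing import List


def score(n: int) -> Fraction:
    """Preference score of deciding on n in the hypothetical world model."""
    return 1 - Fraction(1, n + 1)


def better_than(d: int) -> int:
    """Witness that d is not optimal: the next element of the sequence."""
    return d + 1


def improving_sequence(steps: int) -> List[Fraction]:
    """Scores of the first `steps` decisions 0, 1, 2, ... in order."""
    return [score(n) for n in range(steps)]


for d in (0, 1, 10):
    print(d, score(d), "<", score(better_than(d)))
```

The proof of non-optimality here is just the observation that `score(d + 1) > score(d)` for every d; in less contrived world models, finding the sequence and proving that property is the hard part.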
A bounded agent could approximate (but not reach) optimality by finding such a sequence, and traversing some distance along it. Such sequences, and the number of steps taken along them, can each be classified by their improvement rates. The partial order formed by those rates (with an ignored multiplicative constant on the sequence traversal distance) makes a second-order decision problem. If you ignore the constant, this one does have a supremum: the busy beaver function. (All further games you could play with BB can be folded into the one constant.)