Rationality for Turing machines

What can we say about rational behaviour among realizable algorithms, i.e. Turing machines? What limits can we put on their performance? What exactly does Cox’s theorem have to say?

Cox’s theorem tells us that optimally rational beliefs imply Bayesian reasoning. But such results are stated independently of any specific model of computation. Can we rigorously apply machinery like Cox’s theorem if we state the problem in terms of optimal behaviour among general recursive algorithms, i.e. realizable algorithms?
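For reference, my paraphrase of the theorem's conclusion: any assignment of plausibilities satisfying Cox's desiderata can be rescaled into a function $p$ obeying the ordinary sum and product rules, from which Bayes' rule follows:

$$
p(A \mid C) + p(\neg A \mid C) = 1,
\qquad
p(A \wedge B \mid C) = p(A \mid B \wedge C)\, p(B \mid C).
$$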

To make that a little more concrete: take the problem of writing an algorithm (designing a Turing machine) that processes sensor inputs and writes actions as output. As a concrete Turing machine, the algorithm consists of discrete steps at which it either writes an output symbol (corresponding to some action) or does not (which we can interpret as a null action). There is a second Turing machine, corresponding to the external world, that responds to each action by changing its state, which feeds back into the sensor inputs of our algorithm. We seek the algorithm that maximizes the expectation of some pre-specified utility function U, defined over states of the world. To define this rigorously we need to nail down how utility is summed over time; perhaps something along the lines of Legg and Hutter’s formalism would be appropriate, or perhaps we can simply terminate the process after some pre-specified number of steps.
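To pin down the interaction pattern (not any particular optimality claim), here is a minimal sketch of the loop under the finite-horizon option; `agent_step`, `world_step`, and `utility` are hypothetical placeholders for the two machines and U.

```python
# Minimal sketch of the agent/environment loop described above, finite-horizon
# version: run for T steps and sum utility over the visited world states.
# `agent_step`, `world_step`, and `utility` are hypothetical placeholders.

def run_episode(agent_step, world_step, utility, T):
    agent_state = None      # internal state of the agent machine
    world_state = None      # internal state of the environment machine
    percept = None          # last sensor input fed to the agent
    total_utility = 0.0

    for _ in range(T):
        # The agent reads the percept and emits an action (possibly the null action).
        agent_state, action = agent_step(agent_state, percept)
        # The world responds to the action; its new state feeds back into the sensors.
        world_state, percept = world_step(world_state, action)
        # Utility is defined over states of the world.
        total_utility += utility(world_state)

    return total_utility
```

The design problem is then to choose the `agent_step` that maximizes the expectation of this return over our uncertainty about the environment.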

So it seems to me that Cox’s theorem establishes that there must be a Bayes-optimal decision at any point in time, and that a “Bayes oracle” which always outputs this decision would be optimal among all algorithms. But such an oracle is certainly not directly realizable as a Turing machine: at each time step a Turing machine simply changes state, moves the tape, and writes a symbol, whereas a full Bayesian update could require arbitrary amounts of computation. And there is no option of not acting, only of outputting the null action.
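To make the mismatch explicit, here is a sketch of a single exact Bayesian update over a finite hypothesis class, with `likelihood` a hypothetical callable giving P(observation | hypothesis). The work per observation grows with the size of the class, whereas a Turing machine does only a bounded amount of work per step.

```python
# Sketch of one exact Bayesian update. It touches every hypothesis, so its cost
# per observation is unbounded in the size of the hypothesis class; in general
# it cannot be packed into a single bounded machine step.
# `likelihood` is a hypothetical callable giving P(observation | hypothesis).

def bayes_update(prior, likelihood, observation):
    # prior: dict mapping hypothesis -> probability
    unnormalized = {h: p * likelihood(observation, h) for h, p in prior.items()}
    z = sum(unnormalized.values())
    return {h: w / z for h, w in unnormalized.items()}
```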

One approach would be to make a Bayes-optimal decision, taking opportunity costs into account, about how much computation to spend on each update. While this seems intuitively reasonable, the meta-level computation itself carries an opportunity cost, so the question really is how sure we can be that this is the best design choice.
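As a sketch of what that meta-level choice might look like (under assumed estimators, not a worked-out proposal): keep refining the current decision only while the estimated gain from one more unit of computation exceeds its opportunity cost. The regress noted above is precisely that evaluating those estimates costs computation too.

```python
# Hedged sketch of the meta-level policy: think only while thinking is expected
# to pay for itself. `refine`, `estimated_gain`, and `opportunity_cost` are
# hypothetical callables; estimating them consumes computation as well, which
# is the regress noted above.

def deliberate(refine, estimated_gain, opportunity_cost, decision, max_steps=1000):
    for step in range(max_steps):
        if estimated_gain(decision, step) <= opportunity_cost(step):
            break               # stop thinking and act
        decision = refine(decision)
    return decision
```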

Nevertheless, it seems quite reasonable to write Bayesian algorithms, and indeed to expect them to perform optimally on average. But can we formalize and prove a result along these lines? Does anyone know of existing work in this direction? Perhaps Cox’s theorem or something similar applies in some direct way that I haven’t noticed?