As a PSA for anyone curious, here’s a quick summary of the paper:
Assume that we assign a non-zero probability to every hypothesis consistent with our data so far, and that our subjective probability distribution is computable. Assume there is an algorithm F such that F(M) is a hypothesis (consistent with available data) in which whatever we do, we get constant utility u’ satisfying |u’| > M.
Arrange all ‘computations returning natural numbers’ in some order. Let C_n be the n-th such computation.
Arrange all of the hypotheses with non-zero probabilities in some order. Now define function f(n) = the index of the hypothesis F(the result of C_n). Somewhat delicate point: f is a computable function (even though some of the C_n do not halt). Therefore, q(n) = 1/(the probability of hypotheses f(n)) is also a computable function.
However, for any computable function g, there must exist (by Busy Beaverish arguments) infinitely many values of n such that g(n) < the result of C_n. This is true of q(n) in particular.
If q(n) < the result of C_n < |the utility of hypothesis f(n)| then the term corresponding to f(n) in our expected utility must have magnitude >= 1. But there are infinitely many such terms, so the expected utility is divergent.
[Have my various simplifications broken the proof?]
As a PSA for anyone curious, here’s a quick summary of the paper:
Assume that we assign a non-zero probability to every hypothesis consistent with our data so far, and that our subjective probability distribution is computable. Assume there is an algorithm F such that F(M) is a hypothesis (consistent with available data) in which whatever we do, we get constant utility u’ satisfying |u’| > M.
Arrange all ‘computations returning natural numbers’ in some order. Let C_n be the n-th such computation.
Arrange all of the hypotheses with non-zero probabilities in some order. Now define function f(n) = the index of the hypothesis F(the result of C_n). Somewhat delicate point: f is a computable function (even though some of the C_n do not halt). Therefore, q(n) = 1/(the probability of hypotheses f(n)) is also a computable function.
However, for any computable function g, there must exist (by Busy Beaverish arguments) infinitely many values of n such that g(n) < the result of C_n. This is true of q(n) in particular.
If q(n) < the result of C_n < |the utility of hypothesis f(n)| then the term corresponding to f(n) in our expected utility must have magnitude >= 1. But there are infinitely many such terms, so the expected utility is divergent.
[Have my various simplifications broken the proof?]