I can’t understand the proof, but I can construct some counterexamples. Either these disprove the theorem, or they will show where I misunderstand the proof.
The first class relies on the fact that the proof places no restrictions on the probability distribution. Define U(n) = n. This is bounded below in absolute value by the unbounded computable function U(n) = n (namely, itself). Define p:N->R such that p(1) = 1 and p(n) = 0 for all other n. Now EU(h(k)) = 1.
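For concreteness, here is the arithmetic of this first counterexample as a short script (names are my own; this just spells out the definitions above):

```python
# Counterexample 1: a point-mass probability distribution.
# U(n) = n is unbounded, but only the n = 1 term has nonzero
# probability, so the expected utility is trivially finite.

def U(n):
    return n

def p(n):
    return 1 if n == 1 else 0

# Expected utility over the first 1000 outcomes
# (every further term is zero as well).
EU = sum(p(n) * U(n) for n in range(1, 1001))
print(EU)  # 1
```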
Notice that I am defining p as a function from the integers into the reals, whereas Peter defined it as a function from the reals into the reals. Peter’s definition must be incorrect, since in equation 2, the term should not be p(f)U(f(k)), as Peter has defined it; it must be either p(f)U(f) or p(f(k))U(f(k)). p and U must always be operating on the same thing within one term, so p should be defined over the same domain as U.
(Peter has disagreed with me in email on this, but I can’t agree. In an expected utility calculation, each term must be the utility of something, times the probability of that same something.)
The second class of counter-examples relies on the fact that the utility function is not constrained to be positive. The proof says, “To establish that this series does not converge, we will show that infinitely many of its terms have absolute value >= 1.” The consequent does not follow from the antecedent if the terms can be negative; so we already know the proof is flawed in that case. (And if it isn’t the case, why does the proof mention absolute value?)
I define U(n) = n if n is even, -n if n is odd. Now I define p(n) = 1 / 2^n. p(n) is a valid probability distribution, as it sums to one; and the infinite sum of p(n)U(n) converges absolutely (to −2/9, as it happens, rather than zero, but it converges).
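A partial-sum check of this second counterexample (a sketch; it shows the series settling at −2/9, though convergence, not the limit, is what matters for the argument):

```python
# Counterexample 2: U(n) = n for even n, -n for odd n; p(n) = 1/2^n.
def U(n):
    return n if n % 2 == 0 else -n

def p(n):
    return 2.0 ** -n

# The tail decays geometrically, so 200 terms are plenty.
total = sum(p(n) * U(n) for n in range(1, 201))
print(total)  # approximately -0.2222 (= -2/9)
```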
The third and most important class of counterexamples consists of building a reasonable positive utility function and probability distribution for which the theorem fails. Again use p(n) = 1 / 2^n, which sums to 1. My utility function will be
U(n) = log(n) for n = 2^i, i an integer
U(n) = 0 otherwise
Now U(n) is bounded below in absolute value by itself, and U(n) is unbounded; and yet the infinite sum of p(n)U(n) converges. (Intuitive argument: taking log base 2, the sum of U over the terms from (2^n)+1 to 2^(n+1) is n+1, contributed by the single term at 2^(n+1); that term's probability shrinks doubly-exponentially, so each successive block contributes a rapidly vanishing amount and the series converges.)
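A quick numerical check of this third series (a sketch, using log base 2 so that U(2^i) = i exactly; the partial sums do converge, which is the essential point, though the limit is well below 1):

```python
# Counterexample 3: U is log2(n) at powers of two and 0 elsewhere,
# with p(n) = 1/2^n. U is unbounded, yet sum p(n)U(n) converges.

def U(n):
    # n is a power of two iff it has exactly one bit set.
    return float(n.bit_length() - 1) if n & (n - 1) == 0 else 0.0

def p(n):
    return 2.0 ** -n

total = sum(p(n) * U(n) for n in range(1, 101))
print(total)  # approximately 0.3868
```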
Some retractions, clarified by Peter in email:

Peter does place a restriction on the probability distribution: it must never be zero. (I read a ‘<’ as ‘<=’.) This removes counterexamples 1 and 3. However, I am not sure it’s possible to build a probability distribution satisfying his requirements (one whose domain has cardinality 2^N, with no zero terms, that sums to one). This link says it is not possible.
The reason Peter’s expected value calculation is not of the form p(x)U(x) is that he is summing over one possible action: p(h) is the probability that a particular hypothesis h is true; h(k) is the result of action k when h is true; and U(h(k)) is the utility of that result.
“To establish that this series does not converge, we will show that infinitely many of its terms have absolute value >= 1.” This statement is valid.
However, my second counterexample still looks solid to me: U(n) = n if n is even, -n if n is odd; p(n) = 1/2^n. This doesn’t fit Peter’s framework, though, because there the domain of p is not countable.
So I give three caveats on Peter’s theorem:
It only applies if the number of possible worlds, and of possible actions, you are summing over is uncountable. This seems overly restrictive: it doesn’t really matter whether the expected utility converges when the sum for the expected utility of a single action would be non-computable regardless of whether it converged.
It is impossible to construct a probability distribution satisfying his requirements, so the theorem doesn’t apply to any possible situation.
It doesn’t prove that the expected utility isn’t bounded. The way the theorem works, it can’t rule out a bound over a small interval; e.g., U(k) might be provably between −1.1 and 1.1. A reasoner can work fine with bounded utility functions.
I can’t agree with any of your caveats. (This is not the same as saying that I think everything in PdB’s paper is correct. I haven’t looked at it carefully enough to have an opinion on that point.)
“my second counterexample still looks solid to me … It only applies if the number of … is uncountable”:
The function U in PdB’s paper doesn’t take integers as arguments, it takes infinite sequences of “perceptions”. Provided there are at least two possible perceptions at each time step, there are uncountably many such sequences. How are you proposing to change the model to make the domain of U countable, while still making it be a model of agents acting in time?
One way would be to use finite instead of infinite sequences. That corresponds to considering only a finite number of time-steps. (In that case, obviously temporal discounting isn’t going to have any impact on the convergence of anything.)
Another possibility would be to use a utility function that’s zero almost everywhere. Aside from the obvious remark that this seems unlikely to be a good model for real agents’ (actual or idealized) utility functions, I think (but this is just pure intuition; I haven’t tried to construct a proof or looked for refutations) that a utility function satisfying PdB’s conditions about being bounded away from zero by a computable function is necessarily nonzero on an uncountable set of arguments.
“It is impossible to construct a probability distribution satisfying his requirements”:
I think you are mistaken about what those requirements are. What he needs is a probability measure on the space of environments; he doesn’t insist that each individual possible environment have nonzero probability. What he does insist on is a nonzero probability for each possible computable environment (note: an environment is a function from agent action-sequences to agent perception-sequences), and there are only countably many of those.
“It doesn’t prove that the expected utility isn’t bounded.”
I think it does, actually, although he hasn’t said so explicitly. He has this noncomputable function B, and he uses the assertion that B(n) > rho(n) infinitely often when rho is computable. But then consider, say, 2^n rho(n), which is also computable; B(n) / rho(n) is then > 2^n infinitely often. Unless I’m confused (which I might be) I think this leads not only to the conclusion (which he states) that the absolute values of the terms in his series are >=1 infinitely often, but that you can find an infinite subsequence of the terms that grows faster than 2^n, or indeed faster than any computable function.
In any case, if you try to sum an infinite series with infinitely many terms of absolute value > 1, then the range of values in your partial sums depends on the order of summation. In particular, even if the U(k) were bounded, that wouldn’t enable you to say things like “the expected utility is a divergent series but its value oscillates only between −13 and +56” unless there were One True Order of summation for the terms of the series. Which I can’t see that there is or could be.
“The function U in PdB’s paper doesn’t take integers as arguments, it takes infinite sequences of “perceptions”. Provided there are at least two possible perceptions at each time step, there are uncountably many such sequences. How are you proposing to change the model to make the domain of U countable, while still making it be a model of agents acting in time?”
The domain of U must not only be countable; it must be finite. Remember, an agent needs to compute this sum once for every possible action! Both the number of possible worlds, and the number of possible actions, need to be finite; or the utility function needs to be solvable analytically (unlikely); or else the agent will take an infinite amount of time to make a decision.
So, the model of an agent acting in time, where we compute utility at each timestep in the future, does not seem to produce a computable decision process. Is that right?
“I think you are mistaken about what those requirements are. What he needs is a probability measure on the space of environments; he doesn’t insist that each individual possible environment have nonzero probability.”
You’re right. Let’s try to figure out how many possible environments there are.
“Let p:H->R be our probability distribution on the hypothesis set. We require that there exist some computable function p-bar:N->Q such that for all psi_n in S_I, 0 < p-bar(n) <= p(psi_n)”.
p(psi_n) must be non-zero for all psi_n in S_I. S is the set of mu-recursive functions of a single argument. A mu-recursive function is one that can be computed by a Turing machine. S_I is the set of computable environments f such that for every input i that has been tested, f(i) = h(i), where h is the true environment.
How large is S_I? It could be as large as S. Is the number of computable functions less than |R| (= 2^N)?
“In any case, if you try to sum an infinite series with infinitely many terms of absolute value > 1, then the range of values in your partial sums depends on the order of summation. In particular, even if the U(k) were bounded, that wouldn’t enable you to say things like “the expected utility is a divergent series but its value oscillates only between −13 and +56” unless there were One True Order of summation for the terms of the series. Which I can’t see that there is or could be.”
If I construct a countable series so as to have a 1-1 pairing between terms with value 1 and terms with value −1, does that do no good? e.g. U(n) = 1 if n is even, −1 if n is odd.
Is there a way to enter subscripts and superscripts in this thing?
No, the domain of U need not be finite. (1) The paper explicitly says that U need not be computable. (2) The model of computation implicit in the paper permits computable functions to have infinite domain (it’s basically Turing machines). (3) The way the paper proposes that a computable U should be implemented involves only looking at a finite amount of the information in the argument of U. (Kinda.)
But yes: the paper’s model does permit a non-computable decision process. (It does not, however, require that U be uncomputable.)
There is a countable infinity of computable functions. (Because there is a countable infinity of programs, because a program is a finite string or something of the kind.) An environment is defined to be a function from sequences of actions to sequences of perceptions, and there are uncountably many (more precisely: continuum-many) of those. So the situation envisaged by the paper is as follows: our agent assigns a nonzero probability to each computable environment; if the sum of those probabilities is less than 1, it also assigns a nonzero probability to some sets (though not necessarily any singleton sets) of other possible environments.
Suppose you get an expression for the expected utility in some situation, and it takes the form of an infinite series: 1 − 1 + 1 − 1 + 1 − 1 + 1 - … (etc.). Then you might want to say “well, there’s no actual expected utility, but we can say that the expected utility is some kind of fuzzy thing taking values between 0 and 1”—but unless there’s some reason why the order of summation you used there is the only possible one, you could rearrange the terms a little (swap terms 2 and 3, terms 6 and 7, etc.) and get 1 + 1 − 1 − 1 + 1 + 1 − 1 − 1 + … whose partial sums vary between 0 and 2 instead. Or you could rearrange the terms more drastically and get partial sums that become arbitrarily large-and-positive and arbitrarily large-and-negative. Or you could just swap the first two terms, getting a series whose values vary between −1 and +1 instead of between 0 and +1.
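The effect described above is easy to exhibit numerically (a sketch; swapping terms 2 and 3, 6 and 7, and so on changes the range that the partial sums cover):

```python
# Partial sums of 1 - 1 + 1 - 1 + ... under two orderings.
terms = [(-1) ** k for k in range(1000)]  # +1, -1, +1, -1, ...

def partial_sums(seq):
    running, out = 0, []
    for t in seq:
        running += t
        out.append(running)
    return out

s1 = partial_sums(terms)  # oscillates between 0 and 1

# Swap terms 2 and 3, 6 and 7, ... (1-based), i.e. the adjacent
# pairs at 0-based indices (1, 2), (5, 6), (9, 10), ...
rearranged = terms[:]
for k in range(1, len(rearranged) - 1, 4):
    rearranged[k], rearranged[k + 1] = rearranged[k + 1], rearranged[k]

s2 = partial_sums(rearranged)  # now oscillates between 0 and 2
print(min(s1), max(s1), min(s2), max(s2))  # 0 1 0 2
```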
Perhaps it’s possible to construct some theory of “fuzzy convergence” for series like this when there’s a single right order to add up the terms in. But it’s far from obvious that you could base any usable sort of decision theory on utilities defined as the sums of such series. In any case, in this instance there’s one term for each possible computable environment, and it seems very obvious that there isn’t a canonical way to order those, and that different orderings will give substantially different sequences of partial sums.
I was unhappy with having an uncomputable decision process, but I realize now that isn’t important—we can approximate it, but only if it exists. The paper is about whether it exists at all.
There are numerical methods for estimating integrals, you know.