It doesn’t say that utility functions don’t converge. It says that the expected values of unbounded utility functions don’t converge.
Ironically, I read that paper last week, and my main reaction was that it was not very relevant because it did not incorporate time discounting; or that you could interpret it as an argument for time discounting, because time discounting makes its result go away.
As Manfred said, it does not appear that the results of the paper are affected by time discounting.
Let’s be a bit more explicit about this. The model in the paper is that an action (or perhaps a pair (action,context)) is represented by a single natural number; this is provided to the environment and it returns a single natural number; the agent feeds that number into its utility function, and out comes a utility. The agent has measured what the environment does in response to a finite set of actions; what it cares about is the expected utility (over computable environments whose behaviour is consistent with the agent’s past observations, with some not-necessarily-computable but not too ill-behaved probability distribution on them) of the environment’s response to its actions.
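To make that setup concrete, here is a minimal Python sketch of the model as I’ve described it; the names (expected_utility, envs, prior) and the finite list of environments are my own illustration, since the paper’s actual expectation ranges over all computable environments consistent with the agent’s observations.

```python
# A toy sketch (my notation, not the paper's) of the model described above:
# an action is a natural number, each environment maps it to a natural number,
# and the agent scores that response with a utility function. The real model
# takes the expectation over all computable environments consistent with past
# observations; here a finite stand-in is used.

def expected_utility(action, environments, prior, utility):
    """environments: callables int -> int; prior: matching probabilities;
    utility: callable int -> float."""
    return sum(p * utility(env(action)) for env, p in zip(environments, prior))

# Hypothetical environments and a linear utility, purely for illustration.
envs = [lambda a: a + 1, lambda a: 2 * a, lambda a: a ** 2]
prior = [0.5, 0.3, 0.2]
print(expected_utility(3, envs, prior, utility=lambda n: float(n)))  # 5.6
```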
The paper says that the expected utilities don’t exist, if the utility function is unbounded and computable (or merely bounded below in absolute value by an unbounded computable function).
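As a toy illustration of why unboundedness causes trouble here (a St. Petersburg-style caricature of my own, not the paper’s actual construction): if outcomes worth 2^k keep prior probability on the order of 2^-k, every term of the expectation contributes about 1, so the partial sums grow without bound.

```python
# Toy St. Petersburg-style caricature (not the paper's construction): outcomes
# of utility 2**k are assumed to keep prior probability on the order of 2**-k,
# so every term of the expectation contributes about 1 and the partial sums
# never settle down.
def partial_expected_utility(K):
    return sum(2.0 ** -k * 2.0 ** k for k in range(1, K + 1))  # each term equals 1

for K in (10, 100, 1000):
    print(K, partial_expected_utility(K))  # prints 10.0, 100.0, 1000.0
```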
(Remark: It seems to me that this isn’t necessarily fatal; if the agent cannot exactly repeat previous actions, and if it happens that the expected utility difference between two of the agent’s actions exists, then it can still decide between actions. However, (1) the “cannot repeat previous actions” condition seems a bit artificial and (2) I’d guess that the arguments in the paper can be adjusted to show that expected utility differences are also divergent. But I could be wrong, and it would be interesting to know.)
So, anyway. How does time discounting fit into this? It seems to me that the model is meant to capture the immediate response of the environment to the agent’s action; time doesn’t come into it at all. And the conclusion is that even then—even without considering the possible infinite future—the relevant expectations don’t exist.
The pathology described in the paper doesn’t seem to me to have anything to do with not discounting in time. Turning the consequences of an action into an infinite stream rather than a single result might make things worse, but it can’t possibly make them better.
Actually, that’s not quite fair. Here’s one way it could make them better. One way to avoid the divergence described in the paper is to have a bounded utility function. That seems troublesome to many people. But it’s not so unreasonable to say that the utility you attach to what happens in any bounded region of spacetime should be bounded. So maybe there’s some mileage in taking the bounded-utility case of this model (where the expectations all exist happily), then representing an agent’s actual deliberations as involving (say) some kind of limit as the spacetime region gets larger, and hoping that lim_{larger spacetime region} E[utility] converges even though E[lim_{larger spacetime region} utility] doesn’t. Which might be the case; I haven’t thought about it carefully enough to have much idea how plausible that is.
In that scenario, you might well get saved by exponential time-discounting. Or even something weaker like 1/t^2. (Probably not 1/t, though.) But it seems to me that filling in the details is a big job; and I don’t think it can possibly be right to assert that time discounting makes the result of the paper go away, without doing that work.
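For what it’s worth, here is a quick numerical check of that intuition, assuming the utility contributed by each time step is bounded by a constant B (my reading of the bounded-region idea; the specific numbers are purely illustrative): exponential and 1/t^2 discounting give convergent sums, while 1/t does not.

```python
# Quick numerical check, assuming per-time-step utility is bounded by a
# constant B (illustrative numbers only). Exponential and 1/t**2 discounting
# give convergent sums; 1/t does not.
B = 1.0
for T in (10 ** 3, 10 ** 6):
    exp_sum = sum(B * 0.99 ** t for t in range(1, T + 1))   # -> 99
    quad_sum = sum(B / t ** 2 for t in range(1, T + 1))      # -> pi**2 / 6 ~ 1.645
    harm_sum = sum(B / t for t in range(1, T + 1))           # ~ ln(T), keeps growing
    print(T, round(exp_sum, 3), round(quad_sum, 3), round(harm_sum, 3))
```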
Hi, downvoter! If you happen not to be PhilGoetz (whose objections I already know from his reply), could you please let me know what you didn’t like about what I wrote? Did I make a mistake or express something unclearly? Thanks.
Hi, (other?) downvoter! Whatever do you object to about the above?
If someone’s downvoting me for good reason, I would like to know how I can improve. If someone’s downvoting me without good reason, I would like to make that apparent. The interests of LW, as much as my own, are advanced in both cases. What, if anything, am I missing?
See my response to Manfred above. The paper does not define what it means by “expected value”; I am assuming that it means the sum over all possible continuations of the infinite series, each weighted by its probability, because that seems consistent and appropriate. It could alternatively mean the expectation of the value that an infinite series converges to, which would be a peculiar way of talking about utility calculations (it would be the opposite of time-discounting: the present doesn’t matter, only the infinite future), but would probably also be consistent with the paper.
The paper assumes that you have a universe of possible infinite series, all of which diverge; and proves (not surprisingly) that the expected value, summed over an infinite number of such divergent series, itself diverges.
If instead of taking the sum x1 + x2 + …, you use time-discounting into the future from the present time t=1:
x1 + f(x2 + f(x3 + f(x4 + …)))
where f(x) = x/c (with c > 1), then you are using exponential time-discounting; and you can find series that meet the particular definition of “unbounded” in the paper, but that are exponentially bounded, and for which the expected value of the time-discounted infinite series, weighted by their probabilities, would converge.
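A sketch of the arithmetic being claimed here, with illustrative numbers of my own choosing: nesting f(x) = x/c unrolls to weighting x_t by 1/c^(t-1), and if |x_t| is exponentially bounded with a base smaller than c, the discounted sum converges even though the series itself is unbounded.

```python
# Sketch with illustrative numbers: nesting f(x) = x/c is the same as
# weighting x_t by 1/c**(t-1) (exponential discounting), and if |x_t| grows
# like b**t with b < c the discounted sum converges. Here x_t = 1.5**t
# (unbounded but exponentially bounded) and c = 2.
c = 2.0
xs = [1.5 ** t for t in range(1, 41)]

# Nested form x1 + f(x2 + f(x3 + ...)), evaluated from the inside out.
nested = 0.0
for x in reversed(xs):
    nested = x + nested / c

# Equivalent flat form: sum of x_t / c**(t-1) (enumerate starts at 0).
flat = sum(x / c ** t for t, x in enumerate(xs))

print(nested, flat)  # both approach the finite limit 6.0
```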
As I just pointed out at that other place in the thread, you are talking about a different paper from the one endoself linked to. The paper we are discussing here does not make the assumption you describe.