Any temporal discounting other than exponential is provably inconsistent
The conditions of the proof are applicable only to reinforcement agents which, as a matter of architecture, are forced to integrate anticipated rewards using a fixed weighting function whose time axis is constantly reindexed to be relative to the present. If we could self-modify to relax that architectural constraint (perhaps weighting according to some fixed, less temporally indexical schedule, or valuing something other than weighted integrals of reward), would you nonetheless hold that rational consistency would require us to continue to engage in exponential temporal discounting? Would that hold whether or not the architectural constraint had previously been a matter of choice? (And who would be the “us” who would thus be required by rational consistency, so that we could extract a normative discount rate from them? Different aspects of a person or civilization exhibit discount functions with different timescales, and our discount functions and architectural constraints can themselves partially be traced to decision-like evolutionary and ecological phenomena in the biosphere, whose “reasoning” we may wish to re-examine.)
(ETA: Maybe I should be less uncharitable about your implied position, since you may not have been aware of the conditions of the proof you cited, or not thought to consider a wider range of agent motivational architectures. But if that was the sort of thing you didn’t know, and it was crucial to your original case, you should have known to state your case in more measured and careful language. If you commit strongly to a hostile conclusion that seems unjustifiable, I unthinkingly respond by exploiting the unjustifiability and strength of commitment to make the hostile conclusion look bad, using lines of modus tollens reasoning that wouldn’t be able to rhetorically connect if your commitment had been weaker.)
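A minimal sketch of the architectural condition, under stated assumptions: rewards are (date, amount) pairs, and the weighting shape 1/(1 + k·x), the helper names, and all numbers are hypothetical illustrations, not taken from the thread. The same non-exponential shape is applied two ways: to the delay from the present (the reindexed axis the proof assumes) and to the absolute date (a fixed, non-indexical schedule).

```python
from typing import Callable, Tuple

Weight = Callable[[float], float]
Reward = Tuple[float, float]  # (date, amount)


def hyperbolic(k: float) -> Weight:
    """Illustrative non-exponential weighting shape: 1 / (1 + k * x)."""
    return lambda x: 1.0 / (1.0 + k * x)


def prefers_later(w: Weight, now: float, small: Reward, large: Reward,
                  reindex: bool) -> bool:
    """True if `large` (the later reward) is valued above `small`.

    reindex=True:  the weight depends on delay from `now` (present-relative axis).
    reindex=False: the weight depends only on the absolute date (fixed schedule),
                   so `now` has no effect on the comparison.
    """
    def value(reward: Reward) -> float:
        date, amount = reward
        x = date - now if reindex else date
        return amount * w(x)
    return value(large) > value(small)


w = hyperbolic(k=1.0)
small, large = (10.0, 50.0), (11.0, 60.0)  # hypothetical dated rewards

for now in (0.0, 10.0):
    print(now,
          prefers_later(w, now, small, large, reindex=True),
          prefers_later(w, now, small, large, reindex=False))
# prints:
# 0.0 True True
# 10.0 False True
```

In this sketch only the reindexed agent changes its ranking as `now` advances; the fixed-schedule agent ranks the two dated rewards the same way at every evaluation time, because nothing in its valuation depends on `now`.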
To my current thinking, preferences would be one form of information about the desirability of events, and any information about the desirability of events would be timeless, even if the desirable events were within time, and even if the information about their desirability must have been acquired within time. There’s no direct reason why questions of “when you learned about the desirability” or “when you had to act on the desirability” should enter into it.
Why were you so certain [...] that exponential temporal discounting behavior was a matter of intrinsic value rather than instrumental value [...]?
Perhaps I should have left out the distraction of the term “exponential”, and asked: “Why were you so certain that temporal discounting in behavior was a matter of intrinsic value rather than instrumental value?” In part, my comment was meant to argue that:
discounting behavior can be generated for instrumental reasons;
we may reach different conclusions as to whether discounting behavior is a matter of intrinsic or instrumental value, depending on the level of analysis at which we identify agency (and/or instrumental agency);
there are reasons to expect that, in interpreting utility functions from preference claims, we may easily become confused and inappropriately assign intrinsicality to values or rules of valuation which were actually instrumental.
I should have argued more explicitly that:
Instrumental exponential discounting is conditional, not eternal; it lasts only as long as the exponentially growing opportunity costs which motivate it.
Inappropriate hypotheses of intrinsicality of values can lead to paradoxes, which the corresponding hypotheses of instrumentality may avoid. This is because instrumental values have effect conditionally while intrinsic values have effect unconditionally. Thus, if you observe an apparent paradox during an analysis that assumes intrinsicality, you should put more weight on competing analyses that assume instrumentality, on the theory that you missed a relevant condition which prevents a conflicting value from extending to the paradoxical case.
(My argument was meant to cover non-exponential discounting as well, and to show that exponential discounting behavior can be caused by the same mechanism as non-exponential discounting behavior, since I did not specify that market rates of return were constant.)
My comment was also meant to argue that we are simply confused about the right way to extract utility functions from information about behavior or reported preferences, and therefore that apparent paradoxes do not necessarily show that the premises they seem to refute are actually wrong.
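The instrumental mechanism can be sketched numerically, assuming the agent's only reason to discount is that resources held until period T could instead have been reinvested at a market rate r_t in each intervening period. The function names and rate schedules are hypothetical. The implied present price of one unit at period T is the product of 1/(1 + r_t); a constant rate gives a constant one-period ratio (exponential discounting), a varying schedule gives a non-exponential curve from the same mechanism, and where the rate drops to zero the discounting stops.

```python
import operator
from itertools import accumulate
from typing import List


def implied_discount(rates: List[float]) -> List[float]:
    """Present price of one unit delivered after 0, 1, ..., len(rates) periods,
    assuming (hypothetically) that it could instead have been reinvested at
    rate rates[t] during period t."""
    factors = [1.0 / (1.0 + r) for r in rates]
    return [1.0] + list(accumulate(factors, operator.mul))


def one_period_ratios(d: List[float]) -> List[float]:
    """d[t+1] / d[t]; a constant ratio is the signature of exponential discounting."""
    return [b / a for a, b in zip(d, d[1:])]


constant = implied_discount([0.05] * 4)             # hypothetical constant market rate
varying = implied_discount([0.10, 0.05, 0.0, 0.0])  # hypothetical varying schedule

print([round(x, 4) for x in one_period_ratios(constant)])  # [0.9524, 0.9524, 0.9524, 0.9524]
print([round(x, 4) for x in one_period_ratios(varying)])   # [0.9091, 0.9524, 1.0, 1.0]
```

Because the rate schedule in this sketch is tied to calendar periods rather than to delay from the present, the relative price of two fixed future periods does not depend on when it is computed.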
Any temporal discounting other than exponential is provably inconsistent
The conditions of the proof are applicable only to reinforcement agents which, as a matter of architecture, are forced to integrate anticipated rewards using a fixed weighting function whose time axis is constantly reindexed to be relative to the present.
To recap, the idea is that it is the self-similarity property of exponential functions that produces this result—and the exponential function is the only non-linear function with that property.
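The self-similarity argument can be written out as a short derivation, assuming a positive, continuous discount function d(τ) of delay τ ≥ 0 with d(0) = 1, applied relative to a present that moves forward. Take rewards at delays t and t + Δ. Consistency requires their relative weight to be unchanged after any elapsed time s ≤ t; choosing s = t gives:

```latex
\frac{d(t+\Delta)}{d(t)} = \frac{d(t+\Delta-s)}{d(t-s)} \quad \text{for all } 0 \le s \le t
\;\xrightarrow{\;s = t\;}\;
d(t+\Delta) = d(t)\,d(\Delta)
```

This is Cauchy's multiplicative functional equation; its continuous positive solutions are d(τ) = e^{-ρτ} for a constant ρ (ρ > 0 for discounting, ρ = 0 for none). A hyperbolic d(τ) = 1/(1 + kτ) fails it, since d(t + Δ)/d(t) = (1 + kt)/(1 + k(t + Δ)) depends on t.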
All other forms of discounting allow for the possibility of preference reversals with the mere passage of time—as discussed here.
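The preference-reversal claim can be checked with a minimal sketch, assuming two hypothetical dated rewards (a smaller-sooner and a larger-later one) and illustrative parameters ρ = 0.1 and k = 1.0. Both agents use a present-relative axis; only the shape of the discount function differs.

```python
import math
from typing import Callable, List, Tuple

Discount = Callable[[float], float]
Reward = Tuple[float, float]  # (date, amount)


def exponential(rho: float) -> Discount:
    return lambda delay: math.exp(-rho * delay)


def hyperbolic(k: float) -> Discount:
    return lambda delay: 1.0 / (1.0 + k * delay)


def preferences_over_time(d: Discount, small: Reward, large: Reward,
                          times: List[float]) -> List[bool]:
    """For each evaluation time `now`, whether the larger-later reward is
    preferred, valuing each reward as amount * d(date - now)."""
    prefs = []
    for now in times:
        v_small = small[1] * d(small[0] - now)
        v_large = large[1] * d(large[0] - now)
        prefs.append(v_large > v_small)
    return prefs


small, large = (5.0, 100.0), (6.0, 120.0)  # hypothetical smaller-sooner / larger-later
times = [float(t) for t in range(6)]        # now = 0, 1, ..., 5

exp_prefs = preferences_over_time(exponential(0.1), small, large, times)
hyp_prefs = preferences_over_time(hyperbolic(1.0), small, large, times)

print(len(set(exp_prefs)))  # 1: the ranking never changes
print(len(set(hyp_prefs)))  # 2: the ranking flips as the sooner date approaches
```

The exponential agent's ratio of the two values is the same constant at every `now`, which is the self-similarity property at work; the hyperbolic agent's ratio drifts as the sooner reward gets closer.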
This idea has nothing to do with reinforcement learning.