What information can be derived about utility functions from behavior?
(Here, “information about utility functions” may be understood in your policy-relevant sense, of “factors influencing the course of action that rational expected-utility maximization might surprisingly choose to force upon us after it was too late to decommit.”)
Suppose you observe that some agents, when they are investing, take into account projected market rates of return when trading off gains and losses at different points in time. Here are two hypotheses about the utility functions of those agents.
Hypothesis 1: These agents happened to already have a utility function whose temporal discounting matched what the market rate of return would be. This is to say: The utility function already assigned particular intrinsic values to hypothetical events in which assets were gained or lost at different times, and the ratios between these intrinsic values were already equal to what the appropriate exponential of the integrated market rate of return would later turn out to be.
Hypothesis 2: These agents have a utility function in which assets gained or lost in the near term are valued because of an intrinsic good which could be purchased with those assets at a point in the distant future. These agents evaluate near-term investments and payoffs happening at different times in terms of market rates of return, for understandable and purely instrumental reasons relating to opportunity cost.
Neither hypothesis is quite plausible psychologically or historically, but the second hypothesis is closer to being plausible, and each hypothesis makes the same predictive distribution about the agents’ near-term investment behaviors. This is to say that the “preference likelihood” ratio between the two hypotheses is flat.
(In your apparent policy terms, this would correspond roughly to the idea that, while rational expected-utility maximization may be trying to “choose” which of these two utility functions to define as normative, so that it can then “force” the courses of action dictated by the chosen utility function “upon” the agents, in this case the balance of factors affecting rational expected-utility maximization’s “choice” evens out. Therefore, rational expected-utility maximization’s “decision” will depend on its prior disposition to “prefer” one or the other utility function, for reasons unrelated to observation.)
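The claim that the two hypotheses yield the same predictive distribution can be sketched numerically. In the toy model below, the constant market rate R and the conversion horizon T are assumptions for illustration: Hypothesis 1's utility function and Hypothesis 2's differ only by the positive constant factor exp(R·T), so they rank every near-term trade identically.

```python
import math

R = 0.05  # market rate of return (assumed constant for this sketch)
T = 30.0  # horizon at which the intrinsic good is purchased (assumed)

def utility_A(amount, t):
    # Hypothesis 1: intrinsic value of assets at time t, with exponential
    # temporal discounting whose rate happens to equal the market rate.
    return amount * math.exp(-R * t)

def utility_B(amount, t):
    # Hypothesis 2: only the good purchased at T is intrinsically valued;
    # assets at time t are worth their value compounded forward to T.
    return amount * math.exp(R * (T - t))

def accepts(u, gain, t_gain, loss, t_loss):
    # Would an agent with utility u trade a loss at t_loss for a gain at t_gain?
    return u(gain, t_gain) > u(loss, t_loss)

# utility_B is utility_A scaled by the constant exp(R * T), so every trade
# is accepted or rejected identically under both hypotheses.
trades = [(1.0, 2.0, 1.1, 0.0), (5.0, 10.0, 3.0, 1.0), (1.0, 0.5, 1.0, 0.0)]
assert all(accepts(utility_A, *tr) == accepts(utility_B, *tr) for tr in trades)
```

Since observed behavior depends only on these accept/reject decisions, no amount of near-term trade data distinguishes the hypotheses.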
Now, suppose that the agents from the second hypothesis forecast market rates of return for some period, and then create new agents. These new agents have recognizable internal data structures representing utility functions of the form described in the first hypothesis, and these data structures will be queried to determine the new agents’ decisions about near-term trades. However, the new agents’ only source of information about their utility functions is observation of their own behavior: they do not have direct introspective access to their internal data structures, and they do not know about the asset-conversion event in the future. (However, they will convert their holdings at that time, as a hard-coded instinct; in terms of revealed preference, this can be interpreted as a utility function that assigns the purchased good infinite relative value.) Now, which hypothesis should we say is “really” true of these new agents’ utility functions?
(And how do we delineate what the parts of this situation even are, that supposedly “have” the utility functions we want to inquire about?)
This is a general problem with our present framework for reasoning about utility. The predictions and recommendations from a hypothesized utility function are invariant under various transformations of the hypothesis; in particular, transformations that preserve relative intervals of expected utility between available actions at each juncture. For example, for a perfect expected-utility maximizer, the reward function constructed by a perfectly trained temporal-difference reinforcement learning system motivates exactly the same behavior as the reward function whose integrals the TD learner was trained to predict. (This is quite apart from the problem of invariance under transformations that stretch or squeeze probability and reward simultaneously, such as the transformations that relate different methods of anthropic reasoning.)
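One precise instance of this invariance is potential-based reward shaping with the converged value function as the potential; this is a standard result in reinforcement learning rather than something stated above, and the tiny deterministic MDP below is an assumption for illustration. The shaped reward adds the discounted predicted value of the successor state and subtracts the current state's predicted value (the quantities a perfectly trained TD learner would supply), which shifts every action's value by a state-dependent constant and so leaves greedy behavior unchanged.

```python
GAMMA = 0.9
N = 5  # states 0..4 on a line; state 4 is terminal and absorbing

def step(s, a):
    # Deterministic transitions; a in {-1, +1}; reward 1 on reaching state 4.
    if s == N - 1:
        return s, 0.0
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

def solve(reward_fn):
    # Value iteration under an arbitrary reward, then the greedy policy.
    V = [0.0] * N
    for _ in range(300):
        V = [max(reward_fn(s, a) + GAMMA * V[step(s, a)[0]] for a in (-1, 1))
             for s in range(N)]
    policy = [max((-1, 1), key=lambda a, s=s: reward_fn(s, a)
                  + GAMMA * V[step(s, a)[0]]) for s in range(N)]
    return V, policy

def r_orig(s, a):
    return step(s, a)[1]

V_star, pol_orig = solve(r_orig)  # V_star: the integrals TD would predict

def r_shaped(s, a):
    # Shaped reward: r + gamma * V*(s') - V*(s). Relative intervals of
    # expected utility between actions at each state are preserved exactly.
    return step(s, a)[1] + GAMMA * V_star[step(s, a)[0]] - V_star[s]

_, pol_shaped = solve(r_shaped)
assert pol_orig == pol_shaped  # same behavior under either reward function
```

Behavioral data therefore cannot tell us which of the two reward functions the agent is "really" maximizing.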
As if to add to the confusion, when humans are informed about utility theory, and asked to interpret their introspective information about their preferences in terms of utility, they will report different preferences as being “intrinsic” vs. “instrumental” at different points in time [citation: folk belief]. There may be a psychological process related to temporal-difference reinforcement learning which converts preferences which introspectively appear “instrumental” into preferences which introspectively appear “intrinsic”.
Why were you so certain, in your original draft, that exponential temporal discounting behavior was a matter of intrinsic value rather than instrumental value, so that a normative framework of utilitarian reasoning would force it upon us, and the alternative possibility was not worth mentioning?
Why were you so certain, in your original draft, that exponential temporal discounting behavior was a matter of intrinsic value rather than instrumental value, so that a normative framework of utilitarian reasoning would force it upon us, and the alternative possibility was not worth mentioning?
Any temporal discounting other than temporal is provably inconsistent, a point Eliezer makes in his post against temporal discounting. Exponential temporal discounting is the default assumption. My post works with the default assumption. Arguing that you can use an alternate method of discounting would require a second post.
When you have a solution that is provably the only self-consistent solution, it’s a drastic measure to say, “I will simply override that with my preferences. I will value irrationality. And I will build a FAI that I am entrusting with the future of the universe, and teach it to be irrational.”
It’s not off the table. But it needs a lot of justification. I’m glad the post has triggered discussion of possible other methods of temporal discounting. But only if it leads to a serious discussion of it, not if it just causes people to say, “Oh, we can get around this problem with a non-exponential discounting”, without realizing all the problems that entails.
Any temporal discounting other than temporal is provably inconsistent
The conditions of the proof are applicable only to reinforcement agents which, as a matter of architecture, are forced to integrate anticipated rewards using a fixed weighting function whose time axis is constantly reindexed to be relative to the present. If we could self-modify to relax that architectural constraint—perhaps weighting according to some fixed less temporally indexical schedule, or valuing something other than weighted integrals of reward—would you nonetheless hold that rational consistency would require us to continue to engage in exponential temporal discounting? Whether or not the architectural constraint had previously been a matter of choice? (And who would be the “us” who would thus be required by rational consistency, so that we could extract a normative discount rate from them? Different aspects of a person or civilization exhibit discount functions with different timescales, and our discount functions and architectural constraints can themselves partially be traced to decision-like evolutionary and ecological phenomena in the biosphere, whose “reasoning” we may wish to re-examine.)
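The architectural distinction can be made concrete with a toy sketch; the weighting curve and reward times below are assumptions for illustration. The same non-exponential curve produces preference reversals when it is reindexed to the present at every moment, and none when it is applied as a fixed function of calendar time.

```python
def hyp(t):
    # A non-exponential weighting curve (hyperbolic, chosen for illustration).
    return 1.0 / (1.0 + t)

def indexical_value(now, reward_time, reward):
    # The architecture the proof assumes: the weight is a function of the
    # delay (reward_time - now), reindexed to the present at every moment.
    return reward * hyp(reward_time - now)

def calendar_value(now, reward_time, reward):
    # A relaxed architecture: the weight is a fixed function of calendar
    # time, never reindexed; `now` plays no role in valuation.
    return reward * hyp(reward_time)

# Two dated rewards: 1.0 at calendar time 1, 2.0 at calendar time 5.
def ranks_larger_first(value, now):
    return value(now, 5, 2.0) > value(now, 1, 1.0)

# The indexical agent's ranking reverses as `now` advances...
assert ranks_larger_first(indexical_value, -10)
assert not ranks_larger_first(indexical_value, 0)
# ...while the calendar agent is time-consistent despite using the very
# same non-exponential curve.
assert len({ranks_larger_first(calendar_value, now) for now in (-10, -5, 0)}) == 1
```

Consistency alone thus constrains the weighting only for agents locked into the reindexing architecture.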
(ETA: Maybe I should be less uncharitable about your implied position, since you may not have been aware of the conditions of the proof you cited, or not thought to consider a wider range of agent motivational architectures. But if that was the sort of thing you didn’t know, and it was crucial to your original case, you should have known to state your case in more measured and careful language. If you commit strongly to a hostile conclusion that seems unjustifiable, I unthinkingly respond by exploiting the unjustifiability and strength of commitment to make the hostile conclusion look bad, using lines of modus tollens reasoning that wouldn’t be able to rhetorically connect if your commitment had been weaker.)
To my current thinking, preferences would be one form of information about desirability of events, and any information about desirability of events would be timeless—even if the events that were desirable were within time, and even if the information about their desirability must have been acquired within time. There’s no direct reason why questions of “when you learned about the desirability” or “when you had to act on the desirability” should enter into it.
Why were you so certain [...] that exponential temporal discounting behavior was a matter of intrinsic value rather than instrumental value [...]?
Perhaps I should have left out the distraction of the term “exponential”, and asked: “Why were you so certain that temporal discounting in behavior was a matter of intrinsic value rather than instrumental value?” In part my comment was to argue that:
discounting behavior can be generated for instrumental reasons;
we may reach different conclusions as to whether discounting behavior is a matter of intrinsic or instrumental value, depending on the level of analysis at which we identify agency (and/or instrumental agency);
there are reasons to expect that, in interpreting utility functions from preference claims, we may easily become confused and inappropriately assign intrinsicality to values or rules of valuation which were actually instrumental.
I should have argued more explicitly that:
Instrumental exponential discounting is conditional, not eternal; it lasts only as long as the exponentially growing opportunity costs which motivate it.
Inappropriate hypotheses of intrinsicality of values can lead to paradoxes, which the corresponding hypotheses of instrumentality may avoid. This is because instrumental values have effect conditionally while intrinsic values have effect unconditionally. Thus, if you observe an apparent paradox during an analysis that assumes intrinsicality, you should put more weight on competing analyses that assume instrumentality, on the theory that you missed a relevant condition which prevents a conflicting value from extending to the paradoxical case.
(My argument was meant to cover non-exponential discounting as well, and to show that exponential discounting behavior can be caused by the same mechanism as non-exponential discounting behavior, since I did not specify that market rates of return were constant.)
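The conditional character of instrumental discounting can be sketched as follows, in a toy model with assumed per-period market rates rather than anything from the discussion above: the induced one-period discount factors are constant, i.e. exponential, exactly when the market rate is constant, and track the rate path otherwise.

```python
import math

def induced_weights(rates):
    # `rates` gives the per-period market rate of return. The instrumental
    # weight on assets received at period t is the factor by which they can
    # be compounded to the final period, when the intrinsic good is bought.
    T = len(rates)
    return [math.prod(1 + r for r in rates[t:]) for t in range(T + 1)]

def period_discounts(weights):
    # Ratio of adjacent weights: the effective one-period discount factor.
    return [weights[t + 1] / weights[t] for t in range(len(weights) - 1)]

# Constant market rate: every one-period discount factor is the same,
# so the induced discounting is exponential.
const = period_discounts(induced_weights([0.05] * 4))
assert all(math.isclose(d, const[0]) for d in const)

# Time-varying market rate: the induced discounting is no longer
# exponential; it tracks the opportunity costs that motivate it.
varying = period_discounts(induced_weights([0.10, 0.02, 0.10, 0.02]))
assert not all(math.isclose(d, varying[0]) for d in varying)
```

When the opportunity costs change or disappear, so does the induced discounting; nothing about it is written into the utility function.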
My comment was also to argue that we are simply confused about the right way to extract utility functions from information about behavior or reported preferences, and therefore that apparent paradoxes do not necessarily refute the premises they appear to refute.
Any temporal discounting other than temporal is provably inconsistent
The conditions of the proof are applicable only to reinforcement agents which, as a matter of architecture, are forced to integrate anticipated rewards using a fixed weighting function whose time axis is constantly reindexed to be relative to the present.
To recap, the idea is that it is the self-similarity property of exponential functions that produces this result—and the exponential function is the only non-linear function with that property.
All other forms of discounting allow for the possibility of preference reversals with the mere passage of time—as discussed here.
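This can be illustrated with a small numerical check, with discount parameters assumed for illustration: an exponential discounter ranks a smaller-sooner against a larger-later reward the same way at every lead time, while a hyperbolic discounter reverses its ranking as the rewards draw near.

```python
import math

def exp_weight(delay, r=0.10):
    # Exponential discounting: self-similar under time translation.
    return math.exp(-r * delay)

def hyp_weight(delay, k=1.0):
    # Hyperbolic discounting: not self-similar.
    return 1.0 / (1.0 + k * delay)

def prefers_larger(weight, lead_time):
    # Smaller reward 1.0 arrives lead_time+1 from now; larger reward 2.0
    # arrives lead_time+5 from now.
    return 2.0 * weight(lead_time + 5) > 1.0 * weight(lead_time + 1)

# Exponential: the ranking is identical at every lead time (the ratio of
# the two weights depends only on the 4-period gap, not on the lead time).
assert all(prefers_larger(exp_weight, d) for d in (0, 5, 10, 50))

# Hyperbolic: the larger reward is preferred from afar, the smaller one up
# close -- a preference reversal with the mere passage of time.
assert prefers_larger(hyp_weight, 10)
assert not prefers_larger(hyp_weight, 0)
```

The exponential case is reversal-proof precisely because exp(-r·(t+d))/exp(-r·(s+d)) is independent of d, the self-similarity property referred to above.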
This idea has nothing to do with reinforcement learning.
Any temporal discounting other than temporal is provably inconsistent, a point Eliezer makes in his post against temporal discounting.
People are in fact inconsistent, and would like to bind their future selves and future generations. Folk care more about themselves than future generations, but don’t care much more about people 100 generations out than 101 generations out. If current people could, they would commit to a policy that favored the current generation, but was much more long-term focused thereafter.
Any temporal discounting other than temporal is provably inconsistent
You meant to say: “other than exponential”.