This seems generally right, but ignores a consideration that I think often gets ignored, so I’ll flag it here.
We probably want to respect preferences. (At least as a way to allow or improve cooperation.)
In this setup, the idea that we might want to lock in our own values to prevent our future selves from having different preferences seems fine if we actually view ourselves as a single agent, but once we accept the idea that we’re talking about distinct agents, it looks a lot like brainwashing someone to agree with us. And yes, that is sometimes narrowly beneficial, if we ignore the costs of games where we need to worry that others will attempt to do the same.
So I think we need to be clear: altruism is usually best accomplished by helping people improve what they care about, not what we care about for them. We don’t get to prevent access to birth control to save others from being sinful, since that isn’t what they want. And similarly, we don’t get to call technological accelerationism at the cost of people’s actual desires altruistic, just because we think we know better what they should want. Distributional consequences matter, as does the ability to work together. And we’ll be much better able to cooperate with ourselves and with others if we decide that respecting preferences is a generally important default behavior of our decision theory.
I generally agree that a creature with inconsistent preferences should respect the values of its predecessors and successors in the same kind of way that it respects the values of other agents (and that the similarity somewhat increases the strength of that argument). It’s a subtle issue, especially when we are considering possible future versions of ourselves with different preferences (just as it’s always subtle how much to respect the preferences of future creatures who may not exist based on our actions). I lean towards being generous about the kinds of value drift that have occurred over the previous millennia (based on some kind of “we could have been in their place” reasoning) while remaining cautious about sufficiently novel kinds of changes in values.
In the particular case of the inconsistencies highlighted by transparent Newcomb, I think that it’s unusually clear that you want to avoid your values changing—because your current values are a reasonable compromise amongst the different possible future versions of yourself, and maintaining those values is a way to implement important win-win trades across those versions.
In the particular case of the inconsistencies highlighted by transparent Newcomb, I think that it’s unusually clear that you want to avoid your values changing—because your current values are a reasonable compromise amongst the different possible future versions of yourself, and maintaining those values is a way to implement important win-win trades across those versions.
I slightly disagree with this. In cases where there are win-win trades, different future versions of yourself are probably similar enough that they can get these win-win trades via correlated decision-making. (If they follow EDT.)
If you stop your values from changing, I think the main additional benefit you get is that you (i) change which of your future selves are more or less likely to exist in the first place (which it’s not obvious that they themselves will care about; cf. my other comment), and (ii) impose one-way utility transfers from versions of yourself who have good helping opportunities to versions of yourself who have good being-helped opportunities, according to your own view about how you want to do interpersonal utility comparisons between your future selves (which will predictably benefit some of them and harm others). [1]
Overall this still seems fine and good to me. But I think win-win trades are a small fraction of the benefits.
[1] Or maybe this is also just about changing which future versions of yourself exist, since any difference in your present actions will arguably lead to somewhat different memories in future versions of yourself.
Agreed!