This seems generally right, but it overlooks a consideration that I think often gets ignored, so I’ll flag it here.
We probably want to respect preferences. (At least as a way to allow or improve cooperation.)
In this setup, the idea that we might want to lock in our own values to prevent our future selves from having different preferences seems fine if we actually view ourselves as a single agent. But once we accept the idea that we’re talking about distinct agents, it looks a lot like brainwashing someone to agree with us. And yes, that is sometimes narrowly beneficial, but only if we ignore the costs of games where we need to worry that others will attempt to do the same.
So I think we need to be clear: altruism is usually best accomplished by helping people improve what they care about, not what we care about for them. We don’t get to prevent access to birth control to save others from being sinful, since that isn’t what they want. And similarly, we don’t get to call technological accelerationism at the cost of people’s actual desires altruistic, just because we think we know better what they should want. Distributional consequences matter, as does the ability to work together. And we’ll be much better able to cooperate with ourselves and with others if we decide that respecting preferences is a generally important default behavior of our decision theory.
I generally agree that a creature with inconsistent preferences should respect the values of its predecessors and successors in the same kind of way that it respects the values of other agents (and that the similarity somewhat increases the strength of that argument). It’s a subtle issue, especially when we are considering possible future versions of ourselves with different preferences (just as it’s always subtle how much to respect the preferences of future creatures whose existence depends on our actions). I lean towards being generous about the kinds of value drift that have occurred over the previous millennia (based on some kind of “we could have been in their place” reasoning) while remaining cautious about sufficiently novel kinds of changes in values.
In the particular case of the inconsistencies highlighted by transparent Newcomb, I think that it’s unusually clear that you want to avoid your values changing—because your current values are a reasonable compromise amongst the different possible future versions of yourself, and maintaining those values is a way to implement important win-win trades across those versions.
Agreed!