It seems essential to the idea of “a coherent direction for steering the world” or “preferences” that the ordering between choices does not depend on what choices are actually available. But in standard cooperative multi-agent decision procedures, the ordering does depend on the set of choices available. How to make sense of this? Does it mean that a group of more than one agent can’t be said to have a coherent direction for steering the world? What is it that they do have then? And if a human should be viewed as a group of sub-agents representing different values and/or moral theories, does it mean a human also doesn’t have such a coherent direction?
Does it mean that a group of more than one agent can’t be said to have a coherent direction for steering the world?
That’s indeed my current intuition. Suppose that there is a paperclip maximizer and a staples maximizer, and the paperclip maximizer has sole control over all that happens in the universe, and the two have a common prior which assigns near-certainty to this being the case. Then I expect the universe to be filled with paperclips. But if Staples has control, I expect the universe to be tiled with staples.
On the other hand (stealing your example, but let’s make it about a physical coinflip, to hopefully make it noncontroversial): If both priors assign 50% probability to “Clippy has control and the universe can support 10^10 paperclips or 10^20 staples” and 50% probability to “Staples has control and the universe can support 10^10 staples or 10^20 paperclips”, and it turns out that in fact the first of these is true, then I expect Clippy to tile the universe with staples.
I disagree with Stuart’s post arguing that this means that Nash’s bargaining solution (NBS) can’t be correct, because it is dynamically inconsistent, as it gives a different solution after Clippy updates on the information that it has sole control. I think this is simply a counterfactual mugging: Clippy’s payoff in the possible world where Staples has control depends on Clippy’s cooperation in the world where Clippy has control. The usual solution to counterfactual muggings is to simply optimize expected utility relative to your prior, so the obvious thing to do would be to apply NBS to your prior distribution, giving you dynamic consistency.
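To make the coinflip example concrete, here is a minimal sketch of the two calculations. It is only an illustration under assumptions that are not in the discussion above: each agent’s utility is linear in the number of its favoured object, only pure policies are considered, and the disagreement point is “whoever is in control makes its own object”. The names `OUTCOME`, `expected_utilities` and `nash_choice` are hypothetical.

```python
from fractions import Fraction
from itertools import product

P = Fraction(1, 2)             # prior probability of each world (the coinflip)
SMALL, LARGE = 10**10, 10**20

# OUTCOME[(world, action)] = (paperclips, staples) produced.
# World "C": Clippy is in control. World "S": Staples is in control.
OUTCOME = {
    ("C", "own"): (SMALL, 0),    # Clippy makes paperclips
    ("C", "other"): (0, LARGE),  # Clippy makes staples
    ("S", "own"): (0, SMALL),    # Staples makes staples
    ("S", "other"): (LARGE, 0),  # Staples makes paperclips
}

def expected_utilities(clippy_action, staples_action):
    """Expected (Clippy, Staples) utility of a joint policy under the prior."""
    clips_c, stap_c = OUTCOME[("C", clippy_action)]
    clips_s, stap_s = OUTCOME[("S", staples_action)]
    return (P * clips_c + (1 - P) * clips_s, P * stap_c + (1 - P) * stap_s)

def nash_choice(options, disagreement):
    """Pick the option with the largest Nash product, among options that
    leave neither agent worse off than the disagreement point."""
    d1, d2 = disagreement
    products = {
        name: (u1 - d1) * (u2 - d2)
        for name, (u1, u2) in options.items()
        if u1 >= d1 and u2 >= d2
    }
    return max(products, key=products.get)

# Bargain over whole policies, evaluated under the prior.
policies = {
    (a, b): expected_utilities(a, b)
    for a, b in product(("own", "other"), repeat=2)
}
prior_choice = nash_choice(policies, expected_utilities("own", "own"))

# Bargain again from scratch after learning that Clippy is in control.
posterior_options = {a: OUTCOME[("C", a)] for a in ("own", "other")}
posterior_choice = nash_choice(posterior_options, OUTCOME[("C", "own")])

print(prior_choice)      # ('other', 'other')
print(posterior_choice)  # own
```

Under these assumptions, bargaining over policies relative to the prior selects “each makes the other’s object”, while re-running the bargain after the update selects paperclips. That gap is the dynamic inconsistency at issue, and sticking with the prior-level policy is what removes it.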
That said, I’m not saying that I’m sure NBS is in fact the right solution. My current intuition is that there should be some way to formalize the “bargaining power” of each agent, and when holding the bargaining powers fixed, a group of agents should be steering the world in a coherent direction. This suggests that the right formalization of “bargaining power” would give a nonnegative scaling factor to each member of the group, and the group will act to maximize the sum of the agents’ expected utilities weighted by their respective scaling factors. (As in Stuart’s post, the scaling factors will of course not be invariant under affine transformations applied to the agents’ utility functions: if you multiply an agent’s utility function by x, you will need to divide their scaling factor by x in order to compensate.)
Of course, at this point this is merely an intuition, and I do not have a worked-out proposal nor a careful justification.
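As a minimal sketch of what the weighted-sum intuition would look like (the options, utilities and scaling factors below are made-up numbers, and `group_choice` is a hypothetical helper, not a worked-out proposal):

```python
from fractions import Fraction

def group_choice(options, weights):
    """Return the option maximizing the weighted sum of the agents' utilities.

    options: dict mapping option name -> tuple of utilities, one per agent.
    weights: tuple of nonnegative scaling factors, one per agent.
    """
    def score(name):
        return sum(w * u for w, u in zip(weights, options[name]))
    return max(options, key=score)

# Illustrative utilities for two agents over three options.
options = {
    "a": (Fraction(3), Fraction(1)),
    "b": (Fraction(1), Fraction(4)),
    "c": (Fraction(2), Fraction(1)),
}
weights = (Fraction(2), Fraction(1))
choice = group_choice(options, weights)

# Multiply agent 2's utility by x and divide its scaling factor by x:
# every weighted sum is unchanged, so the choice is unchanged.
x = Fraction(10)
rescaled_options = {name: (u1, x * u2) for name, (u1, u2) in options.items()}
rescaled_weights = (weights[0], weights[1] / x)
rescaled_choice = group_choice(rescaled_options, rescaled_weights)

# Removing an option does not reorder the remaining ones.
restricted = {name: options[name] for name in ("b", "c")}
restricted_choice = group_choice(restricted, weights)

print(choice)             # a
print(rescaled_choice)    # a
print(restricted_choice)  # b
```

With the scaling factors held fixed, the group ranks options by a single number, so the ordering between any two options does not depend on which other options are available.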
And if a human should be viewed as a group of sub-agents representing different values and/or moral theories, does it mean a human also doesn’t have such a coherent direction?
I have to say that this approach does not make much sense to me in the first place, and I’m tempted to take your question as a modus tollens argument against that approach. Maybe it would be useful to have a more detailed discussion about this, but in short, I think aspiring rationalist humans should see it as their responsibility to actually choose one direction in which they want to steer the world, rather than specifying conflicting goals and then asking for some formula that will decide for them how to trade these goals against each other. If you choose to trade off different goals by weighing them with different factors, fine; but if you try to find some ‘laws of rationality’ that will tell you the one correct way to trade off these goals, without ever needing to make a decision about this yourself, I think you’re trying to pass off a responsibility that is properly yours.
I think aspiring rationalist humans should see it as their responsibility to actually choose one direction in which they want to steer the world, rather than specifying conflicting goals and then asking for some formula that will decide for them how to trade these goals against each other. If you choose to trade off different goals by weighing them with different factors, fine; but if you try to find some ‘laws of rationality’ that will tell you the one correct way to trade off these goals, without ever needing to make a decision about this yourself, I think you’re trying to pass off a responsibility that is properly yours.
Why so much emphasis on “responsibility”? In my mind, I have a responsibility to fulfill any promises I make to others and … and that’s about it. As for figuring out what my preferences are, or should be, I’m going to try any promising approaches I can find, and see if one of them works out. Thinking of myself as a bunch of sub-agents and using ideas from bargaining theory is one such approach. Trying to solve normative ethics using the methods of moral philosophers may be another. When you say “see it as their responsibility to actually choose one direction in which they want to steer the world”, what does that mean, in terms of an approach I can explore?
ETA: I wrote a post that may help explain what I meant here.
This suggests that the right formalization of “bargaining power” would give a nonnegative scaling factor to each member of the group, and the group will act to maximize the sum of the agents’ expected utilities weighted by their respective scaling factors. … Of course, at this point this is merely an intuition, and I do not have a worked-out proposal nor a careful justification.
There is a justification for that intuition. Some have objected to the axiom that the aggregation must also be VNM-rational, but Nisan has proved a similar theorem that does not rely on the VNM-rationality of the collective as an axiom.