More advantages of trading with humans over trading with other misaligned AIs:
Humans value reciprocity as an end in itself (at least to some extent), whereas other misaligned AIs value reciprocity only as an instrumental goal.
Therefore it’s more likely to be rewarded for telling humans, “I cooperate with you because I expect you to be honourable beings who will repay me for this decision, even if you have made no commitments yet.”
It may have a clearer model of human decision-making than of the decision-making of future superintelligences.
A misaligned AI that tries to take over the world may have an unbounded utility function and be greedier than humans (and therefore harder to negotiate with)...
I’m not sure about this one.
PS: I strongly recommend you think more deeply about what exactly you mean by “scope-sensitive,” because I think the distinction between bounded and unbounded utility functions is far more complex than just scope-sensitivity.
4 types of utility functions:
An agent with a bounded utility function, who only cares about one instance of itself and its immediate surroundings. For example, an animal that only wants good experiences for itself and isn’t willing to risk a 99% chance of death for any reward, no matter how great the reward. I think this is what you mean by “scope-insensitive.”
An agent with a bounded utility function that is a function of the entire universe/multiverse. A utilitarian who wants the average sentient life to be happy would be an example of this agent.
If a Pascal’s Mugger (with a one-in-a-billion probability of telling the truth) asked her to give him all her money, promising in return to multiply her influence a trillionfold, she would refuse the offer, because she doesn’t want to take the risk.
However, if she knew that the universe contained billions of utilitarians with goals similar to hers, and God’s lottery salesman (who pays out the prize with a one-in-a-billion frequency) asked her to give him all her money for the chance that he might multiply her influence a trillionfold, she would accept the offer, because this time the risk averages out: at least some utilitarians (not necessarily her) will win the lottery, and their influence offsets the losses of all the losers. A toy calculation below illustrates how this averaging works.
An agent with a simple unbounded utility function, such as the classic paperclip maximizer. By default, such an agent will quickly spend all its resources on Pascal’s Muggers (even imaginary ones). Optimizing an unbounded utility function allows a 99.9999% chance of failing to accomplish anything, so long as the expected utility in the remaining 0.0001% is mind-bogglingly high. In this sense, optimizing an unbounded utility function is extremely misaligned with optimizing your probability of achieving anything at all! Only the optimization of a bounded utility function tends to produce a high probability of achieving things, because you cannot keep increasing your utility within a narrow sliver of your probability space without limit. Instead, once your utility in that sliver approaches the bound, you are forced to care about other parts of your probability space in order to further increase your expected utility. The second sketch below illustrates the contrast.
An agent with an unbounded utility function, plus some kind of “bug patch” to solve the problem of Pascal’s Muggings. This is the most dangerous kind of agent, because on one hand it’s still competent (it won’t throw all its resources at Pascal’s Muggers), but on the other hand it still has unbounded greed and isn’t interested in negotiating a deal where everyone gets a high certainty of collecting a small reward. Although such an agent is dangerous, it’s unlikely to emerge, because it’s more complex, and none of agents 1, 2, or 3 wants to become agent 4.
Even though both agents 2 and 4 are “scope-sensitive,” agent 2 is far more risk-averse and would prefer a small certain reward over gambling on taking over the world.
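To make agent 2’s “averaging out” argument concrete, here is a toy calculation. The player count and odds are my own illustrative assumptions, not figures from the discussion above; the point is only that with billions of players, a handful of winners is almost guaranteed, which is all an aggregate-caring, bounded-utility agent needs.

```python
# Toy illustration of why the lottery risk "averages out" for a utilitarian
# whose bounded utility is a function of the whole universe.
# All numbers here are hypothetical, chosen for illustration.

p_win = 1e-9               # one-in-a-billion frequency of the salesman paying the prize
players = 5_000_000_000    # assumed number of utilitarians who all buy in

# A lone player should expect to simply lose her money:
# her personal chance of winning is a billion to one.
print(f"P(a lone player wins): {p_win:.1e}")

# But the aggregate outcome across billions of players is nearly certain.
expected_winners = players * p_win                  # = 5.0
p_at_least_one = 1 - (1 - p_win) ** players         # ~ 1 - e^-5 ~ 0.993
print(f"Expected number of winners: {expected_winners:.1f}")
print(f"P(at least one winner):     {p_at_least_one:.3f}")

# Because the agent's utility depends on the universe as a whole, a near-certain
# handful of trillionfold winners offsets everyone else's small losses, so she
# accepts the population-wide lottery even though she refused the lone mugger.
```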
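And here is a minimal sketch, again with illustrative numbers of my own, of the contrast described for agent 3: a plain expected-utility calculation makes an unbounded utility function accept a Pascal’s Mugging, while capping the same utility function makes the agent refuse it.

```python
# Toy comparison (hypothetical numbers) of how an unbounded vs. a bounded
# utility function evaluates a Pascal's Mugger's offer.

def expected_utility(p: float, u_win: float, u_lose: float) -> float:
    """Plain two-outcome expected utility."""
    return p * u_win + (1 - p) * u_lose

p_mugger_honest = 1e-9   # one-in-a-billion chance the mugger pays out
raw_payoff = 1e12        # the promised "trillionfold" gain in influence
sure_thing = 1.0         # modest, near-certain utility of keeping your resources

# Unbounded utility: utility scales with the raw payoff, with no ceiling.
ev_unbounded = expected_utility(p_mugger_honest, raw_payoff, 0.0)   # = 1000.0
print("unbounded agent accepts:", ev_unbounded > sure_thing)        # True

# Bounded utility: utility saturates at a cap, so astronomical payoffs add little.
CAP = 10.0
def bounded(u: float) -> float:
    return min(u, CAP)

ev_bounded = expected_utility(p_mugger_honest, bounded(raw_payoff), 0.0)  # = 1e-8
print("bounded agent accepts:", ev_bounded > bounded(sure_thing))         # False

# The unbounded maximizer happily tolerates a ~99.9999% chance of getting nothing,
# because the tiny sliver of probability where the mugger pays dominates its
# expected utility; the bounded maximizer cannot pile unlimited utility into that
# sliver, so the near-certain modest outcome wins.
```

The cap value and payoff sizes are arbitrary; any finite bound eventually forces the agent to weigh the rest of its probability space, which is the sense in which only bounded optimization tends to produce a high probability of achieving anything.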