Since AIs can self-modify, one nice property would be if no player can obtain a better outcome for itself by modifying its utility function ahead of time. In this case, if I take the pure outcome that most favors me, and change my utility function so that I value it less, I can increase my bargaining power and possibly get a better outcome under your proposed solution. Right?
It’s like with unfair coins: there are unfair utility functions as well. ;-) You’d have to look at what the utility could be, as with the prior probability of Omega’s coin. All the AI can do is adapt its utility to environment, while there is also a “prior” utility, that describes how it looks at all possible environments, how it chose this particular utility for this particular environment.
Yes. But I don’t see this as a very grave objection, because equilibria in non-zero-sum games are typically vulnerable to manipulation by self-handicapping. A great book on this topic is “Strategy of Conflict” by Thomas Schelling, can’t recommend it enough.
Of course, if I invented a system that was resistant to manipulation, there’d be much rejoicing. But this goal seems still far away if at all possible.
Of course, if I invented a system that was resistant to manipulation, there’d be much rejoicing. But this goal seems still far away if at all possible.
It’s not hard to fix a system so that it is resistant to manipulation: given a set of players, compute how the players would want to modify their utility functions under the original system, then use the original system on the modified utility functions to pick an outcome, then make that the solution under the new system. In the new system, players no longer has any incentive to modify their utility functions.
Of course if you do this, the new system might lose some desirable properties of the original system. But since you’re assuming that the players are self-modifying AIs, that just means your system never really had those properties to begin with.
How a player would want to modify their utility function depends on how other players modify theirs. Schelling introduces “strategic moves” in games: selectively reducing your payoffs under some outcomes. (This is a formalization of unilateral commitments, e.g. burning your bridges.) A simultaneous-move game of strategic moves derived from any simple game quickly becomes a pretty tricky beast. I’ll have to look at it closer; thanks for reminding.
Since AIs can self-modify, one nice property would be if no player can obtain a better outcome for itself by modifying its utility function ahead of time. In this case, if I take the pure outcome that most favors me, and change my utility function so that I value it less, I can increase my bargaining power and possibly get a better outcome under your proposed solution. Right?
It’s like with unfair coins: there are unfair utility functions as well. ;-) You’d have to look at what the utility could be, as with the prior probability of Omega’s coin. All the AI can do is adapt its utility to environment, while there is also a “prior” utility, that describes how it looks at all possible environments, how it chose this particular utility for this particular environment.
Yes. But I don’t see this as a very grave objection, because equilibria in non-zero-sum games are typically vulnerable to manipulation by self-handicapping. A great book on this topic is “Strategy of Conflict” by Thomas Schelling, can’t recommend it enough.
Of course, if I invented a system that was resistant to manipulation, there’d be much rejoicing. But this goal seems still far away if at all possible.
It’s not hard to fix a system so that it is resistant to manipulation: given a set of players, compute how the players would want to modify their utility functions under the original system, then use the original system on the modified utility functions to pick an outcome, then make that the solution under the new system. In the new system, players no longer has any incentive to modify their utility functions.
Of course if you do this, the new system might lose some desirable properties of the original system. But since you’re assuming that the players are self-modifying AIs, that just means your system never really had those properties to begin with.
Interesting.
How a player would want to modify their utility function depends on how other players modify theirs. Schelling introduces “strategic moves” in games: selectively reducing your payoffs under some outcomes. (This is a formalization of unilateral commitments, e.g. burning your bridges.) A simultaneous-move game of strategic moves derived from any simple game quickly becomes a pretty tricky beast. I’ll have to look at it closer; thanks for reminding.