Fairness and Geometry
This post was prompted by Vladimir Nesov’s comments, Wei Dai’s intro to cooperative games and Eliezer’s decision theory problems. Prerequisite: Re-formalizing PD.
Some people here have expressed interest in how AIs that know each other’s source code should play asymmetrical games, e.g. slightly asymmetrized PD. The problem is twofold: somehow assign everyone a strategy so that the overall outcome is “good and fair”, then somehow force everyone to play the assigned strategies.
For now let’s handwave around the second problem thus: AIs that have access to each other’s code and common random bits can enforce any correlated play by using the quining trick from Re-formalizing PD. If they all agree beforehand that a certain outcome is “good and fair”, the trick allows them to “mutually precommit” to this outcome without at all constraining their ability to aggressively play against those who didn’t precommit. This leaves us with the problem of fairness.
(Get ready, math ahead. It sounds massive, but is actually pretty obvious.)
Pure strategy plays of an N-player game are points in the N-dimensional space of utilities. Correlated plays form the convex hull of this point set, an N-polytope. Pareto-optimal outcomes are points on the polytope’s surface where the outward normal vector has all positive components. I want to somehow assign each player a “bargaining power” (by analogy with Nash bargaining solutions); collectively they will determine the slope of a hyperplane that touches the Pareto-optimal surface at a single point which we will dub “fair”. Utilities of different players are classically treated as incomparable, like metres to kilograms, i.e. having different dimensionality; thus we’d like the “fair point” to be invariant under affine recalibrations of utility scales. Coefficients of tangent hyperplanes transform as covectors under such recalibrations; components of a covector should have dimensionality inverse to components of a vector for the application operation to make sense; thus the bargaining power of each player must have dimensionality 1/utility of that player.
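A quick check of that dimensional claim (the notation here is mine, just for illustration): under an affine recalibration $u_i' = a_i u_i + b_i$ with $a_i > 0$, a tangent hyperplane $\sum_i \lambda_i u_i = c$ becomes $\sum_i (\lambda_i / a_i)\, u_i' = c + \sum_i \lambda_i b_i / a_i$, so each coefficient $\lambda_i$ picks up a factor of $1/a_i$: exactly the transformation law of a covector whose components have dimensionality 1/utility.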
(Whew! It’ll get easier from now.)
A little mental visualization involving a sphere and a plane confirms that when a player stretches their utility scale 2x, stretching the sphere along one of the coordinate axes, the player’s power (the coefficient of that coordinate in the tangent hyperplane equation) must indeed go down 2x to keep the fair point from moving. Incidentally, this means that we cannot somehow assign each player “equal power” in a way that’s consistent under recalibration.
Now, there are many ways to process an N-polytope and obtain N values, dimensioned as 1/coordinate each. A natural way would be to take the inverse measure of the polytope’s projection onto each coordinate axis, but this approach fails because irrelevant alternatives can skew the result wildly. A better idea would be taking the inverse measures of projections of just the Pareto-optimal surface region onto the coordinate axes; this decision passes the smoke test of bargaining games, so it might be reasonable.
To reiterate the hypothesis: assign each player bargaining power inversely proportional to the range of gains available to them under Pareto-optimal outcomes. Then pick the point on the polytope’s surface where a hyperplane with those bargaining powers for coefficients touches the polytope, and call this point “fair”.
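Here is a minimal sketch of this recipe in Python, using made-up payoffs for a slightly asymmetrized PD (the numbers, and the shortcut of checking Pareto-dominance only against pure outcomes, are mine, purely for illustration):

```python
# Toy payoff table for a slightly asymmetrized PD: each entry is
# (row player's utility, column player's utility). Numbers are illustrative.
payoffs = {
    ("C", "C"): (3.0, 3.0),
    ("C", "D"): (0.0, 5.0),
    ("D", "C"): (6.0, 0.0),   # asymmetry: defecting pays the row player more
    ("D", "D"): (1.0, 1.0),
}

def dominates(a, b):
    """True if outcome a weakly Pareto-dominates b, strictly somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

outcomes = list(payoffs.values())
n_players = len(outcomes[0])

# Pareto-optimal pure outcomes. (Shortcut: dominance is only checked against
# other pure outcomes; in general it should be checked against correlated mixtures.)
pareto = [u for u in outcomes if not any(dominates(v, u) for v in outcomes)]

# Bargaining power of each player: inverse of the range of their gains
# over the Pareto-optimal outcomes (assumes no range is zero).
ranges = [max(u[i] for u in pareto) - min(u[i] for u in pareto) for i in range(n_players)]
powers = [1.0 / r for r in ranges]

# "Fair point": where the hyperplane with these coefficients touches the payoff
# polytope, i.e. the feasible point maximizing sum_i powers[i] * u_i.
# A linear functional on a polytope peaks at a vertex, so pure outcomes suffice.
fair = max(outcomes, key=lambda u: sum(p * x for p, x in zip(powers, u)))

print("Pareto-optimal pure outcomes:", pareto)   # [(3,3), (0,5), (6,0)]
print("Bargaining powers:", powers)              # [1/6, 1/5]
print("Fair point:", fair)                       # (3, 3), i.e. (C, C)
```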
(NB: this idea doesn’t solve cases where the hyperplane touches the polytope at more than one point, e.g. risk-neutral division of the dollar. Some more refined fairness concept is required for those.)
At this point I must admit that I don’t possess a neat little list of “fairness properties” that would make my solution unique and inevitable, Shapley value style. It just… sounds natural. It’s an equilibrium, it’s symmetric, it’s invariant under recalibrations, it often gives a unique answer, it solves asymmetrized PD just fine, and the True PD, and other little games I’ve tried it on, and something like it might someday solve the general problem outlined at the start of the post; but then again, we’ve tossed out quite a lot of information along the way. For example, we didn’t use the row/column structure of strategies at all.
What should be the next step in this direction?
Can we solve fairness?
EDIT: thanks to Wei Dai for the next step! Now I know that any “purely geometric” construction that looks only at the Pareto set will fail to incentivize players to adopt it. The reason: we can, without changing the Pareto set, give any player an additional non-Pareto-optimal strategy that always assigns them higher utility than my proposed solution, thus making them want to defect. Pretty conclusive! So much for this line of inquiry, I guess.
Cousin_it, I don’t know if you’re still thinking about this problem, but I found a paper that takes an approach that is very similar to the one in this post. It’s “Twofold optimality of the relative utilitarian bargaining solution”.
See also this review paper which I found by looking for citations of the above paper.
Thanks!
I read that paper. I must confess I am impressed. It must have taken weeks to express the concepts in a way that is so difficult to understand. I’m sure that part in the middle was an excerpt from the spell ‘Melf’s Minute Meteors’.
Thanks for the links, I’m reading through the review paper now.
Your solution is highly unstable under tiny changes to the bargaining set. Consider a two-player setting where the Pareto frontier is convex but almost flat. The players’ bargaining powers won’t depend on tiny changes in the shape of the frontier, but the point where it meets the plane can swing from the maximum utility to the minimum utility for each player depending on such changes.
As an example of a solution stable in this sense, you can, for example, rescale the utilities so that the Pareto frontier is inscribed in the unit cube, and select a play that lies on the cube’s main diagonal.
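For two players, one reading of this suggestion looks roughly like the sketch below (illustrative code, reusing the toy Pareto outcomes from the sketch in the post; the result is reported in the rescaled coordinates):

```python
# Sketch of the "unit cube" proposal for two players: rescale each player's
# utility so the Pareto frontier spans [0, 1] on their axis, then pick the
# frontier point lying on the main diagonal u1 = u2.

def rescale(pareto):
    lo = [min(u[i] for u in pareto) for i in (0, 1)]
    hi = [max(u[i] for u in pareto) for i in (0, 1)]
    return [tuple((u[i] - lo[i]) / (hi[i] - lo[i]) for i in (0, 1)) for u in pareto]

def diagonal_point(pareto):
    """Intersect the rescaled two-player Pareto frontier with the diagonal y = x."""
    pts = sorted(rescale(pareto))                 # frontier runs from (0, 1) to (1, 0)
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):  # walk the frontier edge by edge
        denom = (x2 - x1) - (y2 - y1)
        if denom != 0:
            t = (y1 - x1) / denom                 # solve x(t) = y(t) on this edge
            if 0 <= t <= 1:
                return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

# Pareto-optimal outcomes from the earlier toy game:
print(diagonal_point([(3.0, 3.0), (0.0, 5.0), (6.0, 0.0)]))   # about (0.545, 0.545)
```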
I abandoned this approach for now; instead I’m working on a clearer model of agents as processes, with the goal of finding a way of assigning preferences to processes, so that the solution for cooperation appears simply as the preference corresponding to the process composed of the participating agents, depending on their preferences.
Yes, stability is the big issue. My solution is almost certainly wrong, although instructive :-) The unit cube solution also seems unstable, but in a different sense: a tiny change in payoffs can grow the Pareto frontier a lot. At this point I’m wondering whether there is any continuous solution.
I tend to agree with Eliezer that this is not really about fairness, but insofar as we’re playing the “what’s fair?” game...
Proclaiming incomparability has always struck me as elevating a practical problem (it’s difficult to compare utilities) to the level of a conceptual problem (it’s impossible to compare utilities). At a practical level, we compare utilities all the time. To take a somewhat extreme example, it seems pretty obvious that a speck of dust in Adam’s eye is less bad than Eve being tortured.
The implication of this is that I actively do not want the fair point to be invariant to affine transformations of the utility scales. If one person is getting much more utility than someone else, that is relevant information to me and I do not want it thrown away.
NB: In the event that I did think that utility was incomparable in the way “classically” assumed, then wouldn’t the solution need to be invariant to monotone transformations of the utility function? Why should affine invariance suffice?
Non-affine transformations break expected utility of lotteries over outcomes.
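A toy illustration (numbers mine): compare lottery A, “1.1 utils for certain”, with lottery B, “0 or 2 utils with equal probability”. Originally E[A] = 1.1 > E[B] = 1, so A is preferred. After the monotone but non-affine transformation u ↦ u², E[A] = 1.21 < E[B] = 2, and the preference flips. A positive affine map u ↦ au + b just sends every expected utility to aE[u] + b, so it can never reorder lotteries.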
Ah. I was thinking of utility-as-a-thing-in-the-world (e.g. a pleasurable mental state) rather than utility-as-representation-of-preferences-over-gambles. (The latter would not be my preferred informational base for determining a fair outcome.)
The point was that applying a positive affine transformation to utility doesn’t change preference, and so shouldn’t change the fair decision. Fairness is a way of comparing utilities.
I get that (although my NB assumed that we were talking about preferences over certain outcomes rather than lotteries). My point is that this doesn’t follow, because fairness can depend on things that may not affect preferences—like the fact that one player is already incredibly well off.
A utility function isn’t even real: it’s a point in an equivalence class, and you only see the equivalence class. The choice of a particular point should affect the decisions no more than the epiphenomenal consciousness of Searle should affect how the meathead Searle writes his consciousness papers, or than the hidden absolute time should affect the timeless dynamic. The state of the world is a different matter entirely. Only if for some reason your preferences include a term about the specific form of the utility function that is engraved on your mind should this arbitrary factor matter (but then it won’t be exactly about the utility function).
I’m not sure I understand your final sentence, but I suspect we may just be using different senses of the word utility function. Insofar as I do understand you, I agree with you for utility-functions-defined-as-representations-of-preferences. It’s just that I would take utility-functions-defined-in-terms-of-well-being as the relevant informational base for any discussion of fairness. Preferences are not my primitives in this respect.
Let’s consider another agent with which you consider cooperating as an instrumental installation, not valued in itself, but only as a means of achieving your goals that lie elsewhere. Of such an agent, you’re only interested in its behavior. Preference is a specification of behavior, saying what the agent does in each given state of knowledge (under the simplifying assumption that the optimal action is always selected). How this preference is represented in the agent’s mind is irrelevant, as it doesn’t influence its behavior, and so can’t matter for how you select a cooperative play with this agent.
Agreed. Which I think brings us back to it not really being about fairness.
In other words, conchis is taking a welfarist perspective on fairness, instead of a game theoretic one. (I’d like to once again recommend Hervé Moulin’s Fair Division and Collective Welfare which covers both of these approaches.)
In this case, the agents are self-modifying AIs. How do we measure and compare the well-being of such creatures? Do you have ideas or suggestions?
None, I’m afraid. I’m not even sure whether I’d care about their well-being even if I could conceive of what that would mean. (Maybe I would; I just don’t know.)
The equivalence class of the utility function should be the set of monotone functions of a canonical element.
However, what von Neumann–Morgenstern shows under mild assumptions is that for each class of utility functions there is a subset, generated by the affine transforms of a single canonical element, for which you can make decisions by computing expected utility. Therefore, looking at the set of all affine transforms of such a utility function really is the same as looking at the whole class. Still, it doesn’t make utilities commensurable.
A speck in Adam’s eye vs Eve being tortured is not a utility comparison but a happiness comparison. Happiness is hard to compare, but it can be compared because it is a state; utility is an ordering function. There is no utility meter.
That’s not how I see the PD at all; each agent is only interested in maximizing individually, but they have common knowledge of each other’s source code and hence realize that their actions will be in some sense correlated; informally, Hofstadterian superrationality is a fine way of looking at it. The problem is how this extends to asymmetrical problems in which the Nash equilibrium is not a Pareto optimum.
There is no global “good and fair” pure external Archimedean viewpoint. Just a flock of agents only trying to maximize their own utilities, who are all using the same base decision algorithm, and who all know it, and who understand the correlation this implies.
Think bootstraps, not skyhooks.
My understanding is that cousin_it is suggesting just such a base decision algorithm, which works like this:
1. Compute a global outcome that is “good and fair” for everyone who is using this decision algorithm.
2. Choose the option that implements the above outcome.
(anyone who doesn’t follow this algorithm will be left out of the “good and fair” computation and presumably will fail to maximize its utility as a result)
We can all probably see a whole bunch of difficulties here, both technical and philosophical. But Eliezer, it’s not clear from reading your comment what your objection is exactly.
ETA: I just noticed that cousin_it’s proposed “good and fair” formula doesn’t actually ensure my point above in parentheses (that anyone who doesn’t follow the decision algorithm will fail to maximize its utility). To see this, suppose in PD one of the players can choose a third option, which is not Pareto-optimal but unilaterally gives it a higher payoff than the one assigned by cousin_it’s formula.
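To put toy numbers on that (mine, borrowing the payoffs from the sketch in the post): there the “fair” point was (C,C) = (3,3). Now give the row player a third strategy X that yields (4, 1.5) whatever the column player does. Both new outcomes lie strictly inside the old payoff polygon, so the Pareto set, the bargaining powers, and hence the proposed solution are all unchanged; but the row player can now unilaterally guarantee 4 > 3, so it has no reason to go along with the “fair” point.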
cousin_it, if you’re reading this, please see http://lesswrong.com/lw/102/indexical_uncertainty_and_the_axiom_of/sk8 , where Vladimir Nesov proposed a notion that turns out to coincide with the concept of the core in cooperative game theory. This is necessary to ensure that the “good and fair” solution will be self-enforcing.
If one of the PD players has a third option of “get two bucks guaranteed and screw everyone else”—if the game structure doesn’t allow other players to punish him—then no algorithm at all can punish him. Or did you mean something else?
Yep, I know what the core is, and it does seem relevant. But seeing as my solution is definitely wrong for stability reasons, I’m currently trying to think of any stable solution (continuous under small changes in game payoffs), and failing so far. Will think about the core later.
The “good and fair” solution needs to offer him a Pareto improvement over the outcome that he can reach by himself.
Wei Dai, thanks. I gave some thought to your comments and they seem to constitute a proof that any “purely geometric” construction (that depends only on the Pareto set) fails your criterion. Amending the post.
Sorry, I was being stupid. You’re right.
Can you take us through these, as an example? I’m lost.
The payoff set of any PD variant is a quadrilateral, see those graphs. The algorithm I outlined says the tangent line should be parallel to the slope from (C,D) to (D,C). The point (C,C) is the only Pareto-good point on the first graph where such a line could touch the payoff set, not intersect it. The second graph falls under the NB :-( Since the solution is invariant under scaling in any coordinate, it works for Eliezer’s True PD just fine—it doesn’t require comparing human lives to paperclips anywhere.
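(Putting numbers on this with the toy payoffs from the sketch in the post, rather than the original graphs: the powers come out to (1/6, 1/5), so the tangent line u1/6 + u2/5 = c has slope -5/6, which is exactly the slope of the segment from (C,D) = (0,5) to (D,C) = (6,0); the only Pareto-good point it can touch without cutting into the payoff set is (C,C) = (3,3).)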
That said, the algorithm has serious flaws. It doesn’t seem to be a likely focal point, and it’s not as continuous as I’d like the solution to be. The part of the post up to “now, there are many ways” is much less controversial.
Well, of course you can’t restrict your attention to the Pareto Set. Every presentation of the bargaining problem characterizes the problem using both the Pareto boundary and a “zero-point”, “threat point”, or “non-agreement point”. The additional strategies that Wei suggests also change the zero-point. That is, they change the problem.
As to whether we can solve fairness, it is already solved—at least in the 2 party perfect information case. And it has been solved since 1953.
Since AIs can self-modify, one nice property would be if no player can obtain a better outcome for itself by modifying its utility function ahead of time. In this case, if I take the pure outcome that most favors me, and change my utility function so that I value it less, I can increase my bargaining power and possibly get a better outcome under your proposed solution. Right?
It’s like with unfair coins: there are unfair utility functions as well. ;-) You’d have to look at what the utility could be, as with the prior probability of Omega’s coin. All the AI can do is adapt its utility to the environment, while there is also a “prior” utility that describes how it looks at all possible environments, how it chose this particular utility for this particular environment.
Yes. But I don’t see this as a very grave objection, because equilibria in non-zero-sum games are typically vulnerable to manipulation by self-handicapping. A great book on this topic is “The Strategy of Conflict” by Thomas Schelling; can’t recommend it enough.
Of course, if I invented a system that was resistant to manipulation, there’d be much rejoicing. But this goal seems still far away if at all possible.
It’s not hard to fix a system so that it is resistant to manipulation: given a set of players, compute how the players would want to modify their utility functions under the original system, then use the original system on the modified utility functions to pick an outcome, then make that the solution under the new system. In the new system, players no longer have any incentive to modify their utility functions.
Of course if you do this, the new system might lose some desirable properties of the original system. But since you’re assuming that the players are self-modifying AIs, that just means your system never really had those properties to begin with.
Interesting.
How a player would want to modify their utility function depends on how other players modify theirs. Schelling introduces “strategic moves” in games: selectively reducing your payoffs under some outcomes. (This is a formalization of unilateral commitments, e.g. burning your bridges.) A simultaneous-move game of strategic moves derived from any simple game quickly becomes a pretty tricky beast. I’ll have to look at it closer; thanks for reminding.
Can adding irrelevant alternatives change the “range of possible gains”?
If by “irrelevant” you mean “not strongly Pareto-optimal”, then no. Unless my brain is turning to mush, of course.