Analyzing Multiplayer Games using IMPACT

This post is an informal writeup of an idea discovered while investigating POWER. It offers an additional tool for framing multi-agent games in terms of attainable utility, with the hope that data from both IMPACT and POWER can offer a more complete view of POWER-scarcity dynamics in multi-agent games.

Note on writing style: I use royal “we” for explaining math; the results are essentially my own unreviewed work.

Overview

The purpose of this post is to define and give intuition for a measure of a player’s impact on some real-valued outcome variable in an arbitrary multiplayer game; we call this measure “IMPACT”. We present motivating examples of multiplayer games, then define IMPACT in the more general context of arbitrary Bayesian networks. We then explore some basic connections between IMPACT and standard multiplayer game theory and present some conjectures to motivate further research.

Motivating Examples—Understanding Multiplayer Games

One of the difficulties in understanding multiplayer games is the principle that in general, a player’s optimal action is dependent on the actions of other players. One way around this problem is to restrict consideration to Nash Equilibria, which fully specify optimal actions for us. However, this loses the power to describe sub-optimal actions, which are relevant in real-world examples involving imperfect humans and intractably large action spaces.

While Nash Equilibria “work” by first conditioning on optimal actions and then considering probabilistic strategies, we go the other way around: fix some arbitrary mixed strategy profile, then analyze expected utility. In the single-player setting, this is analogous to a value function and can be constructed straightforwardly. In the multiplayer setting, we’re forced to deal with dependencies on other players’ strategies that complicate the notion of “expected utility”. To handle this dependency, our framework makes the following assumptions motivated by Bayesian games:

Each player’s posterior distribution over other players’ actions is optimal given that player’s information (the “common prior assumption”)
We consider interim expected utility (IEU), which takes an expectation across the player’s posterior beliefs of other players’ actions

Borrowing notation from Bayesian games, we now have enough machinery to consider various games in terms of IEU $u_{i} : A_{i} \to \mathbb{R}$. In each example, we fix a strategy profile $a \sim σ$ and assume some common payoff $R$ (which we’ll later describe as the “outcome variable” in the formal definition of IMPACT).

Counting heads

Let $a_{i} \sim Ber (\frac{1}{2})$ , $R = \sum_{i = 1}^{n} a_{i}$ . This game can be thought of as follows: “everyone flips a coin, with reward given by the number of heads”. Intuitively, the strategy for this game should be simple: flipping heads is better than flipping tails. Calculating IEU, we see that the results match our prediction:

u_{i} (1) = 1 + E_{a_{-i}} [\sum_{j \neq i} a_{j}] = 1 + \sum_{j \neq i} \frac{1}{2} = \frac{n - 1}{2} + 1

u_{i} (0) = E_{a_{-i}} [\sum_{j \neq i} a_{j}] = \sum_{j \neq i} \frac{1}{2} = \frac{n - 1}{2}

In fact, the difference in IEU is exactly $1$ , contributed by the added heads coming from coin $i$ .
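As a sanity check, the two IEU values can be computed by brute-force enumeration. This is a minimal sketch: the function name `ieu_counting_heads` and the choice $n = 5$ are illustrative assumptions, not part of the original setup.

```python
from fractions import Fraction
from itertools import product

def ieu_counting_heads(n, action):
    """Exact IEU of one player fixing `action` while the other n-1 coins are fair."""
    half = Fraction(1, 2)
    total = Fraction(0)
    # Enumerate every joint outcome of the other players' coins.
    for others in product((0, 1), repeat=n - 1):
        reward = action + sum(others)          # R = number of heads
        total += half ** (n - 1) * reward
    return total

n = 5  # illustrative choice
u1 = ieu_counting_heads(n, 1)
u0 = ieu_counting_heads(n, 0)
print(u1, u0, u1 - u0)
```

For any $n$ the difference `u1 - u0` should come out to exactly 1, matching the closed forms above.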

Illusory Impact

Let $a_{i} \sim Ber (\frac{1}{2})$ , $R = \oplus_{i = 1}^{n} a_{i}$ . This game can be thought of as follows: “everyone flips a coin; you win iff there are an odd number of heads”. This game has nice correlated equilibria: if players can coordinate and determine their coin’s side, then they can easily win. However, we assume each player just blindly flips their coin, regardless of reward. Even though player $i$ ’s strategy is random, we can compute IEU for each choice of action:

u_{i} (a_{i}) = P [\oplus_{j = 1}^{n} a_{j} = 1] = P [\oplus_{j \neq i} a_{j} \neq a_{i}] = \frac{1}{2}

Importantly, IEU is constant over choice of $a_{i}$ , which means that player $i$ has no “good” strategy (even assuming they can choose what side their coin lands on).
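The same enumeration approach illustrates the parity game. This is a sketch under the assumption $n \geq 2$ (with a single player, the action trivially determines the outcome); the name `ieu_parity` is ours.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from operator import xor

def ieu_parity(n, action):
    """Exact probability of an odd number of heads when one player fixes `action`."""
    half = Fraction(1, 2)
    total = Fraction(0)
    for others in product((0, 1), repeat=n - 1):
        reward = reduce(xor, others, action)   # R = XOR of all n coins
        total += half ** (n - 1) * reward
    return total

n = 4  # illustrative choice, any n >= 2 behaves the same way
parity_u1 = ieu_parity(n, 1)
parity_u0 = ieu_parity(n, 0)
print(parity_u1, parity_u0)
```

Both values equal $\frac{1}{2}$: as long as at least one other coin is fair, the parity is uniform regardless of player $i$’s choice.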

Coordinated Impact

Let $ω \sim Ber (\frac{1}{2})$. Now, let $a_{i} = ω$ for every player $i$, and let $R = ω$. This game can be thought of as follows: “a referee flips a coin, everyone shouts the result of the coin flip, and everyone wins iff it’s heads”. Intuitively, this is a ridiculous game: no player produces a meaningful action; the entire game is determined by the referee’s coin flip. However, if we shut our eyes and blindly compute IEU, we find that $u_{i} (1) = 1, u_{i} (0) = 0$, thus player $i$ benefits from shouting “heads”?!

Well, in a sense, they do. The universes where player $i$ shouts “heads” are exactly the universes in which everyone wins. The problem is that of agency: player $i$ doesn’t choose their action, the coin ( $ω$ ) does. If we condition on the value of $ω$ , then each player’s action becomes deterministic, thus IEU is constant across each player’s (trivial) action space.

Interestingly, we have a clear notion of the “IEU” of values of $ω$ , even though it’s an external variable rather than an action in the game. This suggests a limitation of conceptualizing IMPACT strictly in terms of games: variables have impact, not just actions. In the Bayesian network formalization to come, we’ll see that the node $ω$ impacts the outcome variable, while no nodes $a_{i}$ do.
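The contrast between the naive IEU and the conditioned quantities can be checked numerically. The sketch below assumes two players and treats $ω$ as the only parent of each $a_{i}$, anticipating the formal definition; the helper `cond_exp` and the variable names are hypothetical.

```python
from fractions import Fraction

# The referee game with two players has only two possible worlds, each with probability 1/2.
worlds = [({"omega": w, "a1": w, "a2": w, "R": w}, Fraction(1, 2)) for w in (0, 1)]

def cond_exp(target, given, at):
    """E[target | the variables in `given` take the values they have in world `at`]."""
    match = [(w, p) for w, p in worlds if all(w[k] == at[k] for k in given)]
    mass = sum(p for _, p in match)
    return sum(p * w[target] for w, p in match) / mass

rows = []
for world, _ in worlds:
    naive = cond_exp("R", ["a1"], world)                       # blind IEU of the shouted action
    impact_a1 = cond_exp("R", ["omega", "a1"], world) - cond_exp("R", ["omega"], world)
    impact_omega = cond_exp("R", ["omega"], world) - cond_exp("R", [], world)
    rows.append((world["a1"], naive, impact_a1, impact_omega))
print(rows)
```

The naive IEU is 0 or 1 depending on the shout, but once we condition on $ω$ the action contributes nothing, while $ω$ itself shifts the expected outcome by $\pm \frac{1}{2}$.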

Defining IMPACT using Bayesian Networks

As suggested in the “Coordinated impact” example, the most principled approach is to define IMPACT as a property of dependent variables, then consider game theory as a useful application. Again motivated by the “Coordinated impact” example, we can show that IMPACT must explicitly consider variable dependencies to avoid issues with double-counting. We formalize with Bayesian networks to provide the desired dependency structure.

Definition—General Bayesian Networks

Borrowing notation from Wikipedia, consider an arbitrary Bayesian network $G = (V, E)$ with variables $\{X_{v}\}_{v \in V}$. Additionally, choose an “outcome node” $v_{O} \in V$ (we can assume that $v_{O}$ is a descendant of each $v \in V$, but the assumption isn’t required); we use $X_{v_{O}}$ as our outcome variable to measure IMPACT against.

We now define some notation:

Given arbitrary R.V.s $A, B$ , we define the conditional expectation of $A$ given $B = b$ as

e (A, B) := E_{B = b} [A]

Note that $e (A, B)$ is itself a random variable in the value of $B$ .

Given R.V.s $A, B$ , we call $B$ a marginal variable of $A$ if the R.V. identity $B = e (A, B)$ holds. Intuitively, we can think of $B$ as an estimate of $A$ given limited information.
Consider nodes $v_{1}, v_{2} \in V$ . We say $v_{1}$ is an ancestor of $v_{2}$ (equivalently, $v_{2}$ is a descendant of $v_{1}$ ) iff $v_{1} \neq v_{2}$ and there exists a directed path $v_{1} \to v_{2}$ . This relationship is direct iff such a path consists of a single edge.
Let $A (v)$ be the set of ancestors of node $v \in V$ . Let $A_{d} (v)$ be the set of direct ancestors of node $v \in V$ .
Given node $v \in V$ , define the IMPACT of $X_{v}$ on $X_{v_{O}}$ to be the following R.V.

I (v) := e (X_{v_{O}}, \{X_{u} ∣ u \in A_{d} (v) \lor u = v\}) - e (X_{v_{O}}, \{X_{u} ∣ u \in A_{d} (v)\})
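To make the definition concrete, here is a minimal sketch that evaluates $I (v)$ on a tiny discrete network by enumerating its joint distribution. The network (two fair coins feeding a sum), the `parents` dictionary, and the helpers `cond_exp` and `impact` are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy network: two independent fair coins a1, a2 and outcome O = a1 + a2.
parents = {"a1": [], "a2": [], "O": ["a1", "a2"]}
worlds = [
    ({"a1": x, "a2": y, "O": x + y}, Fraction(1, 4))
    for x, y in product((0, 1), repeat=2)
]

def cond_exp(target, given, at):
    """e(target, given), evaluated at the values `given` takes in world `at`."""
    match = [(w, p) for w, p in worlds if all(w[k] == at[k] for k in given)]
    mass = sum(p for _, p in match)
    return sum(p * w[target] for w, p in match) / mass

def impact(v, at, outcome="O"):
    """I(v) in world `at`: condition on direct ancestors plus v, minus direct ancestors only."""
    return cond_exp(outcome, parents[v] + [v], at) - cond_exp(outcome, parents[v], at)

example = {"a1": 1, "a2": 0, "O": 1}
impacts = {v: impact(v, example) for v in parents}
print(impacts)
```

In the world where the first coin lands heads and the second tails, the first coin has IMPACT $\frac{1}{2}$, the second has IMPACT $-\frac{1}{2}$, and the outcome node itself has IMPACT 0 because it is fully determined by its parents.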

We now work toward a notion of IMPACT-scarcity—the idea that the “magnitude” of IMPACT of each node is bounded above. We will eventually demonstrate this claim in terms of the sum of variances of $I (v)$ . First, we prove some necessary lemmas:

Lemma 1: Given an arbitrary topological ordering $V = \{v_{i}\}_{i = 1}^{n}$, we can construct the following collection of R.V.s

Δ_{i} := e (X_{v_{O}}, \{X_{v_{j}} ∣ j \leq i\}) - e (X_{v_{O}}, \{X_{v_{j}} ∣ j < i\})

We now claim the following identity on R.V.s:

\sum_{i = 1}^{n} Δ_{i} \equiv X_{v_{O}} - e (X_{v_{O}}, \emptyset)

Proof: The identity follows from a telescoping-sum argument: the sum collapses to $e (X_{v_{O}}, \{X_{v_{j}} ∣ j \leq n\}) - e (X_{v_{O}}, \emptyset)$. Since $v_{O}$ is itself one of the $v_{j}$, conditioning on all $n$ variables gives $e (X_{v_{O}}, \{X_{v_{j}} ∣ j \leq n\}) \equiv X_{v_{O}}$.

Lemma 2: Consider R.V.s $A, B$ s.t. $B$ is a marginal variable of $A$. Then $Var (A) \geq Var (B)$.

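Proof: Consider the vector space of square-integrable R.V.s containing $A, B$, with inner product $\langle X, Y \rangle = E [X Y]$. The map $f (v) = e (v, B)$ is an orthogonal projection that preserves means, while $g (v) = Var (v)$ is the squared norm of $v - E [v]$ (not itself a norm). Thus, the claim is equivalent to $g (v) \geq g (f (v))$, which holds because orthogonal projections never increase norms. Equivalently, it follows from the law of total variance, $Var (A) = E [Var (A ∣ B)] + Var (e (A, B))$.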

Lemma 3: Consider arbitrary R.V.s $A, B, C$ . If $e (A, B) \equiv A$ (if $B$ fully determines $A$ ), then $e (C, A)$ is a marginal variable of $e (C, B)$ .

Proof: We observe the following

e (C, A) = E_{B | A} [e (C, \{A, B\})] = E_{B | A} [e (C, B)] = e (e (C, B), A)

Now, consider the quantity $e (e (C, B), e (C, A))$ . First, we see that $A$ fully determines $e (C, A)$ . Thus, viewing $e (X, Y)$ as a least-squares estimate of $X$ given $Y$ , we find that the estimate $e (e (C, B), e (C, A))$ is at most as accurate as $e (e (C, B), A) = e (C, A)$ . However, $e (e (C, B), e (C, A))$ “knows” $e (C, A)$ by the definition of $e$ , thus the optimal estimate is

e (e (C, B), e (C, A)) = e (C, A)

The result follows by the definition of a marginal variable.

Note: We could also prove the result by expressing “A is a marginal variable of B” as “some vector space projection maps B to A” (equivalently, $A = e (B, C)$ for some C), the result then follows from $e (C, A) = e (e (C, B), A)$ .

Lemma 4: Consider arbitrary $1 \leq i < j \leq n$. Then the R.V. $e (Δ_{j}, Δ_{i}) \equiv 0$.

Proof: By “pausing” evaluation of the Bayesian network before $X_{v_{j}}$ is determined, we can argue that the R.V. $e (Δ_{j}, \{X_{v_{k}} ∣ k < j\}) \equiv 0$. Since $Δ_{i}$ is fully determined by $\{X_{v_{k}} ∣ k < j\}$, we conclude by Lemma 3 that $e (Δ_{j}, Δ_{i})$ is a marginal variable of $e (Δ_{j}, \{X_{v_{k}} ∣ k < j\})$.

By Lemma 2 (and noting that all vector space projections map 0 to 0), we conclude $e (Δ_{j}, Δ_{i}) \equiv 0$ .
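Lemmas 1 and 4 can be checked numerically on a small example. The sketch below uses a hypothetical three-node chain (a fair coin $ω$, a noisy copy $a$, and outcome $O = ω + a$) and verifies the telescoping identity together with the zero covariance that Lemma 4 implies. The helper names are ours.

```python
from fractions import Fraction
from itertools import product

# Hypothetical chain omega -> a -> O: omega is a fair coin, a copies omega with
# probability 3/4, and O = omega + a. Topological ordering: omega, a, O.
order = ["omega", "a", "O"]
worlds = []
for w, a in product((0, 1), repeat=2):
    p = Fraction(1, 2) * (Fraction(3, 4) if a == w else Fraction(1, 4))
    worlds.append(({"omega": w, "a": a, "O": w + a}, p))

def cond_exp(target, given, at):
    """e(target, given), evaluated at the values `given` takes in world `at`."""
    match = [(wd, p) for wd, p in worlds if all(wd[k] == at[k] for k in given)]
    mass = sum(p for _, p in match)
    return sum(p * wd[target] for wd, p in match) / mass

def delta(i, at):
    """Delta_i in world `at` (i is 1-indexed, as in Lemma 1)."""
    return cond_exp("O", order[:i], at) - cond_exp("O", order[:i - 1], at)

mean_O = cond_exp("O", [], worlds[0][0])

# Lemma 1: the Delta_i telescope to X_O - E[X_O] in every world.
telescopes = all(
    sum(delta(i, wd) for i in (1, 2, 3)) == wd["O"] - mean_O for wd, _ in worlds
)

# Lemma 4 implies Cov(Delta_1, Delta_2) = E[Delta_1 * Delta_2] = 0 (each Delta_i has mean zero).
cov_12 = sum(p * delta(1, wd) * delta(2, wd) for wd, p in worlds)
print(telescopes, cov_12)
```

Exact rational arithmetic (`Fraction`) is used so that the identities hold exactly rather than up to floating-point error.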

Lemma 5: For each $1 \leq i \leq n$ , $I (v_{i})$ is a marginal variable of $Δ_{i}$ .

Proof: We invoke Lemma 3, choosing $A = \{X_{u} ∣ u \in A (v_{i}) \lor u = v_{i}\}$ and $B = \{X_{v_{j}} ∣ j \leq i\}$.

We can now proceed with our claim of IMPACT-scarcity:

Theorem 1 (IMPACT-scarcity):

\sum_{i = 1}^{n} Var (I (v_{i})) \leq Var (X_{v_{O}})

Proof: By Lemma 1, we have the following R.V. identity

\sum_{i = 1}^{n} Δ_{i} \equiv X_{v_{O}} - e (X_{v_{O}}, \emptyset)

We now compute the variance of both sides. The term $e (X_{v_{O}}, \emptyset)$ is a constant, so the right-hand side has variance $Var (X_{v_{O}})$. By Lemma 4, the $Cov (Δ_{i}, Δ_{j})$ terms are 0 for $i \neq j$. Thus, we’re left with

\sum_{i = 1}^{n} Var (Δ_{i}) = Var (X_{v_{O}})

We finish by applying Lemmas 5 and 2, which prove $Var (I (v_{i})) \leq Var (Δ_{i})$.

Note: The only inequality in the above proof is $Var (I (v_{i})) \leq Var (Δ_{i})$. Thus, equality is achieved when this bound is tight for each $1 \leq i \leq n$ (for example, in chain-shaped Bayesian networks).
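The bound can be illustrated on the two coin games from the motivating examples, modeled here (as an assumption) as $n$ independent root nodes feeding a single outcome node. The function `total_impact_variance` is a hypothetical helper that returns both sides of the theorem.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from operator import xor

def total_impact_variance(n, outcome_fn):
    """Return (sum of Var(I(v)) over all nodes, Var(X_O)) for n fair coins feeding one outcome."""
    coins = [f"a{i}" for i in range(n)]
    parents = {c: [] for c in coins}
    parents["O"] = coins
    worlds = []
    for flips in product((0, 1), repeat=n):
        w = dict(zip(coins, flips))
        w["O"] = outcome_fn(flips)
        worlds.append((w, Fraction(1, 2) ** n))

    def cond_exp(given, at):
        match = [(w, p) for w, p in worlds if all(w[k] == at[k] for k in given)]
        return sum(p * w["O"] for w, p in match) / sum(p for _, p in match)

    def variance(f):
        mean = sum(p * f(w) for w, p in worlds)
        return sum(p * (f(w) - mean) ** 2 for w, p in worlds)

    def impact(v):
        return lambda w: cond_exp(parents[v] + [v], w) - cond_exp(parents[v], w)

    return sum(variance(impact(v)) for v in parents), variance(lambda w: w["O"])

heads = total_impact_variance(3, sum)                                   # counting heads
parity = total_impact_variance(3, lambda flips: reduce(xor, flips))     # odd number of heads
print(heads, parity)
```

With three coins, counting heads attains equality (both sides are $\frac{3}{4}$), while the parity game has total IMPACT variance 0 against an outcome variance of $\frac{1}{4}$, so the inequality can be strict.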

Application—Representing a Multiplayer Game as a Bayesian Network

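As promised, we now apply our framework for IMPACT to the game theory framework from earlier. To begin, consider an arbitrary multiplayer (Bayesian) game with fixed strategy profile $σ$. We represent the mechanics of the game and the players’ strategies as a Bayesian network: a root node $ω$ for the state of nature, a type node $t_{i}$ for each player with parent $ω$, an action node $a_{i}$ for each player with parent $t_{i}$ (drawn according to $σ_{i} (t_{i})$), and an outcome node $O$ defined below.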

Note: In an abuse of notation, we let $a_{i}$ refer both to the R.V. representing player $i$’s action and to the node $a_{i}$ in the Bayesian network (thus, $X_{a_{i}}$ (Bayesian network variable) $\sim a_{i}$ (action)). Thus, statements like $I (a_{i})$ can be parsed as “the impact of player $i$’s action $a_{i}$” without the need for cumbersome notation (and similarly for other nodes in the Bayesian network).

For now, we define an arbitrary outcome node $O$ as a direct descendant of every other node. We will later set $X_{O}$ to represent game-theoretically meaningful quantities; in particular player $i$ ’s reward $R_{i}$ .

Additionally, call a node $v$ deterministic if $X_{v}$ is fully determined by $\{X_{u} ∣ u \in A_{d} (v)\}$ (equivalently, if $e (X_{v}, \{X_{u} ∣ u \in A_{d} (v)\}) = X_{v}$). Observe that for any deterministic node $v$, we have $I (v) \equiv 0$ by the definition of IMPACT. This has two important implications for our model:

We see that the $t_{i}$ are deterministic functions of $ω$ and the outcome $O$ is a deterministic function of all variables in the Bayesian network. Thus, the only non-deterministic variables are $ω$ and $a_{i}$ , which by the above must contribute all IMPACT.
Since $O$ contributes zero IMPACT (because it’s deterministic), its dependencies $A_{d} (O)$ won’t matter for our analysis. Thus, we can safely let $O$ depend on the entire Bayesian network, despite the fact that for certain relevant cases, $O$ depends only on certain variables (example: $O = R_{i}$ )

We’re left with the IMPACT terms from $ω$ and each $a_{i}$ , which we interpret in game-theoretic language:

The IMPACT $I (ω)$ can be thought of as “how good a random draw is the chosen value of $ω$ ?” This doesn’t translate precisely (as far as I know), but intuition can be gained from viewing the coordinated impact example as a function $ω \to O$ .
The IMPACT $I (a_{i})$ “looks like” player $i$ ’s IEU under the reward function given by $X_{O}$ . More specifically: letting $O = R_{i}$ , we have

u_{i} (a_{i}) \equiv e (R_{i}, \{t_{i}, a_{i}\}) \equiv e (R_{i}, \{t_{i}\}) + I (a_{i})

The residual term $e (X_{O}, \{t_{i}\})$ is best understood as analogous to $I (ω)$, but considering $t_{i}$ as the fundamental random quantity (instead of $ω$, its source of randomness). Since player $i$ only acts based on $t_{i}$, this corresponds to player $i$’s logic of “how good a random draw do I think $ω$ is, given only knowledge of $t_{i}$?”
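A small worked example may help here. The sketch below assumes a hypothetical one-player Bayesian game (the type reveals $ω$, the player guesses correctly with probability $\frac{3}{4}$, and the reward is 1 for a correct guess) and checks both the decomposition of IEU and the claim that deterministic nodes have zero IMPACT.

```python
from fractions import Fraction
from itertools import product

# Hypothetical one-player game: omega is a fair coin, the type t reveals omega,
# the player plays a = t with probability 3/4, and the reward is R = [a == omega].
parents = {"omega": [], "t": ["omega"], "a": ["t"], "R": ["omega", "t", "a"]}
worlds = []
for w, a in product((0, 1), repeat=2):
    p = Fraction(1, 2) * (Fraction(3, 4) if a == w else Fraction(1, 4))
    worlds.append(({"omega": w, "t": w, "a": a, "R": int(a == w)}, p))

def cond_exp(target, given, at):
    """e(target, given), evaluated at the values `given` takes in world `at`."""
    match = [(wd, p) for wd, p in worlds if all(wd[k] == at[k] for k in given)]
    mass = sum(p for _, p in match)
    return sum(p * wd[target] for wd, p in match) / mass

def impact(v, at):
    """I(v) with outcome node R, evaluated in world `at`."""
    return cond_exp("R", parents[v] + [v], at) - cond_exp("R", parents[v], at)

rows = []
for wd, _ in worlds:
    ieu = cond_exp("R", ["t", "a"], wd)        # u_i(a_i) = e(R_i, {t_i, a_i})
    baseline = cond_exp("R", ["t"], wd)        # e(R_i, {t_i})
    rows.append((ieu, baseline, impact("a", wd), impact("t", wd), impact("R", wd)))
print(rows)
```

In every world the IEU equals the baseline $\frac{3}{4}$ plus $I (a)$, which is $\frac{1}{4}$ for a correct guess and $-\frac{3}{4}$ for an incorrect one, while the deterministic nodes $t$ and $R$ contribute nothing.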

Game Theory in terms of IMPACT

While research on game-theoretic results from the perspective of IMPACT is extremely limited (I only know of the preliminary work I’ve already done), the best litmus test for a proposed framework is to see if it readily produces meaningful results. In this section, I’ll outline the immediate results from defining IMPACT in the setting of multiplayer games and suggest some avenues for further exploration.

POWER- and IMPACT-scarcity

The crux of these results is the fundamental notion of IMPACT-scarcity and connection between $I (a_{i})$ and player $i$ ’s IEU. We begin with stating our IMPACT-scarcity result in terms of outcome variable $O$ :

\sum_{v \in V} Var (I (v)) = Var (I (ω)) + \sum_{i = 1}^{n} Var (I (a_{i})) \leq Var (O)

One natural vein of results comes from plugging in variables for $O$ and seeing what comes out. We give some basic examples:

Letting $O$ be constant, we find $Var (I (a_{i})) = 0$, so $I (a_{i})$ is constant (in fact identically 0, since IMPACT always has zero mean). This makes sense: you can act, but you can’t do anything to change a constant variable’s value.
Letting $O = R_{i}$, we find a competitive dynamic between player $i$’s interests and “noise” generated by other players’ actions. This can be understood by arguing that as $Var (I (a_{i}))$ increases, player $i$’s optimal strategy becomes increasingly robust to other players’ choices of action.

As mentioned in the intro, one goal of IMPACT research is to unify the idea of POWER- and IMPACT-scarcity. I suspect that the intuitive understanding is “IMPACT = change in POWER”, motivated by the simplification of “POWER = $E$ , IMPACT = $Var$ ”. While the notion remains far from precise, I conjecture that IMPACT on a player’s POWER is a marginal variable of IMPACT on that player’s reward, from which an upper bound on “ $Δ$ POWER” follows.

IEU in terms of IMPACT

We now start from our other main result: $u_{i} (a_{i}) = e (R_{i}, t_{i}) + I (a_{i})$ . Since we don’t assume any information about IEU by default, a natural starting point is in the case of a Nash Equilibrium.

By definition of (Bayesian) Nash Equilibrium, each player’s strategy must be a best response. Thus, for each $a_{i}$ in the support of $σ_{i} (t_{i})$, we have $u_{i} (a_{i}) = \max_{a'_{i}} u_{i} (a'_{i}) = M$. Since $u_{i} (a_{i}) = e (R_{i}, t_{i}) + I (a_{i})$ is constant on the support and $I (a_{i})$ has zero mean conditional on $t_{i}$, this implies $e (R_{i}, t_{i}) = M$ and $I (a_{i}) = 0$, which can be understood by arguing that in a Nash Equilibrium, no player can unilaterally increase their expected reward.

Sensing a deeper connection between IMPACT and Nash Equilibria, we define the self-IMPACT of action $a_{i}$ to be $I (a_{i})$ given $O = R_{i}$ . Above, we showed that in a Nash Equilibrium, each player’s actions have zero self-IMPACT. Generalizing to suboptimal actions, we find that all actions have self-IMPACT $\leq 0$ , with equality when the action is a best response.

Unfortunately, the converse doesn’t hold: each player only taking zero self-IMPACT actions doesn’t imply a Nash equilibrium. It implies a Nash Equilibrium of the game where the action spaces $A_{i}$ are restricted to only actions played with nonzero probability, but these notions aren’t equivalent if strong actions remain unplayed (consider the mixed-strategy equilibrium for rock-paper-scissors when generalized to rock-paper-scissors-[insta-win action]).
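The rock-paper-scissors counterexample can be made concrete. The sketch below assumes standard payoffs of $+1$ / $0$ / $-1$ and a hypothetical fourth action `"win"` that beats everything except itself; with trivial types, $e (R_{1}, t_{1})$ reduces to the expected reward under $σ$.

```python
from fractions import Fraction

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """Player 1's reward; the hypothetical 'win' action beats everything except itself."""
    if mine == theirs:
        return 0
    if mine == "win":
        return 1
    if theirs == "win":
        return -1
    return 1 if BEATS[mine] == theirs else -1

# Both players mix uniformly over rock/paper/scissors; "win" is available but unplayed.
sigma = {a: Fraction(1, 3) for a in BEATS}

def ieu(action):
    """u_1(action): expected reward against player 2's mixed strategy."""
    return sum(q * payoff(action, b) for b, q in sigma.items())

baseline = sum(p * ieu(a) for a, p in sigma.items())     # e(R_1, t_1) with trivial types
self_impact = {a: ieu(a) - baseline for a in sigma}
deviation_gain = ieu("win") - baseline
print(self_impact, deviation_gain)
```

Every played action has zero self-IMPACT, yet deviating to the unplayed action gains a full point, so the profile is not a Nash Equilibrium of the enlarged game.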

We can also explore the fact that in a Nash Equilibrium, POWER equals IEU. Thus, we can write $POWER (i, σ) = e (R_{i}, t_{i})$ , which is strictly a function of $ω$ . Intuitively, IMPACT accounts for variation in $a_{i}$ while POWER takes a max over it, thus they become equivalent in limiting cases for $σ_{i}$ .

Considering other players / IMPACT-trading

As a final and unexplored angle on IMPACT, consider the case where player 1 impacts $R_{2}$ and player 2 impacts $R_{1}$ . Assuming it wouldn’t adversely affect their own utilities, the players can “trade” by modifying their strategies to mutually grant each other increased reward. The premise itself immediately raises red flags; I’ll attempt to briefly address them:

“this requires communication between players!”—yep. Barring non-causal decision theory, agents need some way to coordinate strategies.
“what if the trade accidentally hurts one player?”—the simplest answer is “they only trade if it’s mutually beneficial”, but that’s equivalent to existing solutions to coordination problems like the Prisoners’ dilemma. Ideally, a notion of “reward exchange rate” could be computed using IMPACT, especially if allowing for generalizations of reward like quasilinear utility.
“IMPACT is essentially a measure of variance, while utility is a measure of expectation. How do you convert between them?”—I don’t know, but have ideas:
- Trade off values of independent “sub-variables” of your action space. Example: if we’re playing 10 simultaneous Prisoners’ Dilemma-s, then trade off “I cooperate if you do” for each individual PD instance.
- Find some linear measure of IMPACT and trade with that instead. This looks much more like POWER-trading, which offers a similar mechanism for mutually increased reward.

Temporarily putting the issues aside, I intend to explore IMPACT-trading as an attempt to understand coordination-centric games like the Prisoners’ Dilemma. More generally, I hope to apply the IMPACT framework to a broad range of multiplayer game-theoretic phenomena and see if new insight can be gained.