trying my best

# midco (Jacob Stavrianos)

# Game-theoretic Alignment in terms of Attainable Utility

Answering questions one-by-one:

I played fast and loose with IEU in the intro section. I think it can be consistently defined in the Bayesian game sense of “expected utility given your type”, where the games in the intro section are interpreted as each player having a constant type. In the Bayesian Network section, this is explicitly the definition (in particular, player i’s IEU varies as a function of their type).
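A minimal sketch of the Bayesian-game reading of IEU as “expected utility given your type”, assuming a finite joint prior over type profiles and pure strategies that map each type to an action. The function name, the two-player matching game, and the prior are hypothetical illustrations, not the post’s definitions. Player 0 has a single (constant) type, so its IEU reduces to ordinary expected utility, while player 1’s IEU differs across its two types.

```python
from fractions import Fraction

def interim_expected_utility(prior, strategies, utility, player, own_type):
    """Expected utility of `player`, conditional on their own type (hypothetical helper).

    prior: dict mapping type profiles (t0, t1) -> probability.
    strategies: one function per player, mapping that player's type to an action.
    utility: function (player, actions, types) -> payoff.
    """
    # Keep only the type profiles consistent with the player's own type.
    relevant = {types: p for types, p in prior.items() if types[player] == own_type}
    total_mass = sum(relevant.values())
    expected = Fraction(0)
    for types, p in relevant.items():
        actions = tuple(s(t) for s, t in zip(strategies, types))
        # Weight each profile by its probability conditional on own_type.
        expected += (p / total_mass) * utility(player, actions, types)
    return expected

# Toy game (made up): player 0 wants the actions to match, player 1 wants a mismatch.
def utility(player, actions, types):
    match = actions[0] == actions[1]
    return int(match) if player == 0 else int(not match)

# Player 0 always has type "c"; player 1 is type "x" w.p. 1/3 and "y" w.p. 2/3.
prior = {("c", "x"): Fraction(1, 3), ("c", "y"): Fraction(2, 3)}
strategies = [lambda t: "L", lambda t: "L" if t == "x" else "R"]

ieu_p0 = interim_expected_utility(prior, strategies, utility, 0, "c")
ieu_p1_x = interim_expected_utility(prior, strategies, utility, 1, "x")
ieu_p1_y = interim_expected_utility(prior, strategies, utility, 1, "y")
```

With a constant type, conditioning on your own type changes nothing, which is why the intro-section games can use the same notion.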

After reading the Wiki page, it seems like the Shapley value and Impact share a lot of properties. I’m not sure of any exact relationship, but I’ll look into connections in the future.

I think what’s going on is that the “causal order” of and is switched, which makes “look as though” it controls the value of . In terms of game theory the distinction is (I think) definitional; I include it because Impact has to explicitly consider this dynamic.

In retrospect: yep, that’s conditional expectation! My fault for the unnecessary notation. I introduced it to capture the idea of a vector space projection on random variables and didn’t see the connection to pre-existing notation.
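To make the projection picture concrete: on a finite probability space, E[X | Y] averages X over each level set of Y, and the residual X - E[X | Y] is orthogonal, under the inner product E[fg], to every function of Y. The sketch below checks this on a made-up three-outcome distribution; the helper names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def conditional_expectation(outcomes):
    """E[X | Y] on a finite space: average X over each level set of Y.

    outcomes: list of (probability, x, y) triples.
    Returns a dict mapping each value of Y to E[X | Y = y].
    """
    mass = defaultdict(Fraction)      # P(Y = y)
    weighted = defaultdict(Fraction)  # E[X * 1{Y = y}]
    for p, x, y in outcomes:
        mass[y] += p
        weighted[y] += p * x
    return {y: weighted[y] / mass[y] for y in mass}

def inner(outcomes, f, g):
    """L2 inner product E[f * g] of two random variables given as functions of (x, y)."""
    return sum(p * f(x, y) * g(x, y) for p, x, y in outcomes)

# Made-up distribution over (X, Y).
outcomes = [(Fraction(1, 4), 0, "a"), (Fraction(1, 4), 2, "a"), (Fraction(1, 2), 5, "b")]
cond = conditional_expectation(outcomes)

# The residual X - E[X | Y] should be orthogonal to each indicator of a Y value,
# and hence to every function of Y.
residual = lambda x, y: x - cond[y]
```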

# Analyzing Multiplayer Games using IMPACT

We conjecture this, but we’ve only proven limiting cases so far (constant-sum; we strongly suspect it also holds for common-payoff), and we’re still working on formulating a more general claim.

Thank you so much for the comments! I’m pretty new to the platform (and to EA research in general), so feedback is useful for getting a broader perspective on our work.

To add to TurnTrout’s comments about power-scarcity and the CCC: the broader vision of the multi-agent formulation is to establish a general notion of power-scarcity as a function of “similarity” between players’ reward functions (I mention this in the post’s final notes). In this paradigm, the constant-sum case is one limiting case of “general power-scarcity”, which I see as the “big idea”. As a simple example, general power-scarcity would directly motivate fearing robustly instrumental goals, since we’d have reason to believe that an AI whose goals are orthogonal(ish) to human goals would be incentivized to compete with humanity for Power.

We’re planning to continue investigating multi-agent Power and power-scarcity, so hopefully we’ll have a more fleshed-out notion of general power-scarcity in the months to come.

Also, re: “as players’ strategies improve, their collective Power tends to decrease”, I think your intuition is correct. Upon reflection, the effect is explained reasonably well by “improving your actions has no effect on your Power, but a negative effect on opponents’ Power”.
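A toy check of that explanation, under a loudly stated assumption: here a player’s Power is taken to be the best expected payoff they could get by best-responding to the opponent’s current mixed strategy, which may not match the post’s exact definition. In matching pennies (zero-sum), player 1’s Power then depends only on player 2’s mix, so moving player 1’s own mix toward the equilibrium (1/2, 1/2) leaves player 1’s Power unchanged while lowering player 2’s. Function names are hypothetical.

```python
from fractions import Fraction

# Player 1's payoff matrix for matching pennies; player 2 receives the negation (zero-sum).
A = [[1, -1],
     [-1, 1]]

def power_row(A, q):
    """ASSUMED Power of player 1: best expected payoff against player 2 playing column 0 w.p. q."""
    col_mix = [q, 1 - q]
    return max(sum(a * c for a, c in zip(row, col_mix)) for row in A)

def power_col(A, p):
    """ASSUMED Power of player 2 (payoff -A): best response to player 1 playing row 0 w.p. p."""
    row_mix = [p, 1 - p]
    return max(-sum(A[r][c] * row_mix[r] for r in range(2)) for c in range(2))

def collective_power(A, p, q):
    """Sum of both players' (assumed) Power at the mixed profile (p, q)."""
    return power_row(A, q) + power_col(A, p)

q = Fraction(3, 4)  # player 2's mix is held fixed
before = collective_power(A, Fraction(9, 10), q)  # player 1 far from equilibrium
after = collective_power(A, Fraction(1, 2), q)    # player 1 moves to the equilibrium mix
```

Only power_col changes as p moves toward 1/2, so the drop in collective Power comes entirely from the opponent’s side, matching the explanation above.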

Upon reflection, I now suspect that the Impact $I(a_i)$ is analogous to the Shapley value. In particular, the post could be reformulated using Shapley values and would attain similar results. I’m not sure whether Impact-scarcity holds for Shapley values, but the examples from the post suggest that it does.

(thanks to TurnTrout for pointing this out!)
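For comparison with Impact, a minimal sketch of the textbook Shapley value for a finite cooperative game: each player’s value is their marginal contribution v(S ∪ {i}) - v(S), averaged with weight |S|!(n - |S| - 1)!/n! over coalitions S not containing i. This implements only the standard Shapley value, not Impact, and the characteristic functions in the example are made up.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Shapley value of each player for characteristic function v (frozenset -> payoff)."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for k in range(n):  # k = size of a coalition S that excludes i
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for members in combinations(others, k):
                coalition = frozenset(members)
                # Marginal contribution of i when joining this coalition.
                total += weight * (v(coalition | {i}) - v(coalition))
        values[i] = total
    return values

# Made-up two-player game: a alone earns 1, b alone earns 2, together they earn 5.
payoff = {
    frozenset(): 0,
    frozenset({"a"}): 1,
    frozenset({"b"}): 2,
    frozenset({"a", "b"}): 5,
}
two_player = shapley_values(["a", "b"], payoff.__getitem__)  # a gets 2, b gets 3

# Made-up symmetric three-player game with v(S) = |S|^2.
three_player = shapley_values(["a", "b", "c"], lambda S: len(S) ** 2)
```

The values always sum to the grand coalition’s payoff (efficiency), one of the standard Shapley properties that is natural to compare against Impact.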