I went back and re-read your https://www.lesswrong.com/posts/8LEPDY36jBYpijrSw/what-counts-as-defection post, and it’s much clearer to me that you’re NOT using standard game-theory payouts (utility) here. You’re using some hybrid of utility and resource payouts, where you seem to normalize payout amounts, but then don’t limit the decision to the payouts—players have a utility function which converts the payouts (for all players, not just themselves) into something they maximize in their decision. It’s not clear whether they include any non-modeled information (how much they like the other player, whether they think there are future games or reputation effects, etc.) in their decision.
Based on this, I don’t think the question is well-formed. A 2x2 normal-form game is self-contained and one-shot. There’s no alignment to measure or consider—it’s just ONE SELECTION, with one of two outcomes based on the other agent’s selection.
It would be VERY INTERESTING to define a game nomenclature to specify the universe of considerations that two (or more) agents can have to make a decision, and then to define an “alignment” measure about when a player’s utility function prefers similar result-boxes as the others’ do. I’d be curious about even very simple properties, like “is it symmetrical” (I suspect no—A can be more aligned with B than B is with A, even for symmetrical-in-resource-outcome games).
“it’s much clearer to me that you’re NOT using standard game-theory payouts (utility) here.”
Thanks for taking the time to read further / understand what I’m trying to communicate. Can you point me to the perspective you consider standard, so I know what part of my communication was unclear / how to reply to the claim that I’m not using “standard” payouts/utility?
Sorry, I didn’t mean to be accusatory in that, only descriptive, in a way that I hope will let me understand what you’re trying to model/measure as “alignment”, with the prerequisite understanding of what the payout matrix indicates. http://cs.brown.edu/courses/cs1951k/lectures/2020/chapters1and2.pdf is one reference, but I’ll admit it’s baked into my understanding to the point that I don’t know where I first saw it. I can’t find any references to the other interpretation (that the payouts are something other than a ranking of preferences by each player).
So the questions are “what DO these payout numbers represent?” and “what other factors go into an agent’s decision of which row/column to choose?”
Right, thanks!
I think I agree that the payout represents player utility.
The agent’s decision can be made in any way. Best response, worst response, random response, etc.
I just don’t want to assume the players are making decisions via best response to each strategy profile (which is just some joint strategy of all the game’s players). Like, in rock-paper-scissors, if we consider the strategy profile P1: rock, P2: scissors, I’m not assuming that P2 would respond to this by playing paper.
And when I talk about ‘responses’, I do mean ‘response’ in the ‘best response’ sense; the same way one can reason about Nash equilibria in non-iterated games, we can imagine asking “how would the player respond to this outcome?”.
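To make the rock-paper-scissors example concrete, here is a minimal sketch of what “best response to a strategy profile” computes. The +1/0/-1 win/draw/loss utilities and the names `payoff` and `best_responses` are assumptions made for illustration; nothing in the thread fixes them. The sketch only reports what a best response would be. It does not assume the player actually plays it.

```python
# Rock-paper-scissors with assumed utilities: win = +1, draw = 0, loss = -1.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(p1_move, p2_move):
    """Return (P1 utility, P2 utility) for one joint choice of moves."""
    if p1_move == p2_move:
        return (0, 0)
    return (1, -1) if BEATS[p1_move] == p2_move else (-1, 1)


def best_responses(player, profile):
    """Moves that maximize `player`'s utility (0 = P1, 1 = P2),
    holding the other player's move fixed at its value in `profile`."""
    p1_move, p2_move = profile
    if player == 0:
        scores = {m: payoff(m, p2_move)[0] for m in MOVES}
    else:
        scores = {m: payoff(p1_move, m)[1] for m in MOVES}
    top = max(scores.values())
    return [m for m in MOVES if scores[m] == top]


profile = ("rock", "scissors")      # P1: rock, P2: scissors
print(best_responses(1, profile))   # P2's best response to rock: ['paper']
print(best_responses(0, profile))   # P1 is already best-responding: ['rock']
```

Asking “how would P2 respond to this outcome?” under a different response rule (worst response, random response) would just mean swapping out the `max` step.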
Another point for triangulating my thoughts here is Vanessa’s answer, which I think resolves the open question.
I like Vanessa’s answer for the fact that it’s clearly NOT utility that is in the given payoff matrix. It’s not specified what it actually is, but the inclusion of a utility function that transforms the given outcomes into desirability (utility) for the players separates the concepts enough to make sense. Defining alignment as how well player A’s utility function supports player B’s game outcome then works. Not sure it’s useful, but it’s sensible.
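A rough sketch of that reading, under loudly stated assumptions: the matrix holds resource outcomes rather than utilities, each player has a utility function over both players' outcomes, and “alignment of A with B” is scored as the Pearson correlation between A's utility and B's resource outcome across the four cells. This is NOT Vanessa's definition, just a hypothetical stand-in measure. The outcome numbers and the two utility functions are invented for illustration.

```python
from math import sqrt

# Hypothetical resource outcomes (not utilities): (A's resources, B's resources),
# indexed by (A's move, B's move). Prisoner's-dilemma-shaped, symmetric in resources.
OUTCOMES = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def u_a(r_a, r_b):
    """Assumed utility for A: cares about total resources."""
    return r_a + r_b


def u_b(r_a, r_b):
    """Assumed utility for B: cares only about B's own resources."""
    return r_b


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def alignment(utility, other_index):
    """Stand-in measure: correlation between one player's utility and the
    OTHER player's resource outcome (index 0 = A, 1 = B) over all cells."""
    cells = list(OUTCOMES.values())
    return pearson([utility(*c) for c in cells], [c[other_index] for c in cells])


a_with_b = alignment(u_a, 1)  # positive: A's utility tends to rise with B's outcome
b_with_a = alignment(u_b, 0)  # negative: B's utility tends to fall as A's outcome rises
print(a_with_b, b_with_a)
```

With these made-up numbers the measure is not symmetric: A comes out more aligned with B than B is with A, even though the resource outcomes themselves are symmetric.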
How is it clearly not about utility being specified in the payoff matrix? Vanessa’s definition itself relies on utility, and both of us interchanged ‘payoff’ and ‘utility’ in the ensuing comments.