# Contrite Strategies and The Need For Standards

Epistemic Status: Confident

There’s a really interesting paper from 1996 called The Logic of Contrition, which I’ll summarize here. In it, the authors identify a strategy called “Contrite Tit For Tat”, which does better than either Pavlov or Generous Tit For Tat in the Iterated Prisoner’s Dilemma.

In Contrite Tit For Tat, the player looks not only at what he and the other player played in the last round, but also at another variable, the standing of each player, which can be good or bad.

If Bob defected on Alice last round but Alice was in good standing, then Bob’s standing switches to bad, and Alice defects against Bob.

If Bob defected on Alice last round but Alice was in bad standing, then Bob’s standing stays good, and Alice cooperates with Bob.

If Bob cooperated with Alice last round, Bob keeps his good standing, and Alice cooperates.

This allows two Contrite Tit For Tat players to recover quickly from accidental defections without defecting against each other forever:

D/C → C/D → C/C

But, unlike Pavlov, it consistently resists the “always defect” strategy:

D/C → D/D → D/D → D/D …
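The two sequences above can be checked with a small simulation. This is a minimal sketch under my own reading of the standing rules described earlier: a player ends a round in good standing if they cooperated, or if they defected while in good standing against an opponent in bad standing; any other defection puts them in bad standing. The names `next_standing`, `ctft`, `all_d`, `play`, and the `forced` parameter (used to inject an accidental move) are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

C, D = "C", "D"
GOOD, BAD = "good", "bad"


def next_standing(move: str, own: str, other: str) -> str:
    """Standing after a round (assumed rule, see the lead-in)."""
    if move == C:
        return GOOD
    # A defection is excused only when a player in good standing
    # punishes an opponent who is in bad standing.
    if own == GOOD and other == BAD:
        return GOOD
    return BAD


def ctft(own: str, other: str) -> str:
    """Contrite Tit For Tat: defect only to punish, otherwise cooperate."""
    return D if (own == GOOD and other == BAD) else C


def all_d(own: str, other: str) -> str:
    """Always defect, regardless of standing."""
    return D


Strategy = Callable[[str, str], str]
Slips = Dict[int, Tuple[Optional[str], Optional[str]]]


def play(strat_a: Strategy, strat_b: Strategy, rounds: int,
         forced: Optional[Slips] = None) -> List[str]:
    """Return each round as 'A/B'. `forced` overrides moves to model slips."""
    forced = forced or {}
    stand_a = stand_b = GOOD
    history = []
    for r in range(rounds):
        a = strat_a(stand_a, stand_b)
        b = strat_b(stand_b, stand_a)
        if r in forced:
            slip_a, slip_b = forced[r]
            a = slip_a or a
            b = slip_b or b
        history.append(f"{a}/{b}")
        # Both standings are updated from the pre-round values.
        stand_a, stand_b = (next_standing(a, stand_a, stand_b),
                            next_standing(b, stand_b, stand_a))
    return history


# Two cTFT players, with the first one slipping in round 0.
print(" -> ".join(play(ctft, ctft, 3, forced={0: (D, None)})))
# An unconditional defector against cTFT.
print(" -> ".join(play(all_d, ctft, 4)))
```

Under these assumptions the first call gives D/C, C/D, C/C and the second gives D/C followed by D/D in every later round, matching the sequences above.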

Like TFT (Tit For Tat) and unlike Pavlov and gTFT (Generous Tit For Tat), cTFT (Contrite Tit For Tat) can invade a population of all Defectors.

A related contrite strategy is Remorse. Remorse cooperates only if it is in bad standing, or if both players cooperated in the previous round. In other words, Remorse is more aggressive; unlike cTFT, it can attack cooperators.

Against the strategy “always cooperate”, cTFT always cooperates but Remorse alternates cooperating and defecting:

C/C → C/D → C/C → C/D …

And Remorse defends effectively against defectors:

D/C → D/D → D/D → D/D …

But if one Remorse accidentally defects against another, recovery is more difficult:

C/D → D/C → D/D → C/D → …

If the Prisoner’s Dilemma is repeated a large but finite number of times, cTFT is evolutionarily stable, in the sense that when playing against a cTFT player you can’t do better for yourself by deviating from what cTFT would recommend. This implies that no other strategy can successfully invade a population made up entirely of cTFT players.

Remorse can sometimes be invaded by strategies better at cooperating with themselves, while Pavlov can sometimes be invaded by Defectors, depending on the payoff matrix; but for all Prisoner’s Dilemma payoff matrices, cTFT resists invasion.

Defector and a similar strategy called Grim Trigger (if a player ever defects on you, keep defecting forever) are evolutionarily stable, but not good outcomes — they result in much lower scores for everyone in the population than TFT or its variants. By contrast, a whole population that adopts cTFT, gTFT, Pavlov, or Remorse on average gets the payoff from cooperating each round.

The bottom line is, adding “contrition” to TFT makes it quite a bit better, and allows it to keep pace with Pavlov in exploiting TFT’s, while doing better than Pavlov at exploiting Defectors.

This is no longer true if we add noise in the perception of good or bad standing; contrite strategies, like TFT, can get stuck defecting against each other if they erroneously perceive bad standing.
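To see how a misread standing can trap two contrite players, here is a minimal sketch. It rests on modeling assumptions of my own, not the paper's: each player keeps a private view of both standings, moves are always observed correctly, and exactly one standing is misread in one round. The standing rule is my reading of the rules described earlier, and the names `next_standing`, `ctft`, and `play_with_misperception` are hypothetical.

```python
from typing import List

C, D = "C", "D"
GOOD, BAD = "good", "bad"


def next_standing(move: str, own: str, other: str) -> str:
    """Assumed rule: cooperating or justified punishment keeps good standing."""
    if move == C or (own == GOOD and other == BAD):
        return GOOD
    return BAD


def ctft(own: str, other: str) -> str:
    """Contrite Tit For Tat: defect only to punish, otherwise cooperate."""
    return D if (own == GOOD and other == BAD) else C


def play_with_misperception(rounds: int, glitch_round: int) -> List[str]:
    """Two cTFT players; A misreads B's standing as bad in one round."""
    # Each view is [standing of A, standing of B] as that player sees it.
    view_a = [GOOD, GOOD]
    view_b = [GOOD, GOOD]
    history = []
    for r in range(rounds):
        if r == glitch_round:
            view_a[1] = BAD  # the single perception error
        a = ctft(view_a[0], view_a[1])
        b = ctft(view_b[1], view_b[0])
        history.append(f"{a}/{b}")
        # Each player updates both standings from their own private view.
        view_a = [next_standing(a, view_a[0], view_a[1]),
                  next_standing(b, view_a[1], view_a[0])]
        view_b = [next_standing(a, view_b[0], view_b[1]),
                  next_standing(b, view_b[1], view_b[0])]
    return history


print(" -> ".join(play_with_misperception(rounds=6, glitch_round=1)))
```

In this particular model the single error never washes out: after the glitch the players take turns punishing each other (D/C, C/D, D/C, …), because each one believes its own defection was justified and the other's was not. That is the same kind of echo plain TFT falls into after one mistaken move. Other noise models may behave differently.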

The moral of the story is that there’s a game-theoretic advantage to having not only reciprocity (TFT) but also standards (cTFT), and in fact reciprocity alone is not enough to outperform strategies like Pavlov, which don’t map well to human moral maxims.

What do I mean by standards?

There’s a difference between saying “Behavior X is better than behavior Y” and saying “Behavior Y is unacceptable.”

The concept of “unacceptable” behavior functions like the concept of “standing” in the game theory paper. If I do something “unacceptable” and you respond in some negative way (you get mad or punish me or whatever), I’m not supposed to retaliate against your negative response; I’m supposed to accept it.

Pure reciprocity results in blood feuds — “if you kill one of my family I’ll kill one of yours” is perfectly sound Tit For Tat reasoning, but it means that we can’t stop killing once we’ve started.

Arbitrary forgiveness fixes that problem and allows parties to reconcile even if they’ve been fighting, but still leaves you vulnerable to an attacker who just won’t quit.

Contrite strategies are like having a court system. (Though not an enforcement system! They are still “anarchist” in that sense — all cTFT bots are equal.) The “standing” is an assessment attached to each person of whether they are in the wrong and thereby restricted in their permission to retaliate.

In general, for actions not covered by the legal system and even for some that are, we don’t have widely shared standards of acceptable vs. unacceptable behavior. We’re aware (and especially so given the internet) that these standards differ from subculture to subculture and context to context, and we’re often aware that they’re arbitrary, and so we have enormous difficulty getting widely shared clarity on claims like “he was deceptive and that’s not OK”. Because…was he deceptive in a way that counts as fraud? Was it just “puffery” of the kind that’s normal in PR? Was it a white lie to spare someone’s feelings? Was it “just venting” and thus not expected to be as nuanced or fact-checked as more formal speech? What level or standard of honesty could he reasonably have been expected to be living up to?

We can’t say “that’s not OK” without some kind of understanding that he had failed to live up to a shared expectation. And where is that bar? It’s going to depend on who you ask and what local context they’re living in. And not only that: because nobody is keeping track of where even the separate, local standards are, standards will eventually drop to the lowest common denominator unless they are made explicit.

MBTI isn’t science but it’s illustrative descriptively, and it seems to me that the difference between “Perceivers” and “Judgers”, which is basically the difference between the kinds of people who get called “judgmental” in ordinary English and the people who don’t, is that “Judgers” have a clear idea of where the line is between “acceptable” and “unacceptable” behavior, while Perceivers don’t. I’m a Perceiver, and I’ve often had this experience where someone is saying “that’s just Not OK” and I’m like “whoa, where are you getting that? I can certainly see that it’s suboptimal, this other thing would be better, but why are you drawing the line for acceptability here instead of somewhere else?”

The lesson of cTFT is that having a line in the first place, having a standard that you can either be in line with or in violation of, has survival value.

• The bottom line is, adding “contrition” to TFT makes it quite a bit better, and allows it to keep pace with Pavlov in exploiting TFT’s, while doing better than Pavlov at exploiting Defectors.
This is no longer true if we add noise in the perception of good or bad standing; contrite strategies, like TFT, can get stuck defecting against each other if they erroneously perceive bad standing.

So cTFT moves TFT’s weakness to noise somewhere else. Where can we find real robustness?

From page 2 of the paper:

“cTFT is not the only evolutionarily stable rule which is Pareto-optimal (and hence yields the maximal pay-off if the whole population adopts it).”

We discuss cTFT, PAVLOV and REMORSE with analytical methods and numerical simulations, embedding them in a large class of stochastic strategies. Finally, we show that by replacing the conventions concerning the “standing” by another set (which is even easier to implement, and only depends on an “internal variable”) one is led to a PRUDENT PAVLOV strategy which is an ESS and immune against errors both in implementing and in perceiving moves.

That sounds very useful for a population to have.

The more general problem, for anyone fond of strategies that “have short memories” but keep track of similar statistics instead:

Page 11, on being careful about bias:

In principle, one could apply other rules of “standing”. To start with, we should replace this term by a more neutral one, in order not to get trapped by its connotations, and think only of an arbitrary “tagging” of the states without specifying which is “good” or “bad”. A strategy is now specified by the probability to cooperate and/or change the standing in the next round, depending on the current state (including the current standing) of both opponents. It is plausible that we can obtain some evolutionarily stable strategies for many such codes.

Page 12, after pPavlov’s implementation is explained:

It seems highly plausible that there exists a wide variety of workable “taggings” which yield interesting ESS’s. The question is whether an evolution based on mutation and selection would tend to lead to one form of “tagging” rather than another. This could ultimately shed light on why humans developed a sense of fairness, feelings of guilt, and highly effective social norms [see also Sugden (1986) and Young (1993) on the evolution of conventions]. The sheer combinatorial complexity of encompassing all conceivable codes, or taggings, is enormous, and the costs (in fitness) for reckoning with these “tags” seem difficult to evaluate. But it is a tempting problem.
• Remorse cooperates only if it is in bad standing, or if both players cooperated in the previous round. In other words, Remorse is more aggressive; unlike cTFT, it can attack cooperators.
Against the strategy “always cooperate”, cTFT always cooperates but Remorse alternates cooperating and defecting:
C/C → C/D → C/C → C/D …

Shouldn’t the second one of these be C/​C, since one player is “always cooperate” and the other player cooperates “if both players cooperated in the previous round”?

• Yes. Page 287 of the paper affirms your interpretation: “REMORSE does not exploit suckers, i.e. AllC players, whereas PAVLOV does.”

The OP has a mistake:

Remorse is more aggressive; unlike cTFT, it can attack cooperators

Neither Remorse nor cTFT will attack cooperators.
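For anyone who wants to check this mechanically, here is a minimal sketch that plays Remorse exactly as the post states it (“cooperates only if it is in bad standing, or if both players cooperated in the previous round”). Two things are my assumptions rather than the paper's wording: the standing update (cooperating, or defecting while in good standing against a player in bad standing, leaves you in good standing) and the opening move (an empty history is treated as mutual cooperation). All function names are hypothetical.

```python
from typing import Callable, List, Optional

C, D = "C", "D"
GOOD, BAD = "good", "bad"


def next_standing(move: str, own: str, other: str) -> str:
    """Assumed rule: cooperating or justified punishment keeps good standing."""
    if move == C or (own == GOOD and other == BAD):
        return GOOD
    return BAD


def remorse(own: str, other: str,
            last_own: Optional[str], last_other: Optional[str]) -> str:
    """Cooperate only if in bad standing or after mutual cooperation."""
    # Assumption: with no history yet, behave as if both had cooperated.
    mutual_c = last_own in (None, C) and last_other in (None, C)
    return C if (own == BAD or mutual_c) else D


def all_c(own, other, last_own, last_other) -> str:
    """Always cooperate."""
    return C


def all_d(own, other, last_own, last_other) -> str:
    """Always defect."""
    return D


Strategy = Callable[[str, str, Optional[str], Optional[str]], str]


def play(strat_a: Strategy, strat_b: Strategy, rounds: int) -> List[str]:
    """Return each round as 'A/B', tracking standings and last moves."""
    stand_a = stand_b = GOOD
    last_a = last_b = None
    history = []
    for _ in range(rounds):
        a = strat_a(stand_a, stand_b, last_a, last_b)
        b = strat_b(stand_b, stand_a, last_b, last_a)
        history.append(f"{a}/{b}")
        stand_a, stand_b = (next_standing(a, stand_a, stand_b),
                            next_standing(b, stand_b, stand_a))
        last_a, last_b = a, b
    return history


print(" -> ".join(play(all_c, remorse, 4)))  # AllC / Remorse
print(" -> ".join(play(all_d, remorse, 4)))  # AllD / Remorse
```

Under these assumptions, Remorse against “always cooperate” yields C/C in every round, and against “always defect” it yields D/C followed by D/D forever, which is consistent with the reading that Remorse does not attack unconditional cooperators.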

• I don’t quite understand the conclusion, so this question might be wrong, but—is a line really necessary? Do we need a discrete “acceptable/​unacceptable” judgment assigned to each action, or is it the universal agreement that’s most active in causing the effect you’re talking about?