Aumann’s Agreement Revisited

This post reviews some slightly counter-intuitive conclusions that Bayesian probability theory has about agreement between people’s probability estimates for the same belief. You have probably heard Aumann’s agreement theorem stated as “with common priors, people’s estimates should converge upon gaining common knowledge of their estimates.” However, it’s worth noting that this holds only given common priors, new data in the form of each other’s estimates, and shared models of one another’s rationality. If the priors are not common, arbitrary new data can drive beliefs further apart. If people have common priors on the propositions but different priors on each other’s rationality, Aumann’s agreement need not follow from sharing estimates. The theorem is still a theorem and obviously still holds given its assumptions; the real question is how realistic those assumptions are. I would argue that a model allowing some partial mistrust of each other’s rationality is more realistic and could help explain why we don’t see Aumann-style agreement in practice.

Sample misuse of the theorem:

I will first go through simple cases, building up examples of agreement and disagreement. If you are only interested in the “agreement with mistrust of rationality” result, skip to the bottom.

The first example is about people with different priors reacting to the same new evidence. It is taken from Jaynes’ Probability Theory: The Logic of Science:

The new information D is: ‘Mr N has gone on TV with a sensational claim that a commonly used drug is unsafe’, and three viewers, Mr A, Mr B, and Mr C, see this. Their prior probabilities P(S|I) that the drug is safe are (0.9, 0.1, 0.9), respectively; i.e. initially, Mr A and Mr C were believers in the safety of the drug, Mr B a disbeliever. But they interpret the information D very differently, because they have different views about the reliability of Mr N. They all agree that, if the drug had really been proved unsafe, Mr N would be right there shouting it: that is, their probabilities P(D|S̄I) are (1, 1, 1); but Mr A trusts his honesty while Mr C does not. Their probabilities P(D|SI) that, if the drug is safe, Mr N would say that it is unsafe, are (0.01, 0.3, 0.99), respectively. Applying Bayes’ theorem, P(S|DI) = P(S|I) P(D|SI) / P(D|I), and expanding the denominator by the product and sum rules, P(D|I) = P(S|I) P(D|SI) + P(S̄|I) P(D|S̄I), we find their posterior probabilities that the drug is safe to be (0.083, 0.032, 0.899), respectively.

A: ‘Mr N is a fine fellow, doing a notable public service. I had thought the drug to be safe from other evidence, but he would not knowingly misrepresent the facts; therefore hearing his report leads me to change my mind and think that the drug is unsafe after all. My belief in safety is lowered by 20.0 db, so I will not buy any more.’

B: ‘Mr N is an erratic fellow, inclined to accept adverse evidence too quickly. I was already convinced that the drug is unsafe; but even if it is safe he might be carried away into saying otherwise. So, hearing his claim does strengthen my opinion, but only by 5.3 db. I would never under any circumstances use the drug.’

C: ‘Mr N is an unscrupulous rascal, who does everything in his power to stir up trouble by sensational publicity. The drug is probably safe, but he would almost certainly claim it is unsafe whatever the facts. So, hearing his claim has practically no effect (only 0.005 db) on my confidence that the drug is safe. I will continue to buy it and use it.’

The opinions of Mr A and Mr B converge – become closer to each other because both are willing to trust Mr N’s veracity to some extent. But Mr A and Mr C diverge because their prior probabilities of deception are entirely different.

Note that while both A and C update downwards on the drug’s safety, their opinions diverge in the sense that the log ratio of their odds increases rather than decreases. This is a good counter-example to twisting the agreement theorem into some sort of assumption of convergence on arbitrary new data. Yes, that means that even two rational people’s beliefs in a proposition can diverge after learning the same thing.
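
Here is a quick sketch (my own, not from Jaynes) that reproduces the quoted posteriors and makes the divergence between A and C explicit in log-odds terms:

```python
# Reproduce Jaynes' three-viewer update and show that A and C diverge in
# log-odds even though both revise their probability of safety downwards.
import math

def log_odds(p):
    """Log-odds, base 10; Jaynes' 'evidence' in db is 10 times this."""
    return math.log10(p / (1 - p))

prior_safe   = {"A": 0.9,  "B": 0.1, "C": 0.9}    # P(S|I)
p_claim_safe = {"A": 0.01, "B": 0.3, "C": 0.99}   # P(D|S,I): N claims 'unsafe' although the drug is safe
p_claim_unsafe = 1.0                              # P(D|not-S,I): N certainly reports a truly unsafe drug

posterior = {}
for person in "ABC":
    ps, pd_s = prior_safe[person], p_claim_safe[person]
    # Bayes: P(S|D,I) = P(S)P(D|S) / [P(S)P(D|S) + P(not-S)P(D|not-S)]
    posterior[person] = ps * pd_s / (ps * pd_s + (1 - ps) * p_claim_unsafe)

print({k: round(v, 3) for k, v in posterior.items()})  # ~ {'A': 0.083, 'B': 0.032, 'C': 0.899}

# Divergence of A and C: identical log-odds before the broadcast, far apart after.
print(abs(log_odds(prior_safe["A"]) - log_odds(prior_safe["C"])))  # 0.0
print(abs(log_odds(posterior["A"]) - log_odds(posterior["C"])))    # ~2.0: their odds now differ by ~two orders of magnitude
```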

A second example illustrates a similar point with regard to considering multiple hypotheses. Here we consider an experiment which tries to prove that a person has ESP (extra-sensory perception).

In the reported experiment, from the experimental design the probability for guessing a card correctly should have been p = 0.2, independently in each trial. Let Hp be the ‘null hypothesis’ which states this and supposes that only ‘pure chance’ is operating (whatever that means). According to the binomial distribution, Hp predicts that if a subject has no ESP, the number r of successful guesses in n trials should be about np ± √(np(1 − p)) (mean ± one standard deviation).

For n =37100 trials, this is 7420±77.

But, according to the report, Mrs. Gloria Stewart guessed correctly r = 9410 times in 37100 trials, for a fractional success rate of f = 0.2536. These numbers constitute our data D. At first glance, they may not look very sensational; note, however, that her score was (9410 − 7420)/77 = 25.8 standard deviations away from the chance expectation.

This corresponds to an estimated p-value of about 3.15×10^−139.

Forget the debate about whether to lower the p-value threshold: very few experiments have that kind of p-value, so any reasonable threshold would let this pass. Forget the debate about whether this is enough data to update the prior: you can be an ESP skeptic, but there is no reason your prior should be lower than 3.15×10^−139!
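
As a sanity check on the quoted numbers, here is a small sketch of my own (scipy is used only for the tail probabilities). The exact tail value depends on how it is computed, but it should land in the same vanishingly small ballpark as the quoted figure:

```python
# Rough check of the binomial mean, standard deviation, z-score, and tail probability.
from scipy.stats import binom, norm

n, p, r = 37100, 0.2, 9410
mean = n * p                          # 7420
sd = (n * p * (1 - p)) ** 0.5         # ~77
z = (r - mean) / sd                   # ~25.8 standard deviations
print(mean, round(sd, 1), round(z, 1))

# Upper-tail probability P(r >= 9410 | H_null); precision this far out in the
# tail may vary, but it should be vanishingly small, like the quoted 3.15e-139.
print(binom.sf(r - 1, n, p))
# The normal approximation is much cruder this many sigmas out.
print(norm.sf(z))
```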

So, I am not asking you to believe in ESP; rather, the question is how to update probabilities in this case. The simple answer is that instead of considering the two hypotheses:

H_E = ESP is true,

H_null = this result happens by chance,

you need to consider the three hypotheses:

H_E = ESP is true,

H_null = this result happens by chance

H_fake = the experiment had an error in its setup.

Based on the data, we can rule out H_null convincingly, and the resulting ratio of how likely ESP is to be true depends on the ratio of the priors of H_E and H_fake. So, for example, suppose person A is somewhat skeptical of both the experiment and ESP, with H_E_A = 10^−4% and H_fake_A = 10%, while person B is a bit more of a believer in both, with H_E_B = 2% and H_fake_B = 0.5%. Then their resulting posteriors will be around:

P(H_E_A | D ) = 10^−3%

P(H_fake_A | D ) = 100% − 10^−3%

P(H_E_B | D ) = 80%

P(H_fake_B | D ) = 20%

So, B is convinced by the evidence that ESP is real, because the data pulled on his prior for ESP more than on his prior for the experiment being flawed. A is now ~100% convinced that the experiment was flawed.
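
Here is a rough sketch of this update (my own illustration; I assume D is about equally likely under H_E and H_fake, and has likelihood ~3.15×10^−139 under H_null):

```python
# Three-hypothesis update: posterior is proportional to prior times likelihood.
def posteriors(prior_esp, prior_fake):
    prior_null = 1 - prior_esp - prior_fake
    likelihood = {"H_E": 1.0, "H_fake": 1.0, "H_null": 3.15e-139}
    prior = {"H_E": prior_esp, "H_fake": prior_fake, "H_null": prior_null}
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: unnorm[h] / z for h in unnorm}

print(posteriors(prior_esp=1e-6, prior_fake=0.10))   # person A: H_E ~ 1e-5, H_fake ~ 1 - 1e-5
print(posteriors(prior_esp=0.02, prior_fake=0.005))  # person B: H_E ~ 0.8,  H_fake ~ 0.2
```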

I have heard a strange objection to this experiment: that results with that kind of p-value are somehow less likely to be genuine because the p-value is unusual. While people don’t generally publish such results for fear of looking unusual, there is nothing inherently wrong with finding a very low p-value in an experiment with a large number of trials. There are plenty of real processes that could give a number this low.

What does this mean? In practice, it means that if an agent is skeptical of your rationality, with P(you are irrational) = p, then it is hard for you to push them to update much towards a hypothesis to which they assign a prior lower than p.
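
A toy model of this claim, with hypothetical numbers of my own: suppose the listener thinks an irrational speaker would make the claim regardless of the truth, while a rational one would only make it if it were true. Then no matter how strong the claimed evidence, the listener ends up near q / (q + p), where q is their prior on the hypothesis:

```python
# P(X | you claim overwhelming evidence for X), assuming a rational speaker
# only makes the claim when X is true and an irrational one always makes it.
def posterior_after_claim(q, p):
    return q / (q + (1 - q) * p)

print(posterior_after_claim(q=1e-6, p=0.01))  # ~1e-4: moved, but still nowhere near belief
print(posterior_after_claim(q=1e-6, p=1e-7))  # ~0.91: trust in rationality dwarfs the prior
```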

The last example is from the original Aumann paper and illustrates how agreement should work.

As an illustration, suppose persons 1 and 2 have a uniform prior over the parameter (the chance that the coin comes up heads); that is, the probability of heads is a random number between 0 and 1. Let A be the event that the coin comes up heads on the next toss; its prior probability is 50%. Persons 1 and 2 each observe a single coin toss privately: heads and tails, respectively. Their new posteriors for A are 2/3 and 1/3. However, after learning each other’s posteriors, they immediately deduce what the other’s coin toss was and update back to 50%. This is an example of convergence after a single sharing of estimates.

However, if some other, unknown number of tosses caused 1 and 2 to arrive at their estimates, then the update takes two steps: the first exchange reveals how many tosses each of them has observed privately, and the second combines those observations into a single estimate. Note that I haven’t checked the math myself; I am assuming the original paper is correct.
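
Here is a minimal sketch of the single-toss case above (my own, using the standard Beta–Binomial update rather than anything taken from Aumann’s paper):

```python
# With a uniform Beta(1, 1) prior on the coin's bias, each private observation
# shifts the predictive probability of heads to 2/3 or 1/3, and pooling both
# observations brings it back to 1/2.
def predictive_heads(heads_seen, tails_seen, prior_a=1, prior_b=1):
    """Posterior predictive P(next toss = heads) under a Beta(prior_a, prior_b) prior."""
    a = prior_a + heads_seen
    b = prior_b + tails_seen
    return a / (a + b)

print(predictive_heads(0, 0))   # 0.5  - common prior
print(predictive_heads(1, 0))   # 2/3  - person 1 saw heads
print(predictive_heads(0, 1))   # 1/3  - person 2 saw tails
print(predictive_heads(1, 1))   # 0.5  - after each deduces the other's observation
```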

So, if you followed the three examples carefully, you might say that they seem to show two different and somewhat unrelated findings:

1. With uncommon priors, arbitrary new information can create a divergence in posteriors.

2. With common priors, sharing knowledge of each other’s posteriors should eventually result in convergence.

This is true in theory, in that the two findings don’t contradict each other; in practice, however, applying them is more complex. The main issue with Aumann’s theorem is the assumption of “common priors”. This has been discussed at length elsewhere, with various proposed resolutions, such as punting the issue to deeper meta-priors.

I want to consider another case where “uncommon priors” prevent agreement: when people have different models of each other’s rationality. Most discussions of the agreement theorem assume that the two people believe in each other’s rationality with 100% probability.

The moment this assumption is relaxed a bit, say to where each person believes in the other’s rationality on a given subject with 80% probability while being 100% sure of their own, learning each other’s estimates can act as evidence of the other person’s irrationality on that subject as well as evidence pushing towards convergence of beliefs.

Here is a sample setup. Persons A and B are trying to estimate the probability that a coin came up heads. Each one observes the coin toss through a private oracle which has 99% accuracy. In addition, each one considers themselves rational but does not trust the other person completely: each considers the other rational with only 80% probability.

If B is irrational, A believes that B will always report 99% tails, so B’s statement can be ignored. Consider the case where the coin came up heads, A’s oracle correctly reports heads, but B’s oracle reports tails (the 1% chance of the oracle being wrong). This happens 99% × 1% = 0.99% of the time.

Note that the perspectives of A and B are symmetrical with respect to each other. Each one will report 99% confidence in heads and tails, respectively. What happens in the update after they share this information?

From A’s perspective before the sharing, the possibilities can be divided into four categories. Here B_rat is the proposition that B is rational, H is the proposition that the coin came up heads, and E is the evidence presented in this situation (B reporting 99% tails).

P (B_rat) = 80%

P (H) = 99%

P (B_rat & H & E) = 80% * 99% * 1% = 0.792%

P(B_rat & !H & E) = 80% * 1% * 99% = 0.792%

P(!B_rat & H & E) = 20% * 99% =19.8%

P(!B_rat & !H & E) = 20% * 1% = 0.2%

Thus we get:

P(E) = 21.584%

P(!B_rat | E) = P(!B_rat & E) / P(E) = 20 / 21.584 ≈ 92.6%

P(!H | E) = P(!H & E) / P(E) = (0.792 + 0.2) / 21.584 ≈ 4.6%

So, in other words, after hearing each other’s estimates, both will update the probability that they themselves are wrong about the coin from 1% to 4.6%. However, they will update the probability that the other person is irrational from 20% to 92.6%. What happens after this update? In the least convenient possible world, each person assumes that the other would report in exactly the same way whether rational or irrational, and thus no further updates are possible.
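
For completeness, here is the same calculation as a small script (my own variable names; the numbers are the ones derived above):

```python
# Joint update from A's point of view:
# B_rat = "B is rational", H = "the coin is heads", E = "B reports 99% tails".
p_b_rational = 0.80     # A's prior that B is rational
p_heads = 0.99          # A's credence in heads after consulting the private oracle
oracle_error = 0.01

joint = {
    # B rational, coin heads, B's oracle errs and reports tails
    ("rat", "H"):  p_b_rational * p_heads * oracle_error,
    # B rational, coin tails, B's oracle correctly reports tails
    ("rat", "~H"): p_b_rational * (1 - p_heads) * (1 - oracle_error),
    # B irrational: assumed to report "99% tails" no matter what
    ("irr", "H"):  (1 - p_b_rational) * p_heads,
    ("irr", "~H"): (1 - p_b_rational) * (1 - p_heads),
}
p_evidence = sum(joint.values())

p_b_irrational_given_e = (joint[("irr", "H")] + joint[("irr", "~H")]) / p_evidence
p_tails_given_e = (joint[("rat", "~H")] + joint[("irr", "~H")]) / p_evidence

print(round(p_evidence, 5))               # ~0.21584
print(round(p_b_irrational_given_e, 3))   # ~0.926
print(round(p_tails_given_e, 3))          # ~0.046
```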

What does this mean? It suggests an alternative explanation for the notion of “agreeing to disagree.” Instead of viewing it as a pair of contradictory statements by two people who trust each other’s rationality perfectly yet hold different credences, it can be interpreted as two people trusting each other’s rationality in general, but not on this particular subject.

The conclusion is that the idea that “rational people can’t agree to disagree” is too strong, and the normal human ability to agree to disagree does have a theoretical basis. The practical advice is in line with spending your weirdness points wisely: don’t try to convince people of claims to which they assign very low probability if their trust in your rationality is itself limited.

Wanting people to take each other’s beliefs as “some evidence” is an important and positive desire. I have seen many instances where a discussion would have benefited from someone’s belief being taken into account.

However, the expectation that everybody “must” agree with each other about everything after exchanging a few of their beliefs is not realistic. The broader philosophical meta-point is to be careful when trying to use math to “prove” that a common-sense phenomenon cannot or should not exist.