Probability Space & Aumann Agreement

The first part of this post describes a way of interpreting the basic mathematics of Bayesianism. Eliezer already presented one such view at http://lesswrong.com/lw/hk/priors_as_mathematical_objects/, but I want to present another one that has been useful to me, and also show how this view is related to the standard formalism of probability theory and Bayesian updating, namely the probability space.

The second part of this post will build upon the first, and try to explain the math behind Aumann's agreement theorem. Hal Finney had suggested this earlier, and I'm taking on the task now because I recently went through the exercise of learning it, and could use a check of my understanding. The last part will give some of my current thoughts on Aumann agreement.

Probability Space

In http://en.wikipedia.org/wiki/Probability_space, you can see that a probability space consists of a triple:

  • Ω – a non-empty set – usually called sample space, or set of states

  • F – a set of subsets of Ω – usually called sigma-algebra, or set of events

  • P – a function from F to [0,1] – usually called probability measure

F and P are required to have certain additional properties, but I'll ignore them for now. To start with, we'll interpret Ω as a set of possible world-histories. (To eliminate anthropic reasoning issues, let's assume that each possible world-history contains the same number of observers, who have perfect memory, and are labeled with unique serial numbers.) Each "event" A in F is formally a subset of Ω, and interpreted as either an actual event that occurs in every world-history in A, or a hypothesis which is true in the world-histories in A. (The details of the events or hypotheses themselves are abstracted away here.)

To understand the probability measure P, it's easier to first introduce the probability mass function p, which assigns a probability to each element of Ω, with the probabilities summing to 1. Then P(A) is just the sum of the probabilities of the elements in A. (For simplicity, I'm assuming the discrete case, where Ω is at most countable.) In other words, the probability of an observation is the sum of the probabilities of the world-histories that it doesn't rule out.
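Here's a minimal sketch of the discrete case in Python (the four world-histories and their masses are made up for illustration):

```python
omega = {"w1", "w2", "w3", "w4"}   # sample space: hypothetical world-histories
p = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}   # probability masses, summing to 1

def P(event):
    """Probability measure: sum the masses of the world-histories in the event."""
    return sum(p[w] for w in event)

A = {"w1", "w3"}   # an event/hypothesis, true in exactly these world-histories
print(P(A))        # 0.6 (0.4 + 0.2, up to float rounding)
```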

A payoff of this view of the probability space is a simple understanding of what Bayesian updating is. Once an observer sees an event D, he can rule out all possible world-histories that are not in D. So, he can get a posterior probability measure by setting the probability masses of all world-histories not in D to 0, and renormalizing the ones in D so that they sum up to 1 while keeping the same relative ratios. You can easily verify that this is equivalent to Bayes' rule: P(H|D) = P(D∩H)/P(D).
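Here's a sketch of that updating rule on the same toy space, with a check that renormalization matches Bayes' rule (the observation D and hypothesis H are again made up):

```python
p = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}   # prior masses, as above

def P(event, masses=p):
    return sum(masses[w] for w in event)

def update(prior, D):
    """Observe D: zero out world-histories outside D, renormalize the rest."""
    norm = sum(prior[w] for w in D)   # this is P(D)
    return {w: (prior[w] / norm if w in D else 0.0) for w in prior}

D = {"w1", "w2"}   # the observation rules out w3 and w4
H = {"w1", "w3"}   # some hypothesis
posterior = update(p, D)
# The renormalized masses agree with Bayes' rule: P(H|D) = P(D∩H)/P(D)
assert abs(P(H, posterior) - P(H & D) / P(D)) < 1e-12
```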

To sum up, the mathematical objects behind Bayesianism can be seen as

  • Ω – a set of possible world-histories

  • F – information about which events occur in which possible world-histories

  • P – a set of weights on the world-histories that sum up to 1

Aumann's Agreement Theorem

Aumann's agreement theorem says that if two Bayesians share the same probability space but possibly different information partitions, and have common knowledge of their information partitions and posterior probabilities of some event A, then their posterior probabilities of that event must be equal. So what are information partitions, and what does "common knowledge" mean?

The information partition I of an observer-moment M divides Ω into a number of non-overlapping subsets that together cover all of Ω. Two possible world-histories w1 and w2 are placed into the same subset if M has exactly the same information in w1 as in w2. In other words, if w1 and w2 are in the same element of I, and w1 is the actual world-history, then M can't rule out either w1 or w2. I(w) is used to denote the element of I that contains w.
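As a sketch, an information partition and the function I(w) might look like this for the toy space above (the particular grouping, under which the observer can't tell w1 from w2 apart, is hypothetical):

```python
omega = {"w1", "w2", "w3", "w4"}
I = [frozenset({"w1", "w2"}), frozenset({"w3"}), frozenset({"w4"})]

def cell(partition, w):
    """I(w): the element of the partition that contains world-history w."""
    return next(s for s in partition if w in s)

print(cell(I, "w1"))   # the cell {w1, w2}: M can't distinguish these two
```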

Common knowledge is defined as follows: if w is the actual world-history and two agents have information partitions I and J, an event E is common knowledge if E includes the member of the meet I∧J that contains w. The operation ∧ (meet) means taking the two partitions I and J, forming their union, and then repeatedly merging any two of its elements (which, you'll recall, are subsets of Ω) that overlap, until it becomes a partition again (i.e., no two elements overlap).
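Here's a direct sketch of that merging procedure (the two partitions in the usage example are made up):

```python
def meet(I, J):
    """The meet I∧J: pool the cells of both partitions, then repeatedly
    merge any two overlapping cells until no two overlap."""
    cells = [set(s) for s in I] + [set(s) for s in J]
    merged = True
    while merged:
        merged = False
        for a in range(len(cells)):
            for b in range(a + 1, len(cells)):
                if cells[a] & cells[b]:      # overlapping cells get merged
                    cells[a] |= cells.pop(b)
                    merged = True
                    break
            if merged:
                break
    return [frozenset(c) for c in cells]

I = [{"w1", "w2"}, {"w3"}, {"w4"}]
J = [{"w1"}, {"w2", "w3"}, {"w4"}]
print(meet(I, J))   # two cells: {w1, w2, w3} and {w4}
```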

It may not be clear at first what this meet operation has to do with common knowledge. Suppose the actual world-history is w. Then agent 1 knows I(w), so he knows that agent 2 must know one of the elements of J that overlaps with I(w). And he can reason that agent 2 must know that agent 1 knows one of the elements of I that overlaps with one of those elements of J. If he carries out this inference to infinity, he'll find that both agents know that the actual world-history is in (I∧J)(w), and both know that the other knows, and both know that the other knows that the other knows, and so on. In other words, it is common knowledge that the actual world-history is in (I∧J)(w). Since event E occurs in every world-history in (I∧J)(w), it's common knowledge that E occurs in the actual world-history.

The proof of the agreement theorem then goes like this. Let E be the event that agent 1 assigns a posterior probability (conditioned on everything he knows) of q1 to event A and agent 2 assigns a posterior probability of q2 to event A. If E is common knowledge at w, then both agents know that P(A | I(v)) = q1 and P(A | J(v)) = q2 for every v in (I∧J)(w). But this implies P(A | (I∧J)(w)) = q1 and P(A | (I∧J)(w)) = q2, and therefore q1 = q2. (To see this, suppose you currently know only (I∧J)(w), and you know that no matter what additional information I(v) you obtain, your posterior probability will be the same q1; then your current probability must already be q1, by the law of total probability.)
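A quick numeric sketch of that last step, with hypothetical masses chosen so that both of agent 1's cells inside C = (I∧J)(w) give the same conditional probability:

```python
p = {"w1": 0.125, "w2": 0.125, "w3": 0.375, "w4": 0.375}   # hypothetical masses

def P(event):
    return sum(p[v] for v in event)

def cond(A, B):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return P(A & B) / P(B)

A = {"w1", "w3"}
I1, I2 = {"w1", "w2"}, {"w3", "w4"}   # agent 1's cells, whose union is C
C = I1 | I2

# Both cells give q1 = 0.5, so conditioning on their union gives 0.5 as well.
print(cond(A, I1), cond(A, I2), cond(A, C))   # 0.5 0.5 0.5
```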

Is Aumann Agreement Overrated?

Having explained all of that, it seems to me that this theorem is less relevant to a practical rationalist than I thought before I really understood it. After looking at the math, it's apparent that "common knowledge" is a much stricter requirement than it sounds. The most obvious way to achieve it is for the two agents to simply tell each other I(w) and J(w), after which they share a new, common information partition. But in that case, agreement itself is obvious and there is no need to learn or understand Aumann's theorem.

There are some papers that describe how to achieve agreement in other ways, such as by iterative exchange of posterior probabilities. But in such methods, the agents aren't just moving closer to each other's beliefs. Rather, they go through convoluted chains of deduction to infer what information the other agent must have observed, given his declarations, and then update on that new information. (The process is similar to the one needed to solve the second riddle on this page.) The two agents essentially still have to communicate I(w) and J(w) to each other, except they do so by exchanging posterior probabilities and making logical inferences from them.
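To make this concrete, here's a toy sketch of such a dialogue, loosely in the spirit of Geanakoplos and Polemarchakis's iterative procedure (the states, partitions, and event are all made up): each agent announces his posterior for A, and everyone rules out the world-histories in which that agent would have announced a different number.

```python
from fractions import Fraction as F

omega = {"a", "b", "c"}
prior = {v: F(1, 3) for v in omega}   # common prior: uniform
A = {"a", "c"}                        # the event under discussion
I = [{"a", "b"}, {"c"}]               # agent 1's information partition
J = [{"a"}, {"b", "c"}]               # agent 2's information partition
w = "b"                               # the actual world-history

def cell(partition, v):
    return next(s for s in partition if v in s)

def post(given):
    """P(A | given) under the common prior."""
    return sum(prior[v] for v in A & given) / sum(prior[v] for v in given)

public = set(omega)   # world-histories not yet publicly ruled out
q1 = q2 = None
while True:
    new_q1 = post(cell(I, w) & public)
    # listeners rule out every v where agent 1 would have said something else
    public = {v for v in public if post(cell(I, v) & public) == new_q1}
    new_q2 = post(cell(J, w) & public)
    public = {v for v in public if post(cell(J, v) & public) == new_q2}
    print(f"agent 1 says {new_q1}, agent 2 says {new_q2}")
    if (new_q1, new_q2) == (q1, q2):  # announcements have stabilized
        break
    q1, q2 = new_q1, new_q2
```

Here agent 1 starts at 1/2, agent 2 answers 0, and agent 1 then revises to 0; note that the inference only works because each agent knows the other's partition exactly.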

Is this realistic for human rationalist wannabes? It seems wildly implausible to me that two humans can communicate all of the information they have that is relevant to the truth of some statement just by repeatedly exchanging degrees of belief about it, except in very simple situations. You need to know the other agent's information partition exactly in order to narrow down which element of his information partition he is in from his probability declaration, and he needs to know that you know so that he can deduce what inference you're making, in order to continue to the next step, and so on. One error in this process and the whole thing falls apart. It seems much easier to just directly tell each other what information you each have.

Finally, I now see that until the exchange of information completes and common knowledge/agreement is actually achieved, it's rational for even honest truth-seekers who share common priors to disagree. Therefore, two such rationalists may persistently disagree just because the amount of information they would have to exchange in order to reach agreement is too great to be practical. This is quite different from the understanding of Aumann agreement I had before I read the math.