Imprecise Information theory

We would like a notion of entropy for credal sets. Diffractor suggests the following:
Let C ∈ Credal(Ω) be a credal set, i.e. a closed convex set of probability distributions on Ω. Then the entropy of C is defined as

H_Diffractor(C) = sup_{p ∈ C} H(p),

where H(p) denotes the usual Shannon entropy.
I don’t like this since it doesn’t satisfy the natural desiderata below.
Instead, I suggest the following. Let me_C ∈ C denote the (absolute) maximum-entropy distribution, i.e. H(me_C) = max_{p ∈ C} H(p), and let H(C) = H_new(C) = H(me_C).
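As a concrete sketch of the maxEnt definition: for a credal set given as the convex hull of two extreme distributions (made-up numbers below), entropy is concave on the mixture line, so a coarse grid search over the mixing weight already locates me_C.

```python
from math import log

def H(p):
    """Shannon entropy (nats) of a finitely supported distribution."""
    return -sum(x * log(x) for x in p if x > 0)

# Hypothetical credal set: convex hull of two extreme distributions on a
# 3-element Omega.  Every mixture t*p0 + (1-t)*p1 lies in C.
p0 = [0.8, 0.1, 0.1]
p1 = [0.1, 0.1, 0.8]

def mix(t):
    return [t * a + (1 - t) * b for a, b in zip(p0, p1)]

# Entropy is concave, so a grid search over the mixing weight suffices
# to locate the maximum-entropy element me_C to good accuracy.
ts = [i / 1000 for i in range(1001)]
t_star = max(ts, key=lambda t: H(mix(t)))
me_C = mix(t_star)
print(t_star, H(me_C))
```

By the symmetry of p0 and p1, the maximum sits at the midpoint t = 1/2.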
Desideratum 1: H({p}) = H(p).
Desideratum 2: Let A ⊆ Ω and consider C_A := ConvexHull({δ_a | a ∈ A}).
Then H(A) := H(C_A) = log|A|.
Remark. Check that these desiderata are compatible where they overlap.
It is easy to check that the above maxEnt suggestion satisfies these desiderata: for C_A, the maximum-entropy element is the uniform distribution on A, which has entropy log|A|.
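Desideratum 2 can be checked numerically: over mixtures of point masses δ_a for a in a 3-element A inside a 4-element Ω (an illustrative choice), the entropy of the best mixture approaches log|A|.

```python
from math import log
from itertools import product

def H(p):
    """Shannon entropy (nats) of a finitely supported distribution."""
    return -sum(x * log(x) for x in p if x > 0)

# C_A for A = {0, 1, 2} inside Omega = {0, 1, 2, 3}: the convex hull of
# the point masses delta_0, delta_1, delta_2.  A mixture with weights
# (w0, w1, w2) is the distribution (w0, w1, w2, 0).
grid = [i / 20 for i in range(21)]
best = 0.0
for w0, w1 in product(grid, grid):
    w2 = 1 - w0 - w1
    if w2 < 0:
        continue
    best = max(best, H([w0, w1, w2, 0.0]))

print(best, log(3))  # best approaches log|A| = log 3 from below
```

The grid misses the exact uniform weights (1/3, 1/3, 1/3), so `best` undershoots log 3 slightly but never exceeds it.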
Entropy operationally
Entropy is really about stochastic processes more than distributions. Given a distribution p there is an associated stochastic process (X_n)_{n ∈ ℕ}, where each X_i is sampled i.i.d. from p. The entropy H(p) is really about the expected code length per symbol of encoding samples from this process.
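As a sanity check of the coding interpretation, a minimal sketch (the distribution p below is a made-up example): a Shannon code assigns symbol i a codeword of length ⌈−log₂ p_i⌉, and the source coding theorem puts its expected length within one bit of the entropy.

```python
from math import log2, ceil

def H2(p):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

# Shannon code lengths ceil(-log2 p_i) satisfy Kraft's inequality,
# so a prefix code with these lengths exists.
p = [0.5, 0.25, 0.125, 0.125]
lengths = [ceil(-log2(x)) for x in p]
expected_len = sum(x * l for x, l in zip(p, lengths))

# Source coding theorem: H2(p) <= expected_len < H2(p) + 1.
print(H2(p), expected_len)  # dyadic p, so both equal 1.75 bits
```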
In the credal set case there are two processes that can naturally be associated with a credal set C. Basically: does the environment pick a single p ∈ C at the start and then sample i.i.d. from p (this is what Diffractor's entropy refers to), or is the environment allowed to choose a different q ∈ C at each round?
In the latter case, you need to pick the encoding that does least badly against the worst-case choices.
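The "least badly" encoding can be made concrete via a known maxEnt/minimax duality (Topsøe-style): for convex C, the code matched to me_C minimizes the worst-case expected code length, and the minimax value equals H(me_C). A numeric sketch with a hypothetical two-extreme-point credal set (worst-case expected length is linear in p, so it is attained at an extreme point):

```python
from math import log

def H(p):
    return -sum(x * log(x) for x in p if x > 0)

def CE(p, q):
    """Expected code length (nats) encoding samples from p with a code for q."""
    return -sum(a * log(b) for a, b in zip(p, q) if a > 0)

# Hypothetical credal set: convex hull of two extreme distributions.
p0 = [0.8, 0.1, 0.1]
p1 = [0.1, 0.1, 0.8]

# sup_{p in C} CE(p, q) is linear in p, hence attained at an extreme point.
def worst_case(q):
    return max(CE(p0, q), CE(p1, q))

# Grid over interior coding distributions q on the 3-point space.
qs = [[i / 20, j / 20, 1 - i / 20 - j / 20]
      for i in range(1, 20) for j in range(1, 20) if i + j < 20]
minimax = min(worst_case(q) for q in qs)

# Maximum entropy over C, via a grid over the mixing weight.
maxent = max(H([t * a + (1 - t) * b for a, b in zip(p0, p1)])
             for t in [k / 1000 for k in range(1001)])

print(minimax, maxent)  # duality: these agree (up to grid resolution)
```

Both grids happen to contain the optimizer me_C = (0.45, 0.1, 0.45), so the two values coincide here up to floating point.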
[give more details. check that this makes sense!]
Properties of credal maxEnt entropy
We may now investigate properties of the entropy measure.
H(A ∨ B) = H(A) + H(B) − H(A ∧ B)
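A quick check of this identity under the set semantics H(A) = log|A| (with ∨, ∧ read as union and intersection): rearranged, it demands |A∪B|·|A∩B| = |A|·|B|, which holds when one set contains the other but can fail for incomparable overlapping sets, so it may be worth pinning down which lattice ∨/∧ refers to here.

```python
from math import log

def Hs(S):
    """Entropy of the credal set C_A generated by a finite set: log|A|."""
    return log(len(S))

A = {1, 2, 3}
B = {3, 4}
lhs = Hs(A | B)                    # H(A ∨ B) = log 4
rhs = Hs(A) + Hs(B) - Hs(A & B)    # H(A) + H(B) - H(A ∧ B) = log 6
print(lhs, rhs)                    # the identity fails for this pair

B2 = {1, 2}                        # nested case: B2 ⊂ A
lhs2 = Hs(A | B2)
rhs2 = Hs(A) + Hs(B2) - Hs(A & B2)
print(lhs2, rhs2)                  # both log 3: holds when one set contains the other
```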
H(A^c) = log|A^c| = log(|Ω| − |A|)
Remark. This is different from the following measure!

"H(A|Ω)" = log(|Ω|/|A|)
Remark. If we think of H(A) = H(P(x ∈ Ω | A)) as denoting the number of bits we receive when we know that A holds and we sample from Ω uniformly, then H(A|Ω) = H(x ∈ A | x ∈ Ω) denotes the number of bits we receive when we find out that x ∈ A, given that we already knew x ∈ Ω.
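A tiny worked example of the two quantities, with hypothetical sizes |Ω| = 8 and |A| = 2, working in bits:

```python
from math import log2

# Hypothetical sizes: |Omega| = 8, |A| = 2.
size_omega, size_A = 8, 2

H_A = log2(size_A)                           # bits still needed to pin down x once we know x in A
H_A_given_omega = log2(size_omega / size_A)  # bits gained by learning x in A

print(H_A, H_A_given_omega)  # 1.0 and 2.0; they sum to log2|Omega| = 3 bits
```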
What about H(A ∧ B)?

H(A ∧ B) = H(P(x ∈ A ∧ B | Ω)) = ...?
We want to apply a presumption of independence: a Möbius / Euler-characteristic expansion.
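The first-order presumption-of-independence estimate would be |A ∩ B| ≈ |A||B|/|Ω|, i.e. H(A ∧ B) ≈ H(A) + H(B) − log|Ω|. A sketch on a made-up product space where A and B constrain independent coordinates, so the estimate is exact:

```python
from math import log

# Hypothetical product structure: Omega = X x Y, A constrains only the X
# coordinate and B only the Y coordinate, so A and B are independent
# events under the uniform measure and the estimate is exact.
X = range(4)
Y = range(5)
Omega = [(x, y) for x in X for y in Y]          # |Omega| = 20
A = [(x, y) for (x, y) in Omega if x < 2]        # |A| = 2 * 5 = 10
B = [(x, y) for (x, y) in Omega if y < 3]        # |B| = 4 * 3 = 12
AB = [w for w in A if w in B]                    # |A ∩ B| = 2 * 3 = 6

estimate = log(len(A)) + log(len(B)) - log(len(Omega))  # H(A) + H(B) - log|Omega|
exact = log(len(AB))                                    # H(A ∧ B)
print(exact, estimate)  # both log 6: the independence presumption is exact here
```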