Imprecise Information theory

We would like a notion of entropy for credal sets. Diffractor suggests the following:
Let C ∈ Credal(Ω) be a credal set, i.e. a closed convex set of probability distributions on Ω. Then the entropy of C is defined as

H_Diffractor(C) = sup_{p ∈ C} H(p),

where H(p) denotes the usual Shannon entropy.
I don’t like this since it doesn’t satisfy the natural desiderata below.
Instead, I suggest the following. Let me_C ∈ C denote the (absolute) maximum-entropy distribution, i.e. H(me_C) = max_{p ∈ C} H(p), and let H(C) = H_new(C) = H(me_C).
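As a concrete sketch of the maxEnt definition: for a credal set given as the convex hull of two extreme distributions (made-up numbers below), entropy is concave on the mixture line, so a coarse grid search over the mixing weight already locates me_C.

```python
from math import log

def H(p):
    """Shannon entropy (nats) of a finitely supported distribution."""
    return -sum(x * log(x) for x in p if x > 0)

# Hypothetical credal set: convex hull of two extreme distributions on a
# 3-element Omega.  Every mixture t*p0 + (1-t)*p1 lies in C.
p0 = [0.8, 0.1, 0.1]
p1 = [0.1, 0.1, 0.8]

def mix(t):
    return [t * a + (1 - t) * b for a, b in zip(p0, p1)]

# Entropy is concave, so a grid search over the mixing weight suffices
# to locate the maximum-entropy element me_C to good accuracy.
ts = [i / 1000 for i in range(1001)]
t_star = max(ts, key=lambda t: H(mix(t)))
me_C = mix(t_star)
print(t_star, H(me_C))
```

By the symmetry of p0 and p1, the maximum sits at the midpoint t = 1/2.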
Desideratum 1: H({p}) = H(p).
Desideratum 2: Let A ⊆ Ω and consider C_A := ConvexHull({δ_a | a ∈ A}).
Then H(A) := H(C_A) = log|A|.
Remark. Check that these desiderata are compatible where they overlap.
It is easy to check that the above maxEnt suggestion satisfies these desiderata: for C_A, the maximum-entropy element is the uniform distribution on A, which has entropy log|A|.
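Desideratum 2 can be checked numerically: over mixtures of point masses δ_a for a in a 3-element A inside a 4-element Ω (an illustrative choice), the entropy of the best mixture approaches log|A|.

```python
from math import log
from itertools import product

def H(p):
    """Shannon entropy (nats) of a finitely supported distribution."""
    return -sum(x * log(x) for x in p if x > 0)

# C_A for A = {0, 1, 2} inside Omega = {0, 1, 2, 3}: the convex hull of
# the point masses delta_0, delta_1, delta_2.  A mixture with weights
# (w0, w1, w2) is the distribution (w0, w1, w2, 0).
grid = [i / 20 for i in range(21)]
best = 0.0
for w0, w1 in product(grid, grid):
    w2 = 1 - w0 - w1
    if w2 < 0:
        continue
    best = max(best, H([w0, w1, w2, 0.0]))

print(best, log(3))  # best approaches log|A| = log 3 from below
```

The grid misses the exact uniform weights (1/3, 1/3, 1/3), so `best` undershoots log 3 slightly but never exceeds it.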
Entropy operationally
Entropy is really about stochastic processes more than distributions. Given a distribution p there is an associated stochastic process (X_n)_{n ∈ ℕ}, where each X_i is sampled i.i.d. from p. The entropy H(p) is really about the expected code length per symbol of encoding samples from this process.
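As a sanity check of the coding interpretation, a minimal sketch (the distribution p below is a made-up example): a Shannon code assigns symbol i a codeword of length ⌈−log₂ p_i⌉, and the source coding theorem puts its expected length within one bit of the entropy.

```python
from math import log2, ceil

def H2(p):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

# Shannon code lengths ceil(-log2 p_i) satisfy Kraft's inequality,
# so a prefix code with these lengths exists.
p = [0.5, 0.25, 0.125, 0.125]
lengths = [ceil(-log2(x)) for x in p]
expected_len = sum(x * l for x, l in zip(p, lengths))

# Source coding theorem: H2(p) <= expected_len < H2(p) + 1.
print(H2(p), expected_len)  # dyadic p, so both equal 1.75 bits
```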
In the credal set case there are two processes that can naturally be associated with a credal set C. Basically: does the environment pick a single p ∈ C at the start and then sample i.i.d. from p (this is what Diffractor's entropy refers to), or is the environment allowed to choose a different q ∈ C at each round?
In the latter case, you need to pick the encoding that does least badly against the worst-case choices.
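The "least badly" encoding can be made concrete via a known maxEnt/minimax duality (Topsøe-style): for convex C, the code matched to me_C minimizes the worst-case expected code length, and the minimax value equals H(me_C). A numeric sketch with a hypothetical two-extreme-point credal set (worst-case expected length is linear in p, so it is attained at an extreme point):

```python
from math import log

def H(p):
    return -sum(x * log(x) for x in p if x > 0)

def CE(p, q):
    """Expected code length (nats) encoding samples from p with a code for q."""
    return -sum(a * log(b) for a, b in zip(p, q) if a > 0)

# Hypothetical credal set: convex hull of two extreme distributions.
p0 = [0.8, 0.1, 0.1]
p1 = [0.1, 0.1, 0.8]

# sup_{p in C} CE(p, q) is linear in p, hence attained at an extreme point.
def worst_case(q):
    return max(CE(p0, q), CE(p1, q))

# Grid over interior coding distributions q on the 3-point space.
qs = [[i / 20, j / 20, 1 - i / 20 - j / 20]
      for i in range(1, 20) for j in range(1, 20) if i + j < 20]
minimax = min(worst_case(q) for q in qs)

# Maximum entropy over C, via a grid over the mixing weight.
maxent = max(H([t * a + (1 - t) * b for a, b in zip(p0, p1)])
             for t in [k / 1000 for k in range(1001)])

print(minimax, maxent)  # duality: these agree (up to grid resolution)
```

Both grids happen to contain the optimizer me_C = (0.45, 0.1, 0.45), so the two values coincide here up to floating point.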
[give more details. check that this makes sense!]
Properties of credal maxEnt entropy
We may now investigate properties of the entropy measure.
H(A ∨ B) = H(A) + H(B) − H(A ∧ B)
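A quick check of this identity under the set semantics H(A) = log|A| (with ∨, ∧ read as union and intersection): rearranged, it demands |A∪B|·|A∩B| = |A|·|B|, which holds when one set contains the other but can fail for incomparable overlapping sets, so it may be worth pinning down which lattice ∨/∧ refers to here.

```python
from math import log

def Hs(S):
    """Entropy of the credal set C_A generated by a finite set: log|A|."""
    return log(len(S))

A = {1, 2, 3}
B = {3, 4}
lhs = Hs(A | B)                    # H(A ∨ B) = log 4
rhs = Hs(A) + Hs(B) - Hs(A & B)    # H(A) + H(B) - H(A ∧ B) = log 6
print(lhs, rhs)                    # the identity fails for this pair

B2 = {1, 2}                        # nested case: B2 ⊂ A
lhs2 = Hs(A | B2)
rhs2 = Hs(A) + Hs(B2) - Hs(A & B2)
print(lhs2, rhs2)                  # both log 3: holds when one set contains the other
```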
H(A^c) = log|A^c| = log(|Ω| − |A|)
Remark. This is different from the following measure!

"H(A|Ω)" = log(|Ω|/|A|)
Remark. If we think of H(A) = H(P(x ∈ Ω | A)) as denoting the number of bits we receive when we know that A holds and we sample from Ω uniformly, then H(A|Ω) = H(x ∈ A | x ∈ Ω) denotes the number of bits we receive when we find out that x ∈ A, given that we already knew x ∈ Ω.
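A tiny worked example of the two quantities, with hypothetical sizes |Ω| = 8 and |A| = 2, working in bits:

```python
from math import log2

# Hypothetical sizes: |Omega| = 8, |A| = 2.
size_omega, size_A = 8, 2

H_A = log2(size_A)                           # bits still needed to pin down x once we know x in A
H_A_given_omega = log2(size_omega / size_A)  # bits gained by learning x in A

print(H_A, H_A_given_omega)  # 1.0 and 2.0; they sum to log2|Omega| = 3 bits
```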
What about H(A ∧ B)?

H(A ∧ B) = H(P(x ∈ A ∧ B | Ω)) = ...?
We want to apply a presumption of independence: a Möbius / Euler-characteristic expansion.
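The first-order presumption-of-independence estimate would be |A ∩ B| ≈ |A||B|/|Ω|, i.e. H(A ∧ B) ≈ H(A) + H(B) − log|Ω|. A sketch on a made-up product space where A and B constrain independent coordinates, so the estimate is exact:

```python
from math import log

# Hypothetical product structure: Omega = X x Y, A constrains only the X
# coordinate and B only the Y coordinate, so A and B are independent
# events under the uniform measure and the estimate is exact.
X = range(4)
Y = range(5)
Omega = [(x, y) for x in X for y in Y]          # |Omega| = 20
A = [(x, y) for (x, y) in Omega if x < 2]        # |A| = 2 * 5 = 10
B = [(x, y) for (x, y) in Omega if y < 3]        # |B| = 4 * 3 = 12
AB = [w for w in A if w in B]                    # |A ∩ B| = 2 * 3 = 6

estimate = log(len(A)) + log(len(B)) - log(len(Omega))  # H(A) + H(B) - log|Omega|
exact = log(len(AB))                                    # H(A ∧ B)
print(exact, estimate)  # both log 6: the independence presumption is exact here
```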