Here’s a trick which might be helpful for anybody tackling the problem.
First, note that f(Λ):=(x↦P[X=x|Λ]) is always a sufficient statistic of Λ for X, i.e.
Λ→f(Λ)→X
Now, we typically expect that the lower-order bits of f(Λ) are less relevant/useful/interesting. So, we might hope that we can do some precision cutoff on f(Λ) and end up with an approximate sufficient statistic, while potentially reducing the entropy (or some other information content measure) of f(Λ) a bunch. We’d broadcast the cutoff function like this:
g(Λ):=precision_cutoff(f(Λ))=(x↦precision_cutoff(P[X=x|Λ]))
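To make this concrete, here’s a minimal Python sketch (my own illustration; the names, the toy joint distribution, and the particular cutoff are all assumptions, since nothing above pins one down). The hypothetical precision_cutoff rounds each probability to a fixed number of binary digits and renormalizes so the output is a proper distribution over X:

```python
import numpy as np

def precision_cutoff(p, bits=2):
    """Hypothetical cutoff: round each probability to `bits` binary digits,
    then renormalize so the output is a proper distribution over X.
    (A real cutoff should avoid rounding small probabilities to exactly
    zero, which would make the KL divergences below infinite.)"""
    q = np.round(p * 2**bits) / 2**bits
    return q / q.sum()

# Toy joint distribution: rows index values of Λ, columns index values of X.
P_joint = np.array([[0.28, 0.12],
                    [0.06, 0.24],
                    [0.09, 0.21]])
P_lam = P_joint.sum(axis=1)                      # P[Λ=λ]
f = P_joint / P_lam[:, None]                     # f(λ) = (x ↦ P[X=x|Λ=λ])
g = np.apply_along_axis(precision_cutoff, 1, f)  # broadcast the cutoff
```

With bits=2 the cutoff is coarse enough that distinct values of Λ can collapse to the same g-value, which is exactly where g starts carrying less information than f.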
Now we’ll show a trick for deriving DKL bounds involving g(Λ).
First note that
E[DKL(P[X|Λ]||P[X|g(Λ)])]≤E[DKL(P[X|Λ]||g(Λ))]
This is a tricky expression, so let’s talk it through. On the left, g(Λ) is treated informationally: it’s just a random variable constructed as a function of Λ, and we condition on that random variable in the usual way. On the right, the output value of g is used directly as a distribution over X.
The reason this inequality holds is that a Bayes update is the “best” update one can make, as measured by expected DKL. Specifically, if I’m given the value of any function g(Λ), then the distribution Q (as a function of g(Λ)) which minimizes E[DKL(P[X|Λ]||Q)] is P[X|g(Λ)]. Since P[X|g(Λ)] minimizes that expected DKL, any other distribution over X (as a function of g(Λ)) can only do “worse”, including g(Λ) itself, since that’s a distribution over X and a function of g(Λ).
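Continuing the sketch above, we can check this inequality numerically. P[X|g(Λ)] is obtained by a Bayes update: pool the joint probability over each level set {λ : g(λ)=γ} and condition on it (expected_kl and the pooling code are my own scaffolding):

```python
from collections import defaultdict

def expected_kl(P_joint, Q_rows):
    """E over Λ of DKL(P[X|Λ] || Q_rows[λ]), where Q_rows[λ] is whatever
    distribution over X we pit against the true conditional at each λ."""
    P_lam = P_joint.sum(axis=1)
    P_cond = P_joint / P_lam[:, None]
    return sum(P_lam[i] * np.sum(P_cond[i] * np.log(P_cond[i] / Q_rows[i]))
               for i in range(len(P_lam)))

# Bayes posterior given g(Λ): pool the joint over each level set of g.
groups = defaultdict(list)
for i, row in enumerate(g):
    groups[tuple(row)].append(i)
post = np.empty_like(f)
for idxs in groups.values():
    pooled = P_joint[idxs].sum(axis=0)  # P[X=x, g(Λ)=γ]
    post[idxs] = pooled / pooled.sum()  # P[X=x | g(Λ)=γ]

lhs = expected_kl(P_joint, post)  # E[DKL(P[X|Λ] || P[X|g(Λ)])]
rhs = expected_kl(P_joint, g)     # E[DKL(P[X|Λ] || g(Λ))]
assert lhs <= rhs + 1e-12         # the Bayes posterior can only do better
```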
Plugging in the definition of g, that establishes
E[DKL(P[X|Λ]||P[X|g(Λ)])]≤E[DKL(P[X|Λ]||(x↦precision_cutoff(P[X=x|Λ])))]
Then the final step is to use the properties of whatever precision_cutoff function one chose to establish that E[DKL(P[X|Λ]||(x↦precision_cutoff(P[X=x|Λ])))] can’t be too far from E[DKL(P[X|Λ]||P[X|Λ])], i.e. 0. That produces an upper bound on E[DKL(P[X|Λ]||P[X|g(Λ)])], where the bound is 0 + (whatever terms come from the precision cutoff).
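As one concrete (hypothetical) instance of that last step: suppose the chosen precision_cutoff never shrinks a probability by more than a factor of e^(−δ), i.e. precision_cutoff(p)≥e^(−δ)p for all p. Then log(P[X=x|Λ]/precision_cutoff(P[X=x|Λ]))≤δ for every x, so E[DKL(P[X|Λ]||(x↦precision_cutoff(P[X=x|Λ])))]≤δ, and chaining the two inequalities gives E[DKL(P[X|Λ]||P[X|g(Λ)])]≤δ.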
@Alfred Harwood @David Johnston
If anyone else would like to be tagged in comments like this one on this post, please eyeball-react on this comment. Alfred and David, if you would like to not be tagged in the future, please say so.