Here’s a trick which might be helpful for anybody tackling the problem.
First, note that f(Λ):=(x↦P[X=x|Λ]) is always a sufficient statistic of Λ for X, i.e.
Λ→f(Λ)→X
Now, we typically expect that the lower-order bits of f(Λ) are less relevant/useful/interesting. So, we might hope that we can do some precision cutoff on f(Λ) and end up with an approximate sufficient statistic, while potentially reducing the entropy (or some other information content measure) of f(Λ) a bunch. We’d broadcast the cutoff function like this:
g(Λ):=precision_cutoff(f(Λ))=(x↦precision_cutoff(P[X=x|Λ]))
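To make this concrete, here’s a minimal Python sketch (my own illustration; the names, the toy joint distribution, and the particular cutoff are all assumptions, since nothing above pins one down). The hypothetical precision_cutoff rounds each probability to a fixed number of binary digits and renormalizes so the output is a proper distribution over X:

```python
import numpy as np

def precision_cutoff(p, bits=2):
    """Hypothetical cutoff: round each probability to `bits` binary digits,
    then renormalize so the output is a proper distribution over X.
    (A real cutoff should avoid rounding small probabilities to exactly
    zero, which would make the KL divergences below infinite.)"""
    q = np.round(p * 2**bits) / 2**bits
    return q / q.sum()

# Toy joint distribution: rows index values of Λ, columns index values of X.
P_joint = np.array([[0.28, 0.12],
                    [0.06, 0.24],
                    [0.09, 0.21]])
P_lam = P_joint.sum(axis=1)                      # P[Λ=λ]
f = P_joint / P_lam[:, None]                     # f(λ) = (x ↦ P[X=x|Λ=λ])
g = np.apply_along_axis(precision_cutoff, 1, f)  # broadcast the cutoff
```

With bits=2 the cutoff is coarse enough that distinct values of Λ can collapse to the same g-value, which is exactly where g starts carrying less information than f.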
Now we’ll show a trick for deriving DKL bounds involving g(Λ).
First note that
E[DKL(P[X|Λ]||P[X|g(Λ)])]≤E[DKL(P[X|Λ]||g(Λ))]
This is a tricky expression, so let’s talk it through. On the left, g(Λ) is treated informationally: it’s just a random variable constructed as a function of Λ, and we condition on that random variable in the usual way. On the right, the output value of g is used directly as a distribution over X.
The reason this inequality holds is that a Bayes update is the “best” update one can make, as measured by expected DKL. Specifically, if I’m given the value of any function g(Λ), then the distribution Q (as a function of g(Λ)) which minimizes E[DKL(P[X|Λ]||Q)] is P[X|g(Λ)]. Since P[X|g(Λ)] minimizes that expected DKL, any other distribution over X (as a function of g(Λ)) can only do “worse”, including g(Λ) itself, since that’s a distribution over X and a function of g(Λ).
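Continuing the sketch above, we can check this inequality numerically. P[X|g(Λ)] is obtained by a Bayes update: pool the joint probability over each level set {λ : g(λ)=γ} and condition on it (expected_kl and the pooling code are my own scaffolding):

```python
from collections import defaultdict

def expected_kl(P_joint, Q_rows):
    """E over Λ of DKL(P[X|Λ] || Q_rows[λ]), where Q_rows[λ] is whatever
    distribution over X we pit against the true conditional at each λ."""
    P_lam = P_joint.sum(axis=1)
    P_cond = P_joint / P_lam[:, None]
    return sum(P_lam[i] * np.sum(P_cond[i] * np.log(P_cond[i] / Q_rows[i]))
               for i in range(len(P_lam)))

# Bayes posterior given g(Λ): pool the joint over each level set of g.
groups = defaultdict(list)
for i, row in enumerate(g):
    groups[tuple(row)].append(i)
post = np.empty_like(f)
for idxs in groups.values():
    pooled = P_joint[idxs].sum(axis=0)  # P[X=x, g(Λ)=γ]
    post[idxs] = pooled / pooled.sum()  # P[X=x | g(Λ)=γ]

lhs = expected_kl(P_joint, post)  # E[DKL(P[X|Λ] || P[X|g(Λ)])]
rhs = expected_kl(P_joint, g)     # E[DKL(P[X|Λ] || g(Λ))]
assert lhs <= rhs + 1e-12         # the Bayes posterior can only do better
```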
Plugging in the definition of g, that establishes
E[DKL(P[X|Λ]||P[X|g(Λ)])]≤E[DKL(P[X|Λ]||(x↦precision_cutoff(P[X=x|Λ])))]
Then the final step is to use the properties of whatever precision_cutoff function one chose to establish that E[DKL(P[X|Λ]||(x↦precision_cutoff(P[X=x|Λ])))] can’t be too far from E[DKL(P[X|Λ]||P[X|Λ])], i.e. 0. That produces an upper bound on E[DKL(P[X|Λ]||P[X|g(Λ)])], where the bound is 0 + (whatever terms come from the precision cutoff).
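As one concrete (hypothetical) instance of that last step: suppose the chosen precision_cutoff never shrinks a probability by more than a factor of e^(−δ), i.e. precision_cutoff(p)≥e^(−δ)p for all p. Then log(P[X=x|Λ]/precision_cutoff(P[X=x|Λ]))≤δ for every x, so E[DKL(P[X|Λ]||(x↦precision_cutoff(P[X=x|Λ])))]≤δ, and chaining the two inequalities gives E[DKL(P[X|Λ]||P[X|g(Λ)])]≤δ.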
@Alfred Harwood @David Johnston
If anyone else would like to be tagged in comments like this one on this post, please eyeball-react on this comment. Alfred and David, if you would like to not be tagged in the future, please say so.