Alex Gibson comments on $500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

Alex Gibson 23 Jul 2025 18:02 UTC
9 points
2
I’m confused by this. The KL term we are looking at in the deterministic case is
$D_{KL} (P [X, Λ] \| P [Λ] P [X_{1} | Λ] P [X_{2} | Λ])$ , right?
For simplicity, imagine we have finite discrete spaces. Then this would blow up if $P [X = (x_{1}, x_{2}), Λ = λ] \neq 0$ and $P [Λ = λ] P [X_{1} = x_{1} | Λ = λ] P [X_{2} = x_{2} | Λ = λ] = 0$ . But this is impossible, because any one of the terms in the product being $0$ implies that $P [X = (x_{1}, x_{2}), Λ = λ]$ is $0$ .
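To spell out that last step (a sketch, assuming $P [Λ = λ] > 0$ so that the conditionals are defined): the joint probability is bounded above by each of the marginals that appear in the product,

$$P [X = (x_{1}, x_{2}), Λ = λ] \le P [X_{i} = x_{i}, Λ = λ] = P [Λ = λ] P [X_{i} = x_{i} | Λ = λ], \quad i \in \{1, 2\},$$

and also $P [X = (x_{1}, x_{2}), Λ = λ] \le P [Λ = λ]$ . So if any single factor vanishes, the joint probability vanishes too, and the corresponding term contributes $0$ to the KL sum under the usual convention $0 \log 0 = 0$ .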
Intuitively, we construct an optimal code for encoding the distribution $P [Λ] P [X_{1} | Λ] P [X_{2} | Λ]$ , and the KL divergence measures how many more bits on average we need to encode a message than optimal, if the true distribution is given by $P [X, Λ]$ . Issues occur only when the true distribution $P [X, Λ]$ takes on values which never occur according to $P [Λ] P [X_{1} | Λ] P [X_{2} | Λ]$ , i.e. the optimal code doesn’t account for those values potentially occurring.
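A quick numerical check of this, as a minimal sketch rather than anything from the original post: the function name `factorization_kl` and the array layout `joint[x1, x2, lam]` are my own assumptions, and NumPy is used for convenience. It builds $P [Λ] P [X_{1} | Λ] P [X_{2} | Λ]$ from a finite joint table and asserts that it is nonzero wherever the joint is nonzero, so the KL sum stays finite.

```python
import numpy as np

def factorization_kl(joint):
    """KL( P[X1,X2,L] || P[L] P[X1|L] P[X2|L] ) in bits.

    `joint` is a hypothetical table indexed as joint[x1, x2, lam].
    """
    joint = np.asarray(joint, dtype=float)
    p_l = joint.sum(axis=(0, 1))      # P[L], shape (nl,)
    p_x1_l = joint.sum(axis=1)        # P[X1, L], shape (n1, nl)
    p_x2_l = joint.sum(axis=0)        # P[X2, L], shape (n2, nl)

    # P[L] P[X1|L] P[X2|L] = P[X1,L] P[X2,L] / P[L]; set to 0 where P[L] = 0.
    num = p_x1_l[:, None, :] * p_x2_l[None, :, :]
    q = np.divide(num, p_l[None, None, :],
                  out=np.zeros_like(num),
                  where=p_l[None, None, :] > 0)

    support = joint > 0
    # The point of the argument: q cannot be 0 where the joint is nonzero.
    assert np.all(q[support] > 0)
    return float(np.sum(joint[support] * np.log2(joint[support] / q[support])))

# Deterministic latent that captures everything: L = X1 = X2 (a fair bit).
copy_joint = np.zeros((2, 2, 2))
copy_joint[0, 0, 0] = 0.5
copy_joint[1, 1, 1] = 0.5

# Trivial latent that captures nothing: X1 = X2 (a fair bit), L constant.
trivial_joint = np.zeros((2, 2, 1))
trivial_joint[0, 0, 0] = 0.5
trivial_joint[1, 1, 0] = 0.5

kl_copy = factorization_kl(copy_joint)
kl_trivial = factorization_kl(trivial_joint)
print(kl_copy, kl_trivial)
```

Both tables contain many zero entries, yet the divergence is finite in each case: zero bits when the latent mediates perfectly, and one bit (the mutual information between the two copies) when it mediates nothing.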
Potentially there are subtleties when we have continuous spaces. In any case I’d be grateful if you’re able to elaborate.
- johnswentworth 23 Jul 2025 19:13 UTC
  9 points
  2
  Parent
  Yeah, I’ve since updated that deterministic functions are probably the right thing here after all, and I was indeed wrong in exactly the way you’re pointing out.