J Bostock comments on $500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

J Bostock 2 May 2025 21:01 UTC
4 points
OK so some further thoughts on this: suppose we instead just partition the values of $Λ$ directly with something like a clustering algorithm, based on $D_{KL}$ in $P[X | Λ]$ space, and take $Δ(Λ)$ to just be the cluster that $λ$ is in:
Assuming we can do it with small clusters, we know that the divergence between $P[X | Λ]$ and $P[X | Δ]$ is pretty small, i.e. $P[X | Λ] \approx P[X | Δ]$, so $D_{KL}(P[X] \| P[X | Δ])$ is also small.
And if we consider $X_{2} \leftarrow X_{1} \to Λ$, this tells us that learning $X_{1}$ restricts us to a pretty small region of $P[X_{2}]$ space (since $P[X_{2} | X_{1}] \approx P[X_{2} | X_{1}, Λ]$), so $Δ$ should be approximately deterministic in $X_{1}$. This second part is more difficult to formalize, though.
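A rough sketch of the clustering step on a toy discrete example. Everything here is an assumption for illustration: the comment only says "something like a clustering algorithm", so the greedy threshold rule, the function names (`kl`, `cluster_latent_values`, `conditional_entropy`), the threshold value, and the toy joint distribution are all hypothetical. The sketch groups values of $Λ$ whose $P[X | Λ]$ are close in KL, then compares $H(Δ | X_{1})$ with $H(Λ | X_{1})$ as one way to look at "approximately deterministic in $X_{1}$".

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D_KL(p || q) in nats for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

def cluster_latent_values(cond, threshold):
    """Greedy clustering (an illustrative choice, not the comment's method).

    cond[l] is P[X | Lambda = l] flattened to a vector. Each lambda joins the
    first cluster whose representative is within `threshold` in KL, else it
    starts a new cluster. Returns the cluster label Delta(lambda) per lambda.
    """
    representatives = []
    labels = np.empty(len(cond), dtype=int)
    for l, p in enumerate(cond):
        for c, r in enumerate(representatives):
            if kl(p, cond[r]) <= threshold:
                labels[l] = c
                break
        else:
            representatives.append(l)
            labels[l] = len(representatives) - 1
    return labels

def conditional_entropy(joint_ab):
    """H(A | B) in nats, where joint_ab[a, b] = P[A = a, B = b]."""
    p_b = joint_ab.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_cond = np.where(joint_ab > 0, np.log(joint_ab / p_b), 0.0)
    return float(-np.sum(joint_ab * log_cond))

# Hypothetical toy model: four lambda values, binary X1 and X2 that are
# independent given Lambda, each equal to 1 with probability bias[l].
bias = np.array([0.95, 0.94, 0.05, 0.06])
p_lambda = np.full(4, 0.25)
px = np.stack([1 - bias, bias], axis=1)  # px[l, x] = P[X_i = x | Lambda = l]
joint = p_lambda[:, None, None] * px[:, :, None] * px[:, None, :]  # P[L, X1, X2]

cond = (joint / p_lambda[:, None, None]).reshape(4, -1)  # P[X | Lambda = l]
labels = cluster_latent_values(cond, threshold=0.05)

joint_lambda_x1 = joint.sum(axis=2)  # P[Lambda, X1]
joint_delta_x1 = np.zeros((labels.max() + 1, joint_lambda_x1.shape[1]))
np.add.at(joint_delta_x1, labels, joint_lambda_x1)  # P[Delta, X1]

h_delta = conditional_entropy(joint_delta_x1)    # H(Delta | X1)
h_lambda = conditional_entropy(joint_lambda_x1)  # H(Lambda | X1)
print(labels, h_delta, h_lambda)
```

In this toy setup the two near-duplicate pairs of $λ$ values end up in the same cluster, and $H(Δ | X_{1})$ comes out smaller than $H(Λ | X_{1})$, since merging near-identical $λ$ values removes the part of $Λ$ that $X_{1}$ cannot distinguish. Note that KL is asymmetric, so a different direction or a symmetrized divergence could give different clusters.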
Edit: The real issue is whether we could have lots of $Λ$ values which produce the same distribution over $X_{2}$ but different distributions over $X_{1}$, and which are all pretty likely given $X_{1} = x_{1}$ for some $x_{1}$. I think this just can’t really happen for probable values of $x_{1}$, for two reasons. First, if these values of $λ$ produce the same distribution over $X_{2}$ but different distributions over $X_{1}$, then that doesn’t satisfy $X_{1} \leftarrow X_{2} \to Λ$. Second, if they produced wildly different distributions over $X_{1}$, then they can’t all have high values of $P[X_{1} = x_{1} | Λ = λ]$, and so they’re not gonna have high values of $P[Λ = λ | X_{1} = x_{1}]$.
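One way to spell out the step from low likelihood to low posterior, as a sketch via Bayes' rule (the only extra assumption is $P[Λ = λ] \le 1$):

$$P[Λ = λ | X_{1} = x_{1}] = \frac{P[X_{1} = x_{1} | Λ = λ] \, P[Λ = λ]}{P[X_{1} = x_{1}]} \le \frac{P[X_{1} = x_{1} | Λ = λ]}{P[X_{1} = x_{1}]}$$

So when $x_{1}$ is probable (the denominator is not tiny), a value of $λ$ with low $P[X_{1} = x_{1} | Λ = λ]$ cannot have a high posterior. The bound becomes weak when $P[X_{1} = x_{1}]$ is small, which is presumably why the argument is restricted to probable values of $x_{1}$.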