Let’s say every day at the office, we get three boxes of donuts, numbered 1, 2, and 3. I grab a donut from each box, plunk them down on napkins helpfully labeled X1, X2, and X3. The donuts vary in two aspects: size (big or small) and flavor (vanilla or chocolate). Across all boxes, the ratio of big to small donuts remains consistent. However, Boxes 1 and 2 share the same vanilla-to-chocolate ratio, which is different from that of Box 3.
Does the correlation between X1 and X2 imply that there is no natural latent? Is this the desired behavior of natural latents, despite the presence of the common size ratio? (and the commonality that I’ve only ever pulled out donuts; there has never been a tennis ball in any of the boxes!)
If so, why is this what we want from natural latents? If not, how does a natural latent arise despite the internal correlation?
This branch of research is aimed at finding a (nearly) objective way of thinking about the universe. When I imagine the end result, I imagine something that receives a distribution across a bunch of data, and finds a bunch of useful patterns within it. At the moment that looks like finding patterns in data via
find_natural_latent(get_chunks_of_data(data_distribution))
or perhaps showing that
find_top_n(n, (chunks, natural_latent(chunks)) for chunks in
all_chunked_subsets_of_data(data_distribution),
key=lambda chunks, latent: usefulness_metric(latent))
is a (convergent sub)goal of agents. As such, the notion that the donuts’ data is simply poorly chunked—which needs to be solved anyway—makes a lot of sense to me.
I don’t know how to think about the possibilities when it comes to decomposing Xi. Why would it always be possible to decompose random variables to allow for a natural latent? Do you have an easy example of this?
Also, what do you mean by mutual information between Xi, given that there are at least 3 of them? And why would just extracting said mutual information be useless?
If you get the chance to point me towards good resources about any of these questions, that would be great.