A quick intuitive check for whether something is a natural latent over some parts of a system consists of two questions:
Are the parts (approximately) independent given the candidate natural latent?
At first I had some trouble checking this condition intuitively, and I might still not have it right.
I think one of the main things that confused me at first is that if I want to reason about natural latents for “a” dog, I need to think about a group of dogs, even though there are also natural latents for the individual dog (fur color, for example, is a natural latent across the dog’s fur).
Say I check the independence condition for a collection of groups, each consisting of either cats or dogs. If I look at a single animal’s shoulder height in one of those sorted clusters, it tells me which of the two clusters it is in. But once I have updated on that information, looking at more animals from the cluster will not improve my guesses for the other dogs’ heights any further.
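The cat-and-dog check can be made concrete with a small simulation. This is only a sketch under assumptions that are not in the text above: the height parameters in `HEIGHT_PARAMS` are made-up numbers, the helper names are hypothetical, and a near-zero correlation is used as a rough stand-in for independence (it is weaker than true independence).

```python
import random

# Hypothetical (mean, standard deviation) of shoulder height in cm per species.
HEIGHT_PARAMS = {"cat": (25.0, 3.0), "dog": (60.0, 8.0)}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def sample_pairs(n_pairs, seed=0):
    """Draw pairs of animals; both animals of a pair come from the same cluster."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        species = rng.choice(["cat", "dog"])
        mu, sigma = HEIGHT_PARAMS[species]
        pairs.append((species, rng.gauss(mu, sigma), rng.gauss(mu, sigma)))
    return pairs

def height_correlation(pairs, species=None):
    """Correlation between the two heights, optionally within one species only."""
    chosen = [(a, b) for s, a, b in pairs if species is None or s == species]
    return pearson([a for a, _ in chosen], [b for _, b in chosen])

pairs = sample_pairs(20_000)
corr_all = height_correlation(pairs)          # cluster unknown
corr_dogs = height_correlation(pairs, "dog")  # conditioned on "dog"
corr_cats = height_correlation(pairs, "cat")  # conditioned on "cat"
print(f"all: {corr_all:.3f}  dogs: {corr_dogs:.3f}  cats: {corr_cats:.3f}")
```

With the cluster unknown, one animal’s height should say a lot about the other’s, because it reveals the cluster. Within a single species the correlation should be close to zero, which is the "guesses cannot improve" intuition in numbers.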
An important example of something that is not a natural latent is the empirical mean of a fat-tailed distribution at real-world sample sizes, whereas for thin-tailed distributions it is one. This is the fact Nassim Taleb keeps harping on. It doesn’t mean that fat-tailed distributions lack natural latents: for Pareto distributions (think: pandemics, earthquakes, wealth), one still has natural latents like the tail index (estimated from plotting the data on a log-log plot by dilettantes like me and more sophisticatedly by real professionals).
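A rough way to see the difference is to estimate the same quantity from several disjoint subsamples and check how well the estimates agree. The sketch below rests on my own assumptions: a tail index of 1.1, 20 subsamples of 1000 draws each, and a Gaussian as the thin-tailed comparison. It also swaps the log-log plot for the maximum-likelihood estimator of the tail index with a known minimum of 1, which is a different technique from the one mentioned above. `relative_spread` and `tail_index_mle` are hypothetical helper names.

```python
import math
import random
import statistics

def relative_spread(values):
    """(max - min) / median: a crude measure of how much estimates disagree."""
    return (max(values) - min(values)) / statistics.median(values)

def tail_index_mle(xs, x_min=1.0):
    """Maximum-likelihood tail index of a Pareto sample with known minimum x_min."""
    return len(xs) / sum(math.log(x / x_min) for x in xs)

ALPHA = 1.1                   # assumed tail index: finite mean, infinite variance
N_SUBSAMPLES, N = 20, 1000    # assumed "real-world" sample sizes

rng = random.Random(0)
gauss = [[rng.gauss(10.0, 2.0) for _ in range(N)] for _ in range(N_SUBSAMPLES)]
# random.paretovariate(alpha) draws from a Pareto distribution with minimum 1.
pareto = [[rng.paretovariate(ALPHA) for _ in range(N)] for _ in range(N_SUBSAMPLES)]

spread_gauss_mean = relative_spread([statistics.fmean(s) for s in gauss])
spread_pareto_mean = relative_spread([statistics.fmean(s) for s in pareto])
spread_tail_index = relative_spread([tail_index_mle(s) for s in pareto])

print(f"thin-tailed mean, relative spread: {spread_gauss_mean:.3f}")
print(f"fat-tailed mean,  relative spread: {spread_pareto_mean:.3f}")
print(f"tail index,       relative spread: {spread_tail_index:.3f}")
```

The expectation is that the Gaussian means and the Pareto tail-index estimates agree closely across subsamples, while the Pareto means disagree much more, because a few extreme draws dominate each one.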