I don’t understand your objection. What do you mean here?
“If you look at some time, you cannot just use [the xorsum] as a latent for the distribution at that time, because you fail both mediation and redundancy.”
Which distribution? The distribution of is uniform. You do not need any information to perfectly recover this distribution at a particular timestep. Do you mean the joint distribution ? If is very large, then
where the distribution is uniform over the half of bitstrings that share parity (xorsum) with , and zero on the other half. So parity here seems to be doing exactly the same thing that made energy a natural latent.
Parity is not a natural latent, because the information is not redundantly expressed (i.e., it is not insensitive).
Furthermore, after conditioning on the parity, the mutual information between any one bit and all the others jumps to its maximum value of 1 bit, since knowing all the others plus the parity lets you figure out the last one. Thus we get the worst possible KL error for mediation.
In both of these, I’m talking about the distribution of the bitstring at a certain time.
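A minimal brute-force sketch of the mediation claim. It assumes the bitstring at a fixed time is n i.i.d. uniform bits. The thread does not pin down that distribution, so this is an assumption. The helper names are hypothetical. It computes I(X_0; X_1..X_{n-1} | parity) using I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C).

```python
from collections import defaultdict
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a dict mapping outcome -> probability."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, key):
    """Push a joint dict {x: p} through the function key(x)."""
    out = defaultdict(float)
    for x, p in joint.items():
        out[key(x)] += p
    return out

def cond_mutual_info(joint, a, b, c):
    """I(A; B | C) in bits, where a, b, c are functions of the outcome."""
    return (entropy(marginal(joint, lambda x: (a(x), c(x))))
            + entropy(marginal(joint, lambda x: (b(x), c(x))))
            - entropy(marginal(joint, lambda x: (a(x), b(x), c(x))))
            - entropy(marginal(joint, c)))

n = 6
uniform = {bits: 1 / 2**n for bits in product((0, 1), repeat=n)}  # assumed distribution
parity = lambda x: sum(x) % 2
first_bit = lambda x: x[0]
rest = lambda x: x[1:]
nothing = lambda x: None  # conditioning on a constant = no conditioning

mi_unconditional = cond_mutual_info(uniform, first_bit, rest, nothing)
mi_given_parity = cond_mutual_info(uniform, first_bit, rest, parity)
print(mi_unconditional, mi_given_parity)
```

Without a latent the bits share no information (0 bits). Conditioning on the parity creates exactly 1 bit between the first bit and the rest, which is the maximum possible for a single bit.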
I’m not sure what your limit is supposed to mean—Y here is not always taking the bits at the same timestep, right? But then, why divide by n?
Why divide by n? This is the relative entropy rate. And actually, I was wrong: it does not go to zero. I’m still not sure I understand your objection, though. It feels like it applies equally well to why energy should not be a natural latent?
Consider a set of gas particles trapped in a box with given positions and velocities. There is a little bit of randomness due to the box vibrating due to its temperature. You can mostly predict where the gas particles will be from one time step to the next. The energy, though, does not help with this prediction. It completely fails at mediating between time steps or at reducing the complexity of the particle positions/velocities. Even if you know a couple of the particle positions/velocities, the energy isn’t going to help you find the rest.
The mediation condition is that when you condition on the latent, the mutual information between any one variable and all the other variables (taken jointly) is low. In the case of the energy and temperature, once you know the energy and temperature, all the variables are now independent, so you get no mutual information. However, with the parity, the rest of the variables let you figure out the last one, so we fail mediation.
For redundancy, the energy and temperature are, for the most part, determined by any subset of n-1 variables, because they are averages. This isn’t true of the parity: the last bit being 50/50 means you still have total uncertainty over the parity.
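For contrast, here is a toy sketch of a latent that does mediate. It is a stand-in for "knowing the temperature" and is an assumption, not the physical model. A hypothetical hidden bias θ ∈ {0.2, 0.8} is drawn with probability 1/2 each, and given θ the n bits are i.i.d. Bernoulli(θ). The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a dict mapping outcome -> probability."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, key):
    """Push a joint dict {x: p} through the function key(x)."""
    out = defaultdict(float)
    for x, p in joint.items():
        out[key(x)] += p
    return out

def cond_mutual_info(joint, a, b, c):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), in bits."""
    return (entropy(marginal(joint, lambda x: (a(x), c(x))))
            + entropy(marginal(joint, lambda x: (b(x), c(x))))
            - entropy(marginal(joint, lambda x: (a(x), b(x), c(x))))
            - entropy(marginal(joint, c)))

n = 5
joint = defaultdict(float)            # outcomes are (theta, bits)
for theta in (0.2, 0.8):              # hypothetical latent, prior 1/2 each
    for bits in product((0, 1), repeat=n):
        k = sum(bits)
        joint[(theta, bits)] += 0.5 * theta**k * (1 - theta)**(n - k)

first_bit = lambda x: x[1][0]
rest = lambda x: x[1][1:]
latent = lambda x: x[0]
nothing = lambda x: None              # conditioning on a constant

mi_given_latent = cond_mutual_info(joint, first_bit, rest, latent)
mi_no_latent = cond_mutual_info(joint, first_bit, rest, nothing)
print(f"I(X_0; rest | theta) = {mi_given_latent:.6f} bits")
print(f"I(X_0; rest)         = {mi_no_latent:.6f} bits")
```

The bits are correlated on their own, because they share θ. Once θ is known they are independent, so the conditional mutual information is zero up to floating-point error. This is the opposite of what conditioning on parity does.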
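A minimal sketch of the redundancy contrast. It assumes n uniform bits and uses the mean of the bits as a hypothetical stand-in for an averaged quantity like energy per particle. For every string and every hidden position i, it collects the statistic's possible values given the other n-1 bits.

```python
from itertools import product

def consistent_values(bits, i, stat):
    """Values stat can take over full strings agreeing with `bits` except at index i."""
    return {stat(bits[:i] + (b,) + bits[i + 1:]) for b in (0, 1)}

n = 10
parity = lambda x: sum(x) % 2
mean = lambda x: sum(x) / n   # toy stand-in for an average such as energy per particle

all_cases = [(x, i) for x in product((0, 1), repeat=n) for i in range(n)]

# With the hidden bit uniform, both parities remain possible (each with probability 1/2).
parity_ambiguous = all(consistent_values(x, i, parity) == {0, 1} for x, i in all_cases)

# The mean, by contrast, is pinned down to an interval of width 1/n.
mean_spread = max(
    max(vals) - min(vals)
    for vals in (consistent_values(x, i, mean) for x, i in all_cases)
)
print(parity_ambiguous, mean_spread)
```

Any n-1 bits leave the parity a coin flip. The same n-1 bits fix the mean to within 1/n, and that window shrinks as n grows.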
You clearly have some idea of what “mediation” and “redundancy” mean for these particular scenarios and why they matter. I still have no clue what you mean by those words, why I should care about these properties, or how they relate to the notion of insensitivity.
Ah, I was talking about the conditions for natural latents, the main research program of the post author. See this post for a good math intro containing those definitions.
I now have the definitions, but I still don’t see the relation to insensitivity. Yes, natural latents are natural ontologies, but natural ontologies are not necessarily natural latents.
At the very least, the stochastic redund condition feels like a pretty minimal version of what ‘insensitivity’ could mean. The parity is pretty much maximally sensitive: if you’re trying to reduce your uncertainty about what the parity is, learning (n-1) bits doesn’t help at all until you learn the last one! I doubt a good definition of “insensitivity” would call the parity insensitive.
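A minimal sketch of the "doesn’t help until the last bit" point. It assumes n i.i.d. uniform bits, and the function name is hypothetical. It computes H(parity | first k bits) exactly for k = 0..n by enumerating completions.

```python
from itertools import product
from math import log2

def parity_entropy_given_prefix(n, k):
    """H(parity | first k bits), in bits, for n i.i.d. uniform bits."""
    total = 0.0
    completions = list(product((0, 1), repeat=n - k))
    for prefix in product((0, 1), repeat=k):
        # Fraction of completions that make the overall parity odd.
        p1 = sum((sum(prefix) + sum(c)) % 2 for c in completions) / len(completions)
        h = -sum(p * log2(p) for p in (p1, 1 - p1) if p > 0)
        total += h / 2**k  # each prefix has probability 2**-k
    return total

n = 6
curve = [parity_entropy_given_prefix(n, k) for k in range(n + 1)]
print(curve)  # → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
```

The uncertainty stays at a full bit for every k < n and only collapses when the final bit is revealed.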
What do you mean by “the stochastic redund condition”? Here’s what I feel like you’re doing: you have some unformalized intuitions. It seems to be the case that ‘insensitive’ stuff matches your intuition about redundancy for uncontrived examples. You then went and contrived an example where it didn’t match your intuition.
If I were in your situation, I would conclude, “my intuition is missing something, let me try to formalize this and see where I went wrong.”
I’m still really confused by your opening salvo:
“No, the reason why we should have insensitivity is not quite that.”
What do you mean??? What is “that”, what is “the reason why we should have insensitivity”? I think the reason we should have insensitivity is so the oracle can make predictions.
Also, I’m not going to continue responding. I do not think you have anything here. I think you are just confused, and you have not done the work to figure out what you yourself mean.