You’re heading towards redefining correlation to mean causal connection.
Nope. I’m pointing out that “correlated” can mean “there exists a linear statistical correlation” or “there exists mutual information”—but whichever you use, you need to be consistent. And at no point did I say it meant causal connection—I just noted that that’s one way mutual information can develop.
The moral that I was drawing was a practical one: certain very simple relationships between physical variables can generate time series containing no mutual information detectable by any of these methods. This suggests a substantial limitation of their practical applicability.
What you showed is that there is more than one way for two variables to be mutually informative, and if you limit yourself to a linear statistical regression on the simultaneous pairs, you might not find the mutual information. So what? If you know more than just the unordered simultaneous pairs, use that knowledge!
But who denies that learning a function tells you something about its derivative? (which would mean there’s mutual information between the two...)
Specifics, please.
Sure. Let’s use your point about derivatives. I tell you sin(x) = 4⁄5. Have I told you something about cos(x)? (And no, it doesn’t matter that the cosine can have two values; you’ve still learned something.)
I tell you f(x) = sin(x) + cos(x). Have I told you something about f′(x)?
Sure. Let’s use your point about derivatives. I tell you sin(x) = 4⁄5. Have I told you something about cos(x)?
Yes.
I tell you f(x) = sin(x) + cos(x). Have I told you something about f′(x)?
Yes.
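Those two yeses can be made concrete with a quick numerical check (a sketch; the 4⁄5 value is just the example's):

```python
import math

# sin(x)^2 + cos(x)^2 = 1, so learning sin(x) = 4/5 narrows cos(x)
# from a whole interval down to just two values: a genuine reduction
# in uncertainty about cos(x).
s = 4 / 5
c = math.sqrt(1 - s ** 2)
candidates = sorted((-c, c))
print(candidates)  # cos(x) is one of [-0.6, 0.6], up to rounding

# Likewise for f(x) = sin(x) + cos(x): f'(x) = cos(x) - sin(x),
# so knowing sin(x) leaves only two possible values for f'(x).
fprime = sorted(c0 - s for c0 in candidates)
print(fprime)  # f'(x) is one of [-1.4, -0.2], up to rounding
```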
But in real experiments, you’re not given the underlying function, only observations of some of its values.
So, I tell you a time series for an unknown function f.
What have I told you about f’? What further information would you need to make a numerical calculation of the amount of information you now have about f’?
In the data file I originally linked to, there is not merely no linear relationship, but virtually no relationship whatsoever, discoverable by any means whatever, between the two columns, which tabulate f and f’ for a certain stochastic function f. Mutual information, even in Kolmogorov heaven, is not present.
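A hedged illustration of how such data could arise (assuming, for the sake of the sketch, a random-walk generator; the actual generator behind the linked file is not shown here): let f be a random walk, so the tabulated f′ column is its i.i.d. increment sequence.

```python
import random

# Each increment is drawn without any reference to the current level,
# so each simultaneous pair (f, f') is independent by construction --
# and yet f is literally the running sum of the f' column.
random.seed(1)
increments = [random.gauss(0, 1) for _ in range(10000)]  # the "f'" column
levels = []                                              # the "f" column
total = 0.0
for d in increments:
    levels.append(total)  # the level before the step, independent of d
    total += d

# sanity check: f really is the cumulative sum of f'
assert abs(levels[-1] - sum(increments[:-1])) < 1e-6
```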
But in real experiments, you’re not given the underlying function, only observations of some of its values.
Yet you are given their time index values, meaning you have more than the unordered simultaneous pairs you presented in the example.
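One way to cash out "use the time index" for a smooth, deterministic f (a sketch only; the function and step size here are illustrative, not the linked data):

```python
import math

# With time indices, consecutive samples are adjacent in t, so a
# finite difference over them approximates f'. The ordered series
# carries information that the unordered simultaneous pairs do not.
dt = 1e-3
ts = [k * dt for k in range(1000)]
f = [math.sin(t) for t in ts]

# forward-difference estimate of f' between consecutive samples
df = [(f[k + 1] - f[k]) / dt for k in range(len(f) - 1)]

# compare against the true derivative cos(t)
err = max(abs(df[k] - math.cos(ts[k])) for k in range(len(df)))
print(err < 1e-2)  # True: the estimate tracks f' closely for smooth f
```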
So, I tell you a time series for an unknown function f. What have I told you about f’? What further information would you need to make a numerical calculation of the amount of information you now have about f’?
You’d need to know the prior you have on the data. The KL divergence between your prior and your posterior after seeing the data tells you how much information you received. Typically, your probability distribution on the data shifts toward the function, while retaining a small probability mass on the function being very high frequency (higher than the Nyquist frequency of your sampling rate, for the pedants).
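As a toy sketch of that accounting (the candidate set, prior, and data are all illustrative, not anyone's actual model):

```python
import math

# Start with a uniform prior over a few candidate functions, condition
# on noiseless observations, and measure the information gained as
# KL(posterior || prior) in bits.
candidates = {
    "sin":  math.sin,
    "cos":  math.cos,
    "zero": lambda t: 0.0,
}
prior = {name: 1 / 3 for name in candidates}

# observe the "data": samples of the true (here: sin) function
samples = [(t / 10, math.sin(t / 10)) for t in range(5)]

# crude 0/1 likelihood: does the candidate match every sample?
def consistent(fn):
    return all(abs(fn(t) - y) < 1e-9 for t, y in samples)

weights = {n: prior[n] * (1.0 if consistent(fn) else 0.0)
           for n, fn in candidates.items()}
z = sum(weights.values())
posterior = {n: w / z for n, w in weights.items()}

kl_bits = sum(p * math.log2(p / prior[n])
              for n, p in posterior.items() if p > 0)
print(kl_bits)  # log2(3) ~ 1.585 bits: the data singled out one of three
```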
In the data file I originally linked to, there is not merely no linear relationship, but virtually no relationship whatsoever, discoverable by any means whatever, between the two columns, which tabulate f and f’ for a certain stochastic function f. Mutual information, even in Kolmogorov heaven, is not present.
No means whatsoever? What about iterating through the possible causal maps and seeing which is consistent? (Probably not a good idea in the general case, but you only have a few variables here.)
Yet you are given their time index values, meaning you have more than the unordered simultaneous pairs you presented in the example.
The example data (here they are again) are a time series, not a set of unordered pairs. (Time is proportional to line number.)
No means whatsoever?
None whatsoever (assuming the random noise that drives the process is truly random, or at least unknowable—guessing the pseudo-RNG algorithm and its seed doesn’t count).
Consider this challenge open to anyone who thinks that there is mutual information between the two columns: calculate it. Prove the validity of the computation by calculating information about the second column given only the first, for a new file generated by the same method.
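For anyone attempting the challenge, a minimal sketch of one standard approach: a plug-in histogram estimate of mutual information, run here on synthetic stand-in columns, since the linked file is not reproduced.

```python
import math
import random

def hist_mi(xs, ys, bins=10):
    """Plug-in estimate of I(X;Y) in bits from an equal-width 2-D histogram."""
    n = len(xs)
    lox, hix = min(xs), max(xs)
    loy, hiy = min(ys), max(ys)

    def bx(v):
        return min(int((v - lox) / (hix - lox) * bins), bins - 1)

    def by(v):
        return min(int((v - loy) / (hiy - loy) * bins), bins - 1)

    joint, px, py = {}, [0] * bins, [0] * bins
    for x, y in zip(xs, ys):
        i, j = bx(x), by(y)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # I(X;Y) = sum over cells of p(i,j) * log2( p(i,j) / (p(i) p(j)) )
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

random.seed(0)
a = [random.gauss(0, 1) for _ in range(20000)]
b = [random.gauss(0, 1) for _ in range(20000)]   # independent of a
c = [x + 0.1 * random.gauss(0, 1) for x in a]    # strongly dependent on a

print(hist_mi(a, b))  # near zero for independent columns
print(hist_mi(a, c))  # substantially positive for dependent ones
```

Note the estimator is biased upward by finite sampling, so "near zero but not exactly zero" is the expected result on genuinely independent columns; proving the challenge would require beating that baseline on fresh data, as stated above.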