Shannon mutual information doesn’t really capture my intuitions either. Take a random number X, and a cryptographically strong hash function. Calculate hash(X) and hash(X+1).
Now these variables share lots of mutual information. But if I just delete X, there is no way an agent with limited compute can find or exploit the link. I think mutual information gives false positives, where Pearson correlation gave false negatives.
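To make this concrete, here's a minimal sketch (my own illustration, not from the original discussion) using SHA-256. Since hash(X) and hash(X+1) are both deterministic functions of X, their mutual information equals H(X) in principle; but empirically the two digests look statistically independent, with roughly half of their bits differing, just as for two unrelated random strings:

```python
import hashlib
import random

def bit_hamming(a: bytes, b: bytes) -> int:
    # Count the number of differing bits between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

random.seed(0)
n = 200
total = 0
for _ in range(n):
    x = random.getrandbits(128)
    h1 = hashlib.sha256(str(x).encode()).digest()
    h2 = hashlib.sha256(str(x + 1).encode()).digest()
    total += bit_hamming(h1, h2)

avg = total / n
# In principle I(hash(X); hash(X+1)) = H(X), since both are functions of X.
# Yet with X deleted, the digests look independent: about 128 of 256 bits differ.
print(avg)
```

No bounded agent looking only at the two digests can detect the link, even though an unbounded one (which could invert the hash) trivially can.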
So: Pearson correlation ⇒ actually exploitable info ⇒ Shannon mutual info.
So one potential lesson is to keep track of which direction your formalisms deviate from reality. Are they intended to have no false positives, or no false negatives? Some mathematical approximations, like polynomial time = runnable in practice, fail in both directions but are still useful as long as they aren't Goodharted too hard.
This is particularly relevant to the secret messages example, since we do in fact use computational-difficulty-based tricks for sending secret messages these days.
Actually, mutual information has a well-defined operational meaning. For example, the maximum rate at which we can reliably transmit a signal through a noisy channel is given by the mutual information between the input and the output of the channel (maximized over input distributions). So it depends on which task you are interested in.
A “channel” that hashes the input has perfect mutual info, but is still fairly useless for transmitting messages. The point about mutual info is that it's the maximum, given unlimited compute. It serves as an upper bound that isn't always achievable in practice. If you restrict to channels that just add noise, then yes, mutual info is the right quantity.
Yes, it is the relevant quantity in the limit of an infinite number of uses of the channel. If you can use it just once, it does not tell you much.