Nice example—I’m convinced that the bound isn’t always achievable. So the next questions are:
Is there a nice way to figure out how much information we can keep, in any particular problem, when Y is not observed?
Is there some standard way of constructing S which achieves optimal performance in general when Y is not observed?
My guess is that optimal performance would be achieved by throwing away all information contained in X about the distribution P[Y|X]. We always know that distribution just from observing X, and throwing away all info about that distribution should throw out all info about Y, and the minimal map interpretation suggests that that’s the least information we can throw out.
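To make that guess a bit more concrete, here is a minimal sketch of one piece of it: the map from x to the distribution P[Y|X=x] carries everything X knows about Y. The toy joint distribution and the helper names (`mutual_information`, `conditional_of_y`) are made up for illustration. The sketch only checks that I(T;Y) = I(X;Y) when T is the conditional distribution induced by X. It does not construct S or show that the resulting S would be optimal.

```python
from collections import defaultdict
from math import log2

def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): prob}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def conditional_of_y(joint):
    """Map each x to P[Y|X=x], stored as a hashable, sorted tuple."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p
    cond = defaultdict(dict)
    for (x, y), p in joint.items():
        # Round so that equal conditionals compare equal despite float noise.
        cond[x][y] = round(p / px[x], 12)
    return {x: tuple(sorted(d.items())) for x, d in cond.items()}

# Hypothetical toy joint P[X, Y]: x=0 and x=1 induce the same P[Y|X],
# while x=2 induces a different one.
joint_xy = {
    (0, 0): 0.10, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.20,
    (2, 0): 0.30, (2, 1): 0.10,
}

# T(x) = P[Y|X=x]: the part of X that is informative about Y.
t_of_x = conditional_of_y(joint_xy)

# Joint distribution of (T, Y), obtained by merging x values with equal T.
joint_ty = defaultdict(float)
for (x, y), p in joint_xy.items():
    joint_ty[(t_of_x[x], y)] += p

i_xy = mutual_information(joint_xy)
i_ty = mutual_information(joint_ty)
print(f"I(X;Y) = {i_xy:.6f} bits, I(T;Y) = {i_ty:.6f} bits")
```

In this toy case, T cannot distinguish x=0 from x=1, and that distinction is exactly the kind of information about X that carries nothing about Y. Whether a valid S can always retain all of it is the open question above.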