Instead of saying "f(X) contains all information in X relevant to Y", it would be better to say that f(X) contains all information in X that is relevant to Y if you don't condition on anything. Once you condition on some additional random variable Z, f(X) may no longer contain all the relevant information.
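
One way to state the distinction precisely is to read "f(X) contains all information in X relevant to Y" as conditional independence of X and Y given f(X). That reading is my own formalization, not necessarily the one used in the post. The point is then that the first condition does not imply the second:

```latex
I\bigl(X; Y \mid f(X)\bigr) = 0 \;\not\Longrightarrow\; I\bigl(X; Y \mid f(X), Z\bigr) = 0
```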

Example:

Let X1, X2, Z be i.i.d. uniform binary random variables, i.e., each takes the value 0 with probability 0.5 and the value 1 with probability 0.5. Let X = (X1, X2), let Y = X1⊕X2⊕Z, where ⊕ is the XOR operation, and let f be the function f(X) = f((X1, X2)) = X2.

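Then f(X) contains all information in X that is relevant to Y: since Z is uniform and independent of X, Y is uniform and independent of X, so X carries no information about Y at all. But if we know the value of Z, then Y is determined by X1⊕X2, and X2 alone tells us nothing about it (X1 is still a fair coin). So given Z, f(X) no longer contains all information in X that is relevant to Y.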
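
The example is small enough to check exactly by enumerating the 8 equally likely outcomes of (X1, X2, Z). The sketch below uses hypothetical helper names (`outcomes`, `entropy`, `cond_mi`) and the conditional-mutual-information reading of "contains all relevant information"; it computes I(X;Y | f(X)) and I(X;Y | f(X), Z) in bits.

```python
from collections import Counter
from itertools import product
from math import log2

# All 8 equally likely outcomes of the i.i.d. uniform bits X1, X2, Z.
outcomes = []
for x1, x2, z in product((0, 1), repeat=3):
    outcomes.append({"X": (x1, x2), "fX": x2, "Z": z, "Y": x1 ^ x2 ^ z})


def entropy(keys):
    """Joint entropy (bits) of the named variables under the uniform distribution over outcomes."""
    counts = Counter(tuple(o[k] for k in keys) for o in outcomes)
    n = len(outcomes)
    return sum(c / n * log2(n / c) for c in counts.values())


def cond_mi(a, b, given=()):
    """Conditional mutual information I(A; B | G) = H(A,G) + H(B,G) - H(A,B,G) - H(G)."""
    g = list(given)
    return entropy([a] + g) + entropy([b] + g) - entropy([a, b] + g) - entropy(g)


# Does f(X) = X2 screen off X from Y, without and with conditioning on Z?
without_z = cond_mi("X", "Y", given=("fX",))
with_z = cond_mi("X", "Y", given=("fX", "Z"))
print(f"I(X;Y | f(X))    = {without_z:.1f} bits")  # 0.0 bits
print(f"I(X;Y | f(X), Z) = {with_z:.1f} bits")     # 1.0 bits
```

A value of 0 means X tells you nothing more about Y once f(X) is known; the 1 bit in the second line is exactly the information in X1⊕X2 that X2 alone misses once Z is known.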

Good point, thanks.