It’s not clear to me how this “fairness” criterion is supposed to work. If you simply don’t include S among the predictors, then for any given x in X, the classification of x will be ‘independent’ of S, in the sense that a counterfactual x’ with exactly the same features but a different S would be classified exactly the same way. OTOH, if you’re aiming to have Y be uncorrelated with S even without controlling for X, this essentially requires adding S as a ‘predictor’ too; e.g. consider Simpson’s paradox. But this is a weird operationalization of ‘fairness’.
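To make the two readings concrete, here is a rough toy sketch. Everything in it is made up for illustration: the population counts, the group labels "a" and "b", and both classifier rules are my own assumptions, not anything from the original proposal. It just shows that a rule which ignores S treats identical x identically but produces very different positive rates per group when X is correlated with S, while a rule that equalizes the group rates here has to look at S and so treats identical x differently.

```python
# Toy population as (s, x, count) rows; the numbers are invented for illustration.
POPULATION = [
    ("a", 0, 10), ("a", 1, 90),
    ("b", 0, 90), ("b", 1, 10),
]

def classify_blind(s, x):
    """Hypothetical rule that ignores s entirely: predicts 1 exactly when x == 1."""
    return x

def classify_aware(s, x):
    """Hypothetical rule that uses s to flip the decision for group 'b'."""
    return x if s == "a" else 1 - x

def positive_rate(classifier, group):
    """Share of the given S group that the classifier labels 1."""
    rows = [(x, n) for s, x, n in POPULATION if s == group]
    total = sum(n for _, n in rows)
    positives = sum(n * classifier(group, x) for x, n in rows)
    return positives / total

# Marginal positive rate per S group, without controlling for x.
blind_rates = {g: positive_rate(classify_blind, g) for g in ("a", "b")}
aware_rates = {g: positive_rate(classify_aware, g) for g in ("a", "b")}

# Same x, different s: does each rule give the same answer?
blind_same_x = classify_blind("a", 1) == classify_blind("b", 1)
aware_same_x = classify_aware("a", 1) == classify_aware("b", 1)

print(blind_rates, aware_rates, blind_same_x, aware_same_x)
```

In this particular toy setup the only S-blind rules with equal group rates are the constant ones (always 0 or always 1), which is roughly what I mean by "essentially requires" using S.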