I may have been unclear—if you disallow some data, but allow a bunch of things that correlate with that disallowed data, your results are the same as if you’d had the data in the first place. You can (and, in a good algorithm, do) back into the disallowed data.
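To make that concrete, here is a minimal sketch (entirely hypothetical data, using scikit-learn) of what "backing into" the disallowed data looks like: train a model to predict the disallowed attribute from only the allowed features. If it does much better than chance, the attribute is effectively still available to any downstream model.

```python
# Minimal sketch with synthetic, hypothetical data: measure how well the
# *allowed* features reconstruct a disallowed attribute. Accuracy well above
# 0.5 means the allowed data "backs into" the disallowed data via proxies.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5_000
disallowed = rng.integers(0, 2, n)            # e.g. a protected attribute
# An "allowed" feature that happens to correlate with the disallowed one
# (think zip code or spending patterns acting as a proxy).
proxy = disallowed + rng.normal(0, 0.5, n)
noise = rng.normal(0, 1, n)                   # an unrelated allowed feature
X_allowed = np.column_stack([proxy, noise])

acc = cross_val_score(LogisticRegression(), X_allowed, disallowed, cv=5).mean()
print(f"reconstruction accuracy from allowed features: {acc:.2f}")
```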
In other words, if the disallowed data has no predictive power when added to the allowed data, it’s either truly irrelevant (unlikely in real-world scenarios) or already included in the allowed data, indirectly.
The main point of these ideas is to be able to demonstrate that a classifying algorithm—which is often nothing more than a messy black box—is not biased. This is often something companies want to demonstrate, and may become a legal requirement in some places. The above seems a reasonable definition of non-bias that could be used quite easily.
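As a sketch of how that check could work in practice (again with hypothetical data, and assuming a simple model and ROC AUC as the performance measure): fit one model on the allowed features alone and one with the disallowed attribute added, and compare their predictive power. A negligible gap is the "non-bias" condition described above.

```python
# Minimal sketch of the proposed non-bias check on synthetic, hypothetical
# data: does adding the disallowed attribute to the allowed features improve
# prediction of the target at all?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 5_000
disallowed = rng.integers(0, 2, n)
proxy = disallowed + rng.normal(0, 0.5, n)    # allowed feature correlated with it
other = rng.normal(0, 1, n)                   # allowed feature unrelated to it
# The target depends on the disallowed attribute only through the proxy,
# i.e. the information is already present indirectly in the allowed data.
y = (proxy + other + rng.normal(0, 1, n) > 1).astype(int)

X_allowed = np.column_stack([proxy, other])
X_with    = np.column_stack([proxy, other, disallowed])

auc_allowed = cross_val_score(LogisticRegression(), X_allowed, y,
                              cv=5, scoring="roc_auc").mean()
auc_with    = cross_val_score(LogisticRegression(), X_with, y,
                              cv=5, scoring="roc_auc").mean()

# A negligible gap suggests the disallowed data is either irrelevant or
# already encoded, indirectly, in the allowed data.
print(f"AUC allowed only: {auc_allowed:.3f}   with disallowed added: {auc_with:.3f}")
```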