Redlining seems to go beyond what’s economically efficient, as far as I can tell (see Wikipedia).
Redlining (or more generally, deciding who gets credit) is a great example for this. If you want accurate risk assessment, you must take into account data (income, savings, industry/job stability, other kinds of debt, etc.) that correlates with ethnic averages.
Er, that’s precisely my point here. My idea is to have certain types of data explicitly permitted; in this case I set T to be income. The definition of “fairness” I was aiming for is that once that permitted data is taken into account, there should remain no further discrimination on the part of the algorithm.
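To make that definition concrete, here is a minimal sketch of how you might check it on a toy dataset. Everything here is hypothetical (the group labels, the income brackets, the approval probabilities): the criterion is just that, within each level of the permitted feature T, approval rates should not differ between groups.

```python
import random

random.seed(0)

# Toy dataset: each applicant has a group label, a permitted feature T
# (income bracket), and a loan decision. All values are made up for
# illustration.
applicants = []
for _ in range(10_000):
    group = random.choice(["A", "B"])
    income_bracket = random.choice(["low", "mid", "high"])
    # A classifier that is fair by this definition: the approval
    # probability depends only on the permitted feature T.
    p_approve = {"low": 0.2, "mid": 0.5, "high": 0.8}[income_bracket]
    decision = random.random() < p_approve
    applicants.append((group, income_bracket, decision))

def approval_rate(group, bracket):
    decisions = [d for g, b, d in applicants if g == group and b == bracket]
    return sum(decisions) / len(decisions)

# The fairness check: once T is conditioned on, there should be no
# remaining gap in approval rates between the groups.
for bracket in ["low", "mid", "high"]:
    gap = abs(approval_rate("A", bracket) - approval_rate("B", bracket))
    print(f"{bracket}: approval-rate gap = {gap:.3f}")  # near zero
```

A biased black box would show a persistent gap in at least one bracket even after this conditioning, which is exactly what the definition is meant to detect.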
This seems a much better idea than the paper’s suggestion of simply balancing total fairness (e.g., willingness to throw away all data that correlates) against accuracy in some undefined way.
I may have been unclear—if you disallow some data, but allow a bunch of things that correlate with that disallowed data, your results are the same as if you’d had the data in the first place. You can (and, in a good algorithm, do) back into the disallowed data.
In other words, if the disallowed data has no predictive power when added to the allowed data, it’s either truly irrelevant (unlikely in real-world scenarios) or already included in the allowed data, indirectly.
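The "backing into" effect is easy to demonstrate numerically. In this sketch, `zip_code_score` is a hypothetical allowed feature that merely correlates with a disallowed attribute; a trivial threshold on the allowed feature alone recovers the disallowed attribute well above chance.

```python
import random

random.seed(1)

# Hypothetical setup: 'group' is the disallowed attribute, and
# 'zip_code_score' is an allowed feature that correlates with it.
samples = []
for _ in range(10_000):
    group = random.choice([0, 1])
    # Allowed feature: a noisy proxy for the disallowed attribute.
    zip_code_score = group + random.gauss(0, 0.7)
    samples.append((zip_code_score, group))

# "Backing into" the disallowed data: threshold the allowed proxy and
# see how often it reproduces the disallowed attribute.
correct = sum((score > 0.5) == bool(group) for score, group in samples)
accuracy = correct / len(samples)
print(f"recovered disallowed attribute with accuracy {accuracy:.2f}")
```

Any reasonably good learning algorithm will exploit such a proxy implicitly, which is why simply deleting the disallowed column changes little.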
The main point of these ideas is to be able to demonstrate that a classifying algorithm—which is often nothing more than a messy black box—is not biased. This is often something companies want to demonstrate, and may become a legal requirement in some places. The above seems a reasonable definition of non-bias that could be used quite easily.