No-nonsense version of the “racial algorithm bias”

In discussions of algorithmic bias, the COMPAS scandal has too often been quoted out of context. This post gives the facts, and the interpretation, as quickly as possible. See this for details.

The fight

The COMPAS system is a statistical decision algorithm trained on past statistical data about American convicts. It takes as input features about a convict and outputs a “risk score” that indicates how likely the convict is to reoffend if released.

In 2016, the organization ProPublica claimed that COMPAS is clearly unfair to blacks in one way. Northpointe replied that it is approximately fair in another way. ProPublica rebutted with many statistical details that I didn’t read.

The basic paradox at the heart of the contention is very simple, and it is not the simple “machines are biased because they learn from history and history is biased”. It’s just that there are many kinds of fairness; each may sound reasonable, but they are not compatible in realistic circumstances. Northpointe chose one and ProPublica chose another.

The math

The actual COMPAS gives a risk score from 1 to 10, but there’s no need for that detail here. Consider the toy example where we have a decider (COMPAS, a jury, or a judge) judging whether each convict in a group will reoffend or not. How well the decider is doing can be measured in at least three ways:

  • False negative rate = (false negatives)/(actual positives)

  • False positive rate = (false positives)/(actual negatives)

  • Calibration = (true positives)/(test positives)

A good decider should have a false negative rate close to 0, a false positive rate close to 0, and calibration close to 1.
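To make the three measures concrete, here is a minimal Python sketch (my own illustration, not part of the original post; the function and variable names are mine) that computes them from the four cells of a confusion matrix:

```python
def decider_measures(tp, fp, fn, tn):
    """Compute the three measures from confusion-matrix counts.

    tp: predicted reoffend, actually reoffended     (true positives)
    fp: predicted reoffend, did not reoffend        (false positives)
    fn: predicted no reoffend, actually reoffended  (false negatives)
    tn: predicted no reoffend, did not reoffend     (true negatives)
    """
    fnr = fn / (fn + tp)          # false negatives / actual positives
    fpr = fp / (fp + tn)          # false positives / actual negatives
    calibration = tp / (tp + fp)  # true positives / test positives
    return fnr, fpr, calibration

# Example: 100 convicts, 40 of whom actually reoffend.
print(decider_measures(tp=28, fp=12, fn=12, tn=48))
# -> (0.3, 0.2, 0.7)
```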

Visually, we can draw a “square” with four blocks:

[Figure: a square with 4 blocks]

  • false negative rate = the “height” of the false negative block,

  • false positive rate = the “height” of the false positive block,

  • calibration = (true positive block)/(total area of the yellow blocks, i.e. the test-positive blocks)

Now consider black convicts and white convicts, so that we have two squares. Since the two groups have different reoffending rates (base rates) for some reason, the central vertical line is in a different place in each square.

[Figure: two squares, one for Whites, one for Blacks]

The decider tries to be fair by making sure that the false negative rate and the false positive rate are the same in both squares, but then it is forced to make the calibration for the Whites lower than the calibration for the Blacks.

Then suppose the decider tries to increase the calibration for the Whites: it must somehow decrease the false negative rate of the Whites or the false positive rate of the Whites, which breaks the equality of those rates between the two groups.
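A quick numeric sketch makes the trade-off visible (the base rates and error rates below are my own illustrative numbers, not ProPublica’s data). Both groups get exactly the same false negative and false positive rates, yet the group with the higher base rate ends up with higher calibration:

```python
def calibration(base_rate, fnr, fpr):
    """Calibration implied by a base rate, a false negative rate,
    and a false positive rate (all as fractions of the whole square)."""
    true_pos = base_rate * (1 - fnr)    # actual positives correctly flagged
    false_pos = (1 - base_rate) * fpr   # actual negatives wrongly flagged
    return true_pos / (true_pos + false_pos)

# Same error rates for both groups, so parity fairness holds...
fnr, fpr = 0.3, 0.2

# ...but different base rates force different calibrations.
print(calibration(base_rate=0.5, fnr=fnr, fpr=fpr))  # ~0.78 (higher base rate)
print(calibration(base_rate=0.3, fnr=fnr, fpr=fpr))  # ~0.60 (lower base rate)
```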

In other words, when the base rates are different, it’s impossible to have equal fairness measures in:

  • false negative rate

  • false positive rate

  • calibration

Oh, forgot to mention: even when base rates are different, there’s a way to have equal fairness measures in all three of those… but that requires the decider to be perfect: its false positive rate and false negative rate must both be 0, and its calibration must be 1. This is unrealistic.

In the jargon of fairness measurement, “equal false negative rate and false positive rate” is “parity fairness”; “equal calibration” is just “calibration fairness”. Parity fairness and calibration fairness can be straightforwardly generalized for COMPAS, which uses a 1-10 scoring scale, or indeed any numerical risk score.

It takes some straightforward algebra to prove that, in this general case, parity fairness and calibration fairness are incompatible when the base rates are different and the decider is not perfect.
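For the binary toy case above, the algebra is short. Writing $p$ for the base rate (the fraction of convicts who actually reoffend), the three measures are tied together by one identity (the notation is mine):

```latex
% Calibration as a function of the base rate p, FNR and FPR:
\[
  \text{calibration}
  = \frac{p\,(1 - \mathrm{FNR})}{p\,(1 - \mathrm{FNR}) + (1 - p)\,\mathrm{FPR}}
  = \frac{1}{1 + \dfrac{1 - p}{p}\cdot\dfrac{\mathrm{FPR}}{1 - \mathrm{FNR}}}.
\]
% With FNR and FPR fixed across groups (parity fairness) and FPR > 0,
% calibration is a strictly increasing function of p, so two groups with
% different base rates must get different calibrations, unless FPR = 0
% (which makes calibration = 1): the perfect-decider exception above.
```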

The fight, after-math

Northpointe showed that COMPAS is approximately fair in calibration for Whites and Blacks. ProPublica showed that COMPAS is unfair in parity.

The lesson is that there are incompatible fairnesses. Figuring out which one to apply is a different question.