Book Review: Weapons of Math Destruction

Epistemic Status: Minus One Million Points

Shortness Status: Long (this is a proposed new norm I want to try out, in the sense of ‘apologies for writing a long letter, I did not have enough time to write a shorter one.’ By contrast, Against Facebook was longer in words, but would be short.)

Weapons of Math Destruction is an easy read, but a frustrating one.

The book claims to be about the misuse of big data and machine learning to guide decisions, how that harms people and leads to bad outcomes, and how to fix it. The distortions of aiming at and rewarding what we are measuring rather than what we actually want worry me more and more. It is one of the biggest issues of our age, so I was excited to read a new take on it, even if I expected to already know the bulk of the facts and ideas presented.

There is some of that in the book, and those parts provide some useful information, although if you are reading this you likely already know a lot of it.

What the book is actually mostly about on its surface, alas, is how bad and unfair it is to be a Bayesian. There are two reasons, in her mind, why using algorithms to be a Bayesian is just awful.

The first objection is that probabilistic algorithms are probabilistic. It is just awful that predictive algorithms are used to decide who should get or keep a job, or get cheaper credit, or see certain advertisements, because the algorithm might be wrong. Look at this example of someone the algorithm got wrong! Look at this reason the algorithm got it wrong! Look how wrong it is! Clearly we need to rely on humans, who get things wrong more often, but do so in a less systematic fashion so we can’t prove exactly why any given human got something wrong.

The second objection is that algorithms rank people and options likely to be better above people and options likely to be worse. It is just awful that an algorithm notices that people who have bad credit, or live in a certain zip code, or shop at a certain store, or share some other trait, are either a worse or a better business proposition. You see, this is not fair and probably makes you a racist. This is because the people who are ranked either worse or better tend to be poor, and/or they tend to be not white, and that is just awful. If the resulting system gives them more attention in some way – say, by marketing to them to sell them things they might want and offering them discounts, or providing them with more government attention – then you are taking advantage of them, being a predator and destroying their lives, which you should at least have the common decency to do without an algorithm. If the resulting system gives them less attention in some way, by not marketing to them and charging them more, or by providing them with less government attention – then you are discriminating against them by denying them opportunities and services, which is not fair. Once again, you could at least have the decency to do this without an algorithm so no one can be sure exactly how you made your decisions. And again, since this is likely correlated with race, that also makes you a racist. Which is of course just awful.

These evil algorithms are sneaky. If you give them race as an input, they’ll pick up on the correlations involved and show how racist they (and you) are. Since you of course are not racist, you hide that data (but of course she will blame you for that too, since if you hide that data, we can’t see just how racist you truly are). So instead the evil algorithms notice things that are correlated with race, like income or zip code, and use those instead. So then you try to hide those, and then the algorithms get even sneakier and start picking up on other things that correlate in less direct ways. Or even worse, perhaps they do a good job of figuring out the actual answer, and the actual answer happens to be correlated with some trait you have to be fair to and therefore the algorithm is just awful.

She also doesn’t like it when humans make similar decisions without the use of algorithms, but somehow that makes it better, because you can’t point to the rule that did it. Besides, did you really expect the humans to ignore data they have and act against their own interests? Well, yes, she and similar people do expect this, or at least think not doing so is just awful, but they understand that there are limits.

She never uses the term, but basically she is arguing against disparate impact when compared against completely random decisions – that in the end, for a given set of groups, if a system does not result in equal outcomes for those groups, it is not fair to them, and for some groups this is just awful and means we need to ban the system, and force people to use worse systems instead that are bad proxies for what you are trying to measure. Then you complain about how bad the proxies are.

That is not the most charitable way of describing the argument being made, but I do not think it is a straw man, either. This is what the author explicitly claims to believe.

II.

In something that is not a coincidence, the way I react to such arguments was brought home by Sarah’s excellent recent post, The Face of the Ice. There are man versus man stories, where we are competing for resources including social and sexual status, and then there are man versus nature stories where we are talking about survival. When dealing with potentially big issues, ones that can threaten our very survival, the temptation is to refuse to realize or admit that, and instead focus on the man versus man aspects and how you or the groups you like are being treated unfairly and how that is just awful.

Thus, people talk about the unemployment that will be caused by self-driving cars instead of thinking, whoa, this will transform our entire society and way of life and supercharge our ability to move around both people and goods and maybe we should be both super excited and super worried about that for bigger reasons. People see that we are developing artificial intelligence… and worry about whether it will be racist or sexist, or our plans for income redistribution, rather than what to do with orders of magnitude more wealth and whether it will wipe out the human race because we are made of atoms it could use for something else, and also wipe out all utility in the universe. Which are questions I spend a lot of time on, since they seem rather important. But if you admit that the problems are that big, you would have to stop playing zero sum status games.

III.

The good news is that the author does provide some good starting points to thinking about some of the real problems of big data. Rather than discard the facts that do not fit her narrative, she to her credit shares them anyway. She then tends to move on without noticing the implications of those thoughts, but my standards are low enough that I consider that a win. She also has the strange habit of noting that the thing she is talking about isn’t really a ‘weapon of math destruction’ but it has the potential to be one if things went a little farther in the direction they are headed.

One could even engage in a Straussian reading of the book. In this reading, the real problem is the distortions and destructive games that result from big data algorithms. The constant warnings about the poor are real enough, and point out real problems we should address, but are more important as illustrations of how important it will be for us to get good treatment from the algorithms. At its most basic level, you are poor, so the algorithm treats you badly, and you fix that by not being poor. Not being poor is a good idea anyway, so that works out well. If we start using more and more convoluted proxies? We might have a much bigger problem.

(The unspoken next line is, of course, that if we use these proxies as optimization targets or training data for true artificial intelligence, that would be infinitely worse, but I do not think she gave such issues any thought whatsoever.)

This is why her best discussion is about college rankings. She makes the case that it is primarily the US News & World Report college rankings, and the choices those rankings made, that have caused the explosion in tuition and the awful red queen’s race between different colleges. While I am not fully convinced, she did convince me that the rankings are a much more important cause than I realized.

My abstraction of her story is simple. Before the ratings, everyone knew vaguely what the best universities were (e.g. Harvard and Yale), and by looking carefully one could figure out vaguely how good a school was, but it was very difficult to know much beyond that. The world silently cried out for a rating system, and US News & World Report made the first credible attempt at creating such a system. They chose a whole bunch of things that one could reasonably assume would correlate with quality, such as selectivity of admissions and the accomplishments of graduates, along with a few things that one could at least hope would be correlated with quality, especially if you were measuring and thus controlling for other factors, such as graduation rates. Then, to make sure the ratings had a shot at looking reasonable rather than weird, they included a survey they sent out to colleges.

What they did not include was the cost of tuition, because higher tuition correlates with higher quality, and they wanted the ‘high quality’ colleges like Harvard to come out on top, not whatever state university turned out to be the best value for your dollar.

The result of this was a credible list that students and potential faculty and those evaluating students and faculty could use to evaluate institutional quality. Eventually, the ratings evolved to include less weight on the surveys and more on various measurements. Students used the guide as a key input in choosing where to go to college, which was reasonable since their alternative measurements were terrible. Those evaluating those students also used the guide, especially since admission rates were a key input, so going to a top rated college became an advantage in and of itself, even if the rating wasn’t based on anything.

Since everyone in the system was now using the ratings as a key input in their evaluations, colleges then started devoting a lot of attention to moving up in those ratings, and other similar ratings that came later. A lot of that effort meant improving the quality of the university, especially at first. Some places (she uses the example of TCU) invested in athletics to attract better students and move up. Others worked to make sure students did what they needed to do in order to graduate, or helped their students find good jobs, or even just tried to improve the quality of the education their kids got.

Then there were those who tried to pass the test without learning the material. Some tried to get more applicants to look more selective. I had a personal experience with this. Stanford University sent me a nice card congratulating me on my qualifying for the USAMO, and asked me to consider their fine educational institution. This was before I had started my ongoing war with San Francisco, so I would have welcomed a chance to go to that institution, but my high school only allowed us to apply to seven colleges, and my GPA was substantially below the lowest GPA anyone at my high school had ever had while being accepted to Stanford. My rough math put my chances of admission at 0%, so I had no intent of wasting one of my precious seven slots on them instead of a place I might actually gain admission.

My parents did not understand this. All they saw was that Stanford had asked me to apply, and Stanford was awesome so I was applying to Stanford, whether I liked it or not. This led to me having an admissions officer from Stanford on the phone, telling her that both of us knew Stanford was never going to accept me, and would she please just tell my parents that for the love of God. I didn’t want to plead my case for admission because I knew I had none, I knew that and she knew that, but of course revealing this doesn’t help Stanford so she kept saying that of course every application is carefully considered and we hope we can welcome you to the class of 2001.

This was the first time someone from San Francisco decided to act superficially nice while screwing up my life for the tiniest possible gain to themselves. It was not the last.

In any case, this problem has since gotten much worse. At least back then I knew my safety schools would accept me, whereas now schools that notice you are ‘too good’ for them will reject you, because you’re not going to attend anyway, so why not improve their numbers instead of holding out the vain hope that they were the only place not to notice your criminal record, or worse, your sexist Facebook post? Thus the game gets ever more frustrating and complicated, and punishes even more those who refuse to play it.

All of these games cost money to play, but you know what the schools aren’t being rated on? That’s right, tuition! So they are free to spend ever more money on all the things, and charge accordingly, and the students see this as another sign of a quality institution. She doesn’t mention student loans, which massively contribute to this problem. This is consistent with her other blind spots, since student loans are good and increased tuition is bad, but that story does not conflict with the story being told here, and I did update in favor of tulip ratings mattering more and tulip subsidies mattering less.

Would a lot of that have happened anyway? Certainly, especially given that other ratings would have come out instead. But it seems logical that when a decision can be distilled down into a pretty good number that considers some but not all factors, then people will focus on gaming that number, and the factors that don’t improve that number will be ignored even if they matter more. Goodhart’s Demon will win the day.
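A minimal sketch of that dynamic, with made-up weights and made-up colleges (none of these numbers come from US News or from the book): the score is a weighted sum of a few measured factors, and tuition simply is not one of them.

```python
# Hypothetical weights for illustration only. Tuition is deliberately absent,
# mirroring the omission described above.
WEIGHTS = {"selectivity": 0.4, "graduation_rate": 0.3, "peer_survey": 0.3}


def ranking_score(college):
    """Weighted sum of the measured factors; unmeasured fields are ignored."""
    return sum(weight * college[factor] for factor, weight in WEIGHTS.items())


# Two invented colleges. The lavish one spends more to nudge the measured
# factors upward and charges four times the tuition to pay for it.
frugal = {"selectivity": 0.6, "graduation_rate": 0.8, "peer_survey": 0.5, "tuition": 10_000}
lavish = {"selectivity": 0.7, "graduation_rate": 0.8, "peer_survey": 0.6, "tuition": 40_000}

lavish_ranks_higher = ranking_score(lavish) > ranking_score(frugal)

# Changing tuition alone never moves the score, in either direction.
tuition_is_invisible = (
    ranking_score({**frugal, "tuition": 99_999}) == ranking_score(frugal)
)
```

Under these assumptions the lavish school outranks the frugal one, and nothing a school does to its tuition can move its score, so the pressure runs entirely toward spending more.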

IV.

Other sections are less convincing, but I will go over them quickly.

She talks about getting a job, and how bad it is that there are algorithms evaluating people. Even more than elsewhere in the book, it felt like she was writing the bottom line. This resulted in some confused arguments that she knew were not good, but that she used either because she believes her conclusion or because you should be using a Straussian reading.

The first argument against algorithms in employment is that sometimes they miss out on a good employee. While obviously true, this isn’t saying much, since every other method known to man does this, and most do it far more often, so this objection is like calling self-driving cars unsafe because they might kill people 10% as often as human drivers, instead of human drivers who I am confident do it 100% as often.

The second argument is that the algorithms are used in many different places, so different decisions will be correlated, and those who score poorly won’t be able to find a job at all, whereas in the old method different places used different systems so you could just keep applying and eventually someone would take a liking to you and give you a chance. This does point to a paradox: it seems easier to get a job if everyone’s ratings are different, even though the same number of people end up with jobs, so it cannot be easier in general. What the randomness actually does is increase the returns to perseverance. The randomized ratings make it harder to find a job on the first try, because you face more other applicants that will be rated highly (since they do not automatically find jobs due to the random factor). However, if you apply a lot more than others, your chances go up, whereas if every job uses a common application, more tries does not help you much, and a low scorer is drawing dead.

In some sense this change is good, since it means less time wasted with job applications and results in better matching, but in another sense it is bad because it cuts out the signal of how much the applicant cares. Having to apply for lots of jobs in order to find one means that those who want jobs the most will get the jobs (or the better jobs) since they will send the costly signal of applying more often, whereas in the algorithmic world, that confers no advantage, so those who need a job the most could be shut out by those who don’t care much. Costly signals can be good! So there’s at least some argument here, if it is too hard for the algorithm to measure how much you want the job.
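The returns to perseverance can be sketched with a toy model. It assumes, purely for illustration, that under idiosyncratic ratings each application passes independently with the same probability, and that under a common rating an applicant either clears one shared cutoff everywhere or nowhere. It ignores competition between applicants, so it shows only the individual applicant's side of the paradox. The function names and numbers are hypothetical.

```python
def offer_chance_idiosyncratic(pass_rate, applications):
    """Chance of at least one offer when every employer rates independently.

    Assumes each application passes with probability `pass_rate`,
    independently of all the others.
    """
    return 1 - (1 - pass_rate) ** applications


def offer_chance_common(score, cutoff, applications):
    """Chance of at least one offer when every employer uses the same rating.

    Assumes one shared score and one shared cutoff, so extra applications
    add nothing once the first outcome is known.
    """
    return 1.0 if applications > 0 and score >= cutoff else 0.0


# A weak applicant: 10% pass rate per independent employer,
# or a common score of 0.3 against a shared cutoff of 0.5.
one_try = offer_chance_idiosyncratic(0.10, 1)        # about 0.10
twenty_tries = offer_chance_idiosyncratic(0.10, 20)  # about 0.88
common_one_try = offer_chance_common(0.3, 0.5, 1)          # 0.0
common_hundred_tries = offer_chance_common(0.3, 0.5, 100)  # still 0.0
```

In this sketch persistence takes the weak applicant from roughly one chance in ten to nearly nine in ten under independent ratings, while under the common rating a hundred applications do exactly as well as one.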

The problem of a mistake-making algorithm is also self-correcting in a free market. If the algorithm makes mistakes, which of course it does, and enough of your competition follow its recommendations, you can get great employees at discount prices with high motivation by having humans look carefully to find the good employees the algorithm is turning down. This is especially true if the algorithm is using proxies for race, class, sex or other such categories (argument three that she uses) since those are known to throw out a lot of great people. She answers her own third objection by pointing out that the old system of ‘get a friend to recommend you’ is overall more discriminatory in the bad sense, on every axis both good and bad, than any algorithm being used in the wild.

Her talk about what happens on the job is similar. Yes, these algorithms make mistakes and sometimes evaluate good teachers as bad teachers. Yes, some of them have tons of noise in them. But what is the alternative? If these systems are not on average improvements why are corporations (and governments) using them more and more? The argument she relies on, that sometimes the algorithms make dumb mistakes, is very weak. Humans make really, really dumb mistakes all the time.

What she does not mention in either section, but is the real issue with such things, is that the system will be gamed and that gaming it might take over people’s lives. This is even more glaring due to her using teachers as an example, as teaching to the test is rapidly taking over all of primary and secondary education (or so I am told). Teaching was already a thankless job, and it seems like it is becoming more and more of a hell every year.

If there is an algorithm that will determine who can get hired for entry-level jobs, how long will it take before people learn what it is looking for? How long after that do they start sculpting their resumes and answers to that algorithm? How long after that do they start to post on Facebook what the system wants to see, take the classes it wants them to take, buy the products the algorithm wants to see them buy? Where does it end? Do we all end up consulting a get-hired strategy guide before we choose a pizza place, unless we already have a job, in which case we consult the get-promoted guide?

Then how does the algorithm respond to that action, and how do we respond in kind? How deep does this go?

Those questions terrify me. They don’t keep me up at night, because I belong to the General Mathis school of not letting things keep me up at night (this is why I had to quit the online game of Advanced Civilization), but they are a reasonable choice if you need something to do that for you.

She also notes that a lot of this involves using increasingly convoluted and strange measures, such as mysterious ‘e-scores’ and personality tests, that do not correlate all that well with results, and which she assumes tend to be discriminatory. She contrasts this to IQ tests and credit scores, which are much better predictors and tend to discriminate less and be more ‘fair’ because they only measure what you have done and what you can do, rather than what category of person your past signals that you belong to. She then demands that we do something about this outrage.

I agree with her that IQ tests and credit scores sound way better. It is a real shame that we decided to make it illegal to use them in hiring decisions. So if we want better measures, there’s a solution. I don’t think she is going to like it.

The section on insurance brings up the paradox of insurance. As the purchaser, you have a bunch of knowledge about how likely you are to need insurance. As the insurer, the company has some information it can use to estimate how likely you are to need it, and how much it will cost them when you do. There are then two problems. The first is that if many people only buy when their hidden information says they will need the insurance, and/or when they intend to engage in behaviors that make the insurance more valuable, then it becomes very hard to sell anyone insurance. That’s classic and she doesn’t talk much about it, because it is the consumer benefiting at the expense of a corporation, but if there was a big data algorithm that the consumer could use to decide how much insurance to buy, what would that do to the insurance market? What would happen if it was illegal for the seller of insurance to look at it, or the calculation required too much private data? Could insurance markets collapse? Is this in our future?

Instead she talks about problem two, which is if the insurer uses the information they know to decide who is likely to need insurance, they might start charging different amounts to different people. This would result in people being effectively charged money for their life histories and decisions, which is of course just awful. If poor people cost more to insure, for example (and she says that in many cases this is true), they might have to pay more. As you might guess, I am not sympathetic. This sounds like people paying for the additional costs that their decisions and lifestyles create. This should result in people making better decisions. If this has bad distributional consequences, which it might, the right answer is progressive taxation and redistribution (to the extent that you find this desirable).

Again, she misses that the real problem would be if people started trying to change the outcome of the algorithm and whether the system would be robust enough to get them to do this via ‘do thing that actually decreases expected insurance payouts and is socially good’ rather than ‘do thing that manipulates the system but does not actually accomplish anything.’ She does hint at this a bit when she talks about wellness systems put in place by employers, and how they are sometimes imposing stupid lifestyle costs on employees, but she thinks of this as corporations trying to steal wages by charging some employees more fees, rather than as corporations trying to use algorithms to improve employee health, and the problems that result from that disaster.

This pattern is quite frustrating, as she keeps touching on important and interesting questions, only to pull back to focus on less interesting and less important ones.

One real concern she does point out is that some insurance companies use their systems to figure out who is likely to do more comparison shopping, and give higher prices to those likely to do less comparison shopping. Humans do this all the time, of course, but that does not make it a good thing. When an algorithm makes something easier to do, it can increase the harm and force us to confront something that wasn’t previously worth confronting. If everyone does this to you, and all the companies raise your prices by 10%, you’re paying 10% more no matter how much you shop around. Then again, it would be very much to a company’s advantage to have a way for you to tell them that no really you did comparison shop, since figuring out what that signal is represents a costly signal that you will actually put in the work to comparison shop, so this equilibrium also seems unstable, which makes me worry about it less. There’s also the issue of comparison websites, which also credibly signal that the user is doing comparison shopping.
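The arithmetic behind the 10% claim, as a sketch with hypothetical quotes in whole dollars: if every insurer applies the same markup to you, the cheapest quote you can find rises by that markup too, however many quotes you collect.

```python
def best_quote(quotes, markup_percent=0):
    """Cheapest available price after every seller applies the same markup.

    Prices are whole dollars; marked-up prices are rounded down to whole dollars.
    """
    return min(q * (100 + markup_percent) // 100 for q in quotes)


# Invented quotes from three insurers for the same policy.
quotes = [1000, 1200, 900]

cheapest_without_markup = best_quote(quotes)                 # 900
cheapest_with_markup = best_quote(quotes, markup_percent=10)  # 990
```

Shopping around still finds the cheapest insurer, but in this sketch the cheapest price is itself 10% higher, which is the sense in which a universal markup cannot be escaped by comparison shopping alone.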

Finding credit, another of her sections, is another place where we are already at the phase of everyone gaming the system all the time. When I moved out to Denver, I couldn’t get any credit. This made me quite angry, since I had always paid all my bills, but it turns out that the algorithms think that if you have not borrowed money, you might not pay borrowed money back. As a human, I think that if you never borrow money, it’s a great sign that you don’t need to, so of course you’ll pay it back (and thought this was obvious logic, and that the way you convince the bank to give you a loan is to prove that you don’t need one).

As a result, I had to get a pre-paid credit card so that I could explicitly owe someone money and then pay them back, even though I didn’t really ever owe anyone anything, so that I could then get a regular credit card with a tiny limit, so I could actually owe someone money for real, and pay that back, and so on in a cycle until, a few years later, I was getting periodic new credit card offers in the mail with giant credit lines. We pay our bills on time in large part to protect our credit ratings, and also do other things to help our credit ratings. In this case, the system seems stable. If we decide that group of things X gives you a high credit rating, then the willingness to do lots of X is a great sign that you are worthy of credit even if X has nothing to do with anything! If you take the time to make sure your credit report looks good, I do in fact trust you to pay your bills.

This is an example of a great outcome, and it would be good to put more thought into how we got there. A strong argument she could make, but does not make (at least explicitly), is that we got there because credit ratings exclude lots of data they could use but choose not to, thus giving people control over those ratings in important ways, and preventing those ratings from intruding on the rest of our lives. Of course, the right way to respond to this is to allow people to use credit ratings for more things, thus crowding out other measures that use data we would rather not involve, instead of banning credit scores, which invites the use of whatever data we can find.

The sections on online advertising and civic life did not seem to raise any new and interesting concerns, so I’m going to skip over them, other than to echo her and issue the periodic public service announcement that for-profit universities are almost all scams or near scams, you should never, ever, ever use them, and anything that gives them access to potential victims is scum and deserves to burn in hell.

I would say that given my expectations, the book was about a 50th percentile result. That’s not a disaster, but it is a failure, because book utility has a huge fat positive tail. Given you have read this far, I can’t recommend that you read the book, since I do not think you would get much more out of reading the whole thing. If you are especially interested, though, it is a quick and mostly painless read and does have some useful facts in it I glossed over, so you could do a lot worse. I certainly do worse with my time reasonably often.