# Less Competition, More Meritocracy?

Analysis of the paper Less Competition, More Meritocracy (hat tip: Marginal Revolution's "Can Less Competition Mean More Meritocracy?").

Epistemic Status: Consider the horse as if it were not a three-meter sphere

Eco­nomic pa­pers that use math to prove things can point to in­ter­est­ing po­ten­tial re­sults and rea­sons to ques­tion one’s in­tu­itions. What is frus­trat­ing is the failure to think out­side of those mod­els and proofs, an­a­lyz­ing the prac­ti­cal im­pli­ca­tions.

In this par­tic­u­lar pa­per, the cen­tral idea is that when risk is un­limited and free, ratch­et­ing up com­pe­ti­tion dra­mat­i­cally in­creases risk taken. This in­tro­duces suffi­cient noise that adding more com­peti­tors can make the av­er­age win­ner less skil­led. At the mar­gin, adding ad­di­tional similar com­peti­tors to a very large pool has zero im­pact. Ad­ding com­peti­tors with less ex­pected promise makes things worse.

This can ap­ply in the real world. The pa­per pro­vides a good ex­am­ple of a very good in­sight that is then proven ‘too much,’ and which does not then ques­tion or vary its as­sump­tions in the ways I would find most in­ter­est­ing.

#### I. The Ba­sic Model and its Cen­tral Point

Pre­sume some num­ber of job open­ings. There are weak can­di­dates and strong can­di­dates. Each can­di­date knows if they are strong or weak, but not how many other can­di­dates are strong, nor do those run­ning the con­test know how many are strong.

The goal of the com­pe­ti­tion is to se­lect as many strong can­di­dates as pos­si­ble. Or for­mally, to max­i­mize [num­ber of strong se­lected – num­ber of weak se­lected], which is the same thing if the num­ber of can­di­dates is fixed, but is im­por­tantly differ­ent later when the num­ber of se­lected can­di­dates can vary. Each can­di­date performs and is given a score, and for an N-slot com­pe­ti­tion, the high­est N scores are picked.

By de­fault, strong can­di­dates score X and weak can­di­dates score Y, X>Y, but each can­di­date can also take on as much risk as they wish, with any de­sired dis­tri­bu­tion of scores, so long as their score never goes be­low zero.

The paper then assumes reflexive equilibrium, does math, and proves a bunch of things that happen next. The math checks out; I duplicated the results intuitively.

There are two types of equil­ibrium.

In the first type, con­ces­sion equil­ibria, strong can­di­dates take no risk and are al­most always cho­sen. Weak can­di­dates take risk to try and beat other weak can­di­dates, but at­tempt­ing to beat strong can­di­dates isn’t worth­while. This al­lows strong can­di­dates to take zero risk.

In the sec­ond type, challenge equil­ibria, weak can­di­dates at­tempt to be cho­sen over strong can­di­dates, forc­ing strong can­di­dates to take risk.

If I am a weak candidate, I can be at least (Y/X) as likely as a strong candidate to be selected by copying their strategy with probability (Y/X) and scoring 0 otherwise, which preserves my expected score of Y. This seems close to optimal in a challenge equilibrium.
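
This mimicry strategy is easy to check numerically. A minimal Monte Carlo sketch (the specific values of X and Y, and the function name, are my own illustration, not the paper's):

```python
import random

def weak_mimic(X, Y):
    """A weak candidate copies the strong score X with probability Y/X,
    scoring 0 otherwise. This keeps the expected score at Y while
    matching a strong candidate's score Y/X of the time."""
    return X if random.random() < Y / X else 0.0

random.seed(0)
X, Y, trials = 10.0, 4.0, 100_000
scores = [weak_mimic(X, Y) for _ in range(trials)]
mean_score = sum(scores) / trials      # stays near Y = 4
match_rate = scores.count(X) / trials  # stays near Y/X = 0.4
```

Any mean-preserving spread with support bounded below by zero works the same way; this all-or-nothing version is the extreme case.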

Adding more candidates, strong or weak, risks shifting from a concession to a challenge equilibrium. Each additional candidate, of any strength, makes challenging a better option relative to conceding.

If competition is 'insufficiently intense' then we get a concession equilibrium. We successfully identify every strong candidate, at the cost of accepting some weak ones. If competition is 'too intense' we lose that. The extra candidate that tips us over the edge makes things much worse. After that, quantity does not matter, only the ratio of weak candidates to strong.

Even if search is free, and you continue to sample from the same pool, hitting the threshold hurts you, and further expansion does nothing. Interviewing one million people, a tenth of whom are strong, for ten jobs is no better than interviewing ten thousand, or even one hundred. Ninety might be better.
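
A quick simulation illustrates that flatness past the tipping point. Everything here (the scores, probabilities, and the `contest` helper) is a hypothetical parameterization of a challenge equilibrium, not the paper's model:

```python
import random

def contest(pool_size, slots=10, p_strong=0.1, X=2.0, Y=1.0):
    """One challenge-equilibrium contest: strong candidates score X;
    weak candidates mimic, scoring X with probability Y/X and 0
    otherwise. Ties are broken at random (second tuple element).
    Returns the fraction of winners who are strong."""
    cands = [random.random() < p_strong for _ in range(pool_size)]
    scored = [(X if strong or random.random() < Y / X else 0.0,
               random.random(), strong) for strong in cands]
    winners = sorted(scored, reverse=True)[:slots]
    return sum(strong for _, _, strong in winners) / slots

random.seed(1)
small = sum(contest(100) for _ in range(2_000)) / 2_000
huge = sum(contest(10_000) for _ in range(200)) / 200
# Both land near p_strong / (p_strong + (1 - p_strong) * Y / X) ≈ 0.18:
# a 100x bigger pool of the same mix buys essentially nothing.
```

Only the ratio of weak mimics to strong candidates among the top scorers matters, which is why the pool size drops out.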

Since costs are never zero (and rarely nega­tive), and the pool usu­ally de­grades as it ex­pands, this ar­gues strongly for limited com­pe­ti­tions with weaker se­lec­tion crite­ria, in­clud­ing via var­i­ous hacks to the sys­tem.

#### II. What To Do, and What This Im­plies, If This Holds

The pa­per does a good job an­a­lyz­ing what hap­pens if its con­di­tions hold.

If one has a fixed set of po­si­tions to fill (win­ners to pick) and wants to pick the max­i­mum num­ber of strong can­di­dates, with no cost to ex­pand­ing the pool of can­di­dates, the ideal case is to pick the max­i­mum num­ber of strong can­di­dates that main­tains a con­ces­sion equil­ibrium. With no con­trol (by as­sump­tion) over who you se­lect or how to se­lect them, this is the same as pick­ing the max­i­mum num­ber of can­di­dates that main­tains a con­ces­sion equil­ibrium, no mat­ter what de­crease in qual­ity you might get while ex­pand­ing the pool.

The tip­ping point makes this a Price Is Right style situ­a­tion. Get as close to the num­ber as pos­si­ble with­out go­ing over. Go­ing over is quite bad, worse than a sub­stan­tial un­der­shoot.

One can think of prob­a­bly not in­ter­view­ing enough strong can­di­dates, and prob­a­bly hiring some weak can­di­dates, as the price you must pay to be al­lowed to sort strong can­di­dates from weak can­di­dates – you need to ‘pay off’ the weak ones to not try and fool the sys­tem. An ex­tra benefit is that even as you fill all the slots, you know who is who, which can be valuable in­for­ma­tion in the fu­ture. Even if you’re stuck with them, bet­ter to know that.

A similar dynamic arises when choosing how many candidates to select from a fixed pool, or when choosing both the number selected and the pool size.

If one attempts to have only enough slots for strong candidates, under unlimited free risk-taking, that guarantees a challenge equilibrium. Your best bet will therefore probably be to pick enough candidates from the pool to create a concession equilibrium, just like choosing a smaller candidate pool.

The pa­per con­sid­ers hiring a weak can­di­date as a −1, and hiring a strong can­di­date as a +1. The con­clu­sions don’t vary much if this changes, since there are lots of other nu­mer­i­cal knobs left un­speci­fied that can can­cel this out. But it is worth not­ing that in most cases the ra­tio is far less fa­vor­able than that. The de­fault is that one good hire is far less good than one bad hire is bad. True bad hires are rather ter­rible (as op­posed to all right but less than the best).

Thus, when the paper points out that it is sometimes impossible to reliably break 50% strong candidates under realistic conditions, no matter how many people are interviewed and how many slots are given out, it underestimates the chance that the system breaks down entirely into no contest at all, and no production.

What is the best we can do, if all as­sump­tions hold?

The minimum portion of weak candidates accepted scales linearly with their presence in the pool, and with how strongly they perform relative to strong candidates. Thus we set the pool size so that the expected number of strong candidates fills the slots, with some margin of error.

That is best if we set the pool size but noth­ing else. The pa­per con­sid­ers col­lege ad­mis­sions. A col­lege is ad­vised to solve for which can­di­dates are above a fixed thresh­old, then choose at ran­dom from those above the thresh­old (which is a sug­ges­tion one would only make in a pa­per with zero search costs, since once you have enough wor­thy can­di­dates you can stop search­ing, but shrug.) Thus, we can always choose to ar­bi­trar­ily limit the pool.

In practice, attempting this would change the pool of applicants, in a way you won't like. You become more attractive to weak candidates and less attractive to strong ones. Weak candidates flood in to 'take their shot,' causing a vicious cycle of reputation and pool decay. You're not a good reach school or a safe school for a strong candidate, so why bother? If other colleges copy you, students respond by investing less in becoming strong and more in sending out all the applications, and the remaining strong candidates remain at risk.

True re­flex­ive equil­ibria al­most never ex­ist, given the pos­si­ble an­gles of re­sponse, and differ­ences be­tween peo­ple’s knowl­edge, prefer­ences and cog­ni­tion.

#### III. Relax Reflexive Equilibrium

Even if it is com­mon knowl­edge that only two can­di­date strengths ex­ist, and all can­di­dates of each type are iden­ti­cal (which they aren’t), they will get differ­ent in­for­ma­tion and re­act differ­ently, de­stroy­ing re­flex­ive equil­ibrium.

Play­ers will not ex­pect all oth­ers to jump with cer­tainty be­tween equil­ibria at some size thresh­old. Be­cause they won’t. Which cre­ates differ­ent equil­ibria.

Some play­ers don’t know game the­ory, or don’t pay at­ten­tion to strat­egy. Those play­ers, as a group, lose. Smart game the­ory always has the edge.

An in­tu­ition pump: Learn­ing game the­ory is costly, so the equil­ibrium re­quires it to pay off. Com­pare to the effi­cient mar­ket hy­poth­e­sis.

Some weak can­di­dates will always at­tempt to pass as strong can­di­dates. There is a grad­ual shift from most not do­ing so to al­most ev­ery­one do­ing so. More weak can­di­dates steadily take on more risk. Even­tu­ally most of them mostly take on large risk to do their im­pres­sion of a strong can­di­date. Strong can­di­dates slowly start tak­ing more risk more of­ten as they sense their po­si­tion be­com­ing un­safe.

Zero risk isn’t sta­ble any­way with­out con­tin­u­ous skill lev­els. Strong can­di­dates no­tice that ex­actly zero risk puts them be­hind can­di­dates who take on ex­tra tail risk to get ep­silon above them. Zero risk is a de­fault strat­egy, so beat­ing that baseline is wise.

Now those doing this try to outbid each other, until strong candidates lose to weak candidates at least sometimes. This risk will cap out very low if strong candidates consider the chance of losing at around their average performance level to also be minuscule, but it will have to exist. Otherwise, there's an almost free action in making one's poor performances worse, since those are already losing to almost all other strong candidates, and doing so allows one to make one's stronger performances better and/or more likely.

The gen­er­al­iza­tion of this rule is that when­ever you in­tro­duce a pos­si­ble out­come into the sys­tem, and provide any net benefit to any­one if they do things that make the out­come more likely, there is now a chance that the out­come hap­pens. Even if the out­come is ‘di­vorce,’ ‘gov­ern­ment de­fault,’ ‘forced liqui­da­tion,’ ‘we both drive off the cliff’ or ‘nu­clear war.’ It prob­a­bly also isn’t ep­silon. While risk is near ep­silon, tak­ing ac­tions that in­crease risk will look es­sen­tially free, so un­til the risk is big enough to mat­ter it will keep in­creas­ing. There­fore, ev­ery risk isn’t only pos­si­ble. Every risk will mat­ter. Given enough time, some­one will mis­calcu­late, and Mur­phy’s Law en­sues.

Fu­ture post: Pos­si­ble bad out­comes are re­ally bad.

Step­ping back, the right strat­egy for each com­peti­tor will be to guess the perfor­mance lev­els that effi­ciently trans­late into wins, mak­ing sure to max­i­mally by­pass lev­els oth­ers are likely to naively se­lect (such as zero risk strate­gies), and gen­er­ally play like they’re in a vari­a­tion of the game of Blotto.

A lot of these re­sults are driven by dis­crete skill lev­els, so let’s get rid of those next.

#### IV. Allow Con­tin­u­ous Skill Levels

Sup­pose in­stead of two skill lev­els, each player has their own skill level, and a rough and noisy idea where they lie in the dis­tri­bu­tion.

Each player has re­sources to dis­tribute across prob­a­bil­ity. Suc­cess is in­creas­ing as a func­tion of perfor­mance. Think­ing play­ers aim for perfor­mance lev­els they be­lieve are effi­cient, and do not waste re­sources on perfor­mance lev­els that mat­ter less.

All com­peti­tors also know that the chance of win­ning with low perfor­mance is al­most zero. The value of ad­di­tional perfor­mance prob­a­bly grad­u­ally in­creases (pos­i­tive sec­ond deriva­tive) un­til it peaks at an in­flec­tion point, and then starts to de­cline as suc­cess starts to ap­proach prob­a­bil­ity one. There may be ad­di­tional quirky places in the dis­tri­bu­tion where ex­tra perfor­mance is es­pe­cially valuable. This ex­act curve won’t be known to any­one, differ­ent play­ers will have differ­ent guesses partly based on their own abil­ities, and abil­ity lev­els are con­tin­u­ous.

A suffi­ciently strong can­di­date, who ex­pects their av­er­age perfor­mance to be above the in­flec­tion point, should take no risk. A weaker can­di­date should ap­prox­i­mate the in­flec­tion point, and risk oth­er­wise scor­ing a zero perfor­mance to reach that point. Sim­ple.

If the dis­tri­bu­tion of skill lev­els is bumpy, what hap­pens then? We have strong can­di­dates and weak can­di­dates (e.g. let’s say col­lege grad­u­ates and high school grad­u­ates, or some have worked in the field and some haven’t, or what­ever) so there’s a two-peak dis­tri­bu­tion of skill lev­els. Un­less peo­ple are badly mis­in­formed, we’ll still get a nor­mal-look­ing dis­tri­bu­tion. If the two groups calcu­late very differ­ent ex­pected thresh­olds, we’ll see two peaks.

In gen­eral, but not always, enough play­ers will mis­calcu­late or com­pete for the ‘ev­ery­one failed’ con­di­tion that try­ing to do so is a los­ing play. Oc­ca­sion­ally there will be good odds to hop­ing enough oth­ers aim too high and miss.

Rather than have a challenge and a con­ces­sion equil­ibrium, we have a thresh­old equil­ibrium. Every­one has a noisy es­ti­mate of the thresh­old they need. Those ca­pa­ble of re­li­ably hit­ting the thresh­old take no risk, and usu­ally make it. Those not ca­pa­ble of re­li­ably hit­ting the thresh­old risk ev­ery­thing to make the thresh­old as of­ten as pos­si­ble.

Note that this equilibrium holds even though it may contain no one above the final threshold. If everyone aims for what they think is good-enough performance, aiming for less is almost worthless, aiming for much more is mostly pointless, and the threshold adjusts so that the expected number of threshold performances is very close to the number of slots.

More com­pe­ti­tion raises the thresh­old, forc­ing com­peti­tors to take on more risk, un­til ev­ery­one is us­ing the same thresh­old strat­egy and suc­cess is purely pro­por­tional to skill. Thus, in a large pool, we once again have ex­pand­ing the pool as a bad idea if it weak­ens av­er­age skill, even if search and par­ti­ci­pa­tion costs for all are free.
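
The 'success purely proportional to skill' claim can be sketched directly. The threshold value and skill levels below are my own hypothetical numbers:

```python
import random

def attempt(skill, threshold):
    """Threshold-equilibrium play: candidates who can reliably hit the
    threshold take no risk; everyone else gambles everything, hitting
    the threshold with probability skill/threshold and scoring 0 the
    rest of the time."""
    if skill >= threshold:
        return True
    return random.random() < skill / threshold

random.seed(2)
T, trials = 1.0, 100_000
hit_rates = {skill: sum(attempt(skill, T) for _ in range(trials)) / trials
             for skill in (0.25, 0.5, 0.75)}
# Each hit rate is close to skill / T: success scales linearly with skill.
```

In a large pool the threshold floats up until everyone below it plays this way, which is what makes average winner quality a function of average pool quality.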

In a small pool, the strongest can­di­dates are ‘wast­ing’ some of their skill on less effi­cient out­comes be­yond their best es­ti­mate of the thresh­old.

This ends up be­ing similar to the challenge case, ex­cept that there is no in­flec­tion point where things sud­denly get worse. You never ex­pect to lose from ex­pand­ing the pool while main­tain­ing qual­ity. In­stead, things slowly get bet­ter as you waste less work at the top of the curve, so the value of adding more similar can­di­dates quickly ap­proaches zero.

The new in­tu­ition is, given low enough search costs, we should add equally strong po­ten­tial can­di­dates un­til we are con­fi­dent ev­ery­one is tak­ing risk, rather than stop­ping just short of caus­ing stronger can­di­dates to take risk. If par­ti­ci­pa­tion is costly to you and/​or the can­di­dates, you should likely stop short of that point.

The key in­tu­itive ques­tion to ask is, if a can­di­date was the type of per­son you want, would they be so far ahead of the game as to be ob­vi­ously bet­ter than the cur­rent ex­pected marginal win­ner? Would they be able to crush a much big­ger pool, and thus be effec­tively wast­ing lots of effort? If and only if that’s true, there’s prob­a­bly benefit to ex­pand­ing your search, so you get more such peo­ple, and it’s a ques­tion of whether it is worth the cost.

The other strong in­tu­ition is that once your marginal ap­pli­cant pool is lower in av­er­age qual­ity than your av­er­age pool, that will always be a high cost, so fo­cus on qual­ity over quan­tity.

This sug­gests an­other course of ac­tion…

#### V. Multi-Stage Process

Our model tells us that av­er­age qual­ity of win­ners is, given a large pool, a func­tion of the av­er­age qual­ity of our base pool.

But we have a huge ad­van­tage: This whole pro­cess is free.

Given that, it seems like we should be able to be a bit more clever and com­plex, and do bet­ter.

We can im­prove if we can get a pool of can­di­dates that has a higher av­er­age qual­ity than our origi­nal can­di­date pool, but which is large enough to get us into a similar equil­ibrium. Each can­di­date’s suc­cess is pro­por­tional to their skill level, so our av­er­age out­come im­proves.

We already have a se­lec­tion pro­cess that does this. We know our win­ners will be on av­er­age bet­ter than our can­di­dates. So why not use that to our ad­van­tage?

Sup­pose we did a multi-stage com­pe­ti­tion. Be­fore, we would have had 10 ap­pli­cants for 1 slot. Ex­pand­ing that to 100 ap­pli­cants won’t do us any good di­rectly, be­cause of risk tak­ing. But run­ning 10 com­pe­ti­tions with 10 peo­ple each, then pit­ting those 10 win­ners against each other, will im­prove things for us.

By us­ing this tac­tic mul­ti­ple times, we can do quite a bit bet­ter. Weaker can­di­dates will al­most never sur­vive mul­ti­ple rounds.
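
A simulation makes the multi-stage advantage concrete, under a toy challenge-equilibrium model (strong candidates always hit the top score; weak mimics hit it half the time). All names and parameters are my own sketch, not the paper's:

```python
import random

def pick_winner(group, p_hit=0.5):
    """One round with fresh, independent risk draws: strong candidates
    always hit the top score; weak candidates hit it with probability
    p_hit. Ties are broken at random (second tuple element)."""
    scored = [(1.0 if strong or random.random() < p_hit else 0.0,
               random.random(), strong) for strong in group]
    return max(scored)[2]  # was this round's winner strong?

def two_stage(pool, groups=10):
    """Run ten 10-person heats, then a final among the heat winners,
    with risks re-drawn independently each round."""
    size = len(pool) // groups
    finalists = [pick_winner(pool[i * size:(i + 1) * size])
                 for i in range(groups)]
    return pick_winner(finalists)

random.seed(3)
base = [True] * 10 + [False] * 90  # 10 strong among 100 candidates
trials = 10_000
p_single = sum(pick_winner(random.sample(base, 100))
               for _ in range(trials)) / trials
p_multi = sum(two_stage(random.sample(base, 100))
              for _ in range(trials)) / trials
# p_multi comfortably beats p_single: uncorrelated risk across rounds
# filters out lucky fakers.
```

The improvement depends entirely on the risk draws being independent between rounds, which is the assumption the next paragraph questions.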

What hap­pened here?

We cheated. We forced can­di­dates to take ob­serv­able, un­cor­re­lated risks in each differ­ent round. We de­stroyed the rule that risk tak­ing is free and easy, and as­sumed that a lucky re­sult in round 1 won’t help you in round 2.

If a low-skill per­son can per­ma­nently mimic in all ways a high-skill per­son, and we ob­serve that suc­cess, they are high skill now! A wor­thy win­ner. If they can’t, then they fall back down to Earth on fur­ther ob­ser­va­tion. This should make clear why the idea of un­limited cheap and ex­actly con­trol­led risk is profoundly bizarre. A test that works that way is a rather strange test.

So is a test that costs noth­ing to ad­minister. You get what you pay for.

The risk is that risk-tak­ing takes the form of ‘guess the right ap­proach to the test­ing pro­cess’ and thus test scores are cor­re­lated with­out hav­ing to link back to skill.

This is definitely a thing.

Dur­ing one all-day job in­ter­view, I made sev­eral fun­da­men­tal in­ter­view-skill mis­takes that hurt me in mul­ti­ple ses­sions. If I had fixed those mis­takes, I would have done much bet­ter all day, but would not have been much more skil­led at what they were test­ing for. A more rigor­ous or multi-step pro­cess could have only done so much. To get bet­ter in­for­ma­tion, they would have had to add a differ­ent kind of test. That would risk in­tro­duc­ing bad noise.

This seems typ­i­cal of similar con­tests and test­ing meth­ods de­signed to find strong can­di­dates.

A more re­al­is­tic model would in­tro­duce costs to par­ti­ci­pa­tion in the search pro­cess, for all par­ties. You’d have an­other trade-off be­tween hav­ing noise be cor­re­lated ver­sus min­i­miz­ing its size, mak­ing more rounds of anal­y­sis pro­gres­sively less use­ful.

Adding more candidates to the pool is now clearly good at first, and then turns increasingly negative.

#### VI. Pric­ing Peo­ple Out

There are two re­al­is­tic com­pli­ca­tions that can help us a lot.

The first is pric­ing peo­ple out. En­ter­ing a con­test is rarely free. I have been for­tu­nate that my last two job in­ter­views were at Valve Soft­ware and Jane Street Cap­i­tal. Both were ex­cep­tional com­pa­nies look­ing for ex­cep­tional peo­ple, and I came away from both in­ter­views feel­ing like I’d had a very fun and very ed­u­ca­tional ex­pe­rience, in ad­di­tion to lev­el­ing up my in­ter­view skills. So those par­tic­u­lar in­ter­views felt free or bet­ter. But most are not.

Most are more like when I ap­plied to col­leges. Each ad­di­tional col­lege meant a bunch of ex­tra work plus an ap­pli­ca­tion fee. Har­vard does not want to ad­mit a weak can­di­date. If we ig­nore the mo­ti­va­tion to show that you have lots of ap­pli­ca­tions, Har­vard would pre­fer that weak can­di­dates not ap­ply. It wastes time, and there’s a non-zero chance one will gain ad­mis­sion by ac­ci­dent. If Har­vard taxes ap­pli­ca­tions, by re­quiring ad­di­tional effort or rais­ing the fee, they will drive weak ap­pli­cants away and strengthen their pool, im­prov­ing the fi­nal se­lec­tions.

Har­vard also does this by mak­ing Har­vard hard. A suffi­ciently weak can­di­date should not want to go to Har­vard, be­cause they will pre­dictably flunk out. Mak­ing Har­vard harder, the way MIT is hard, would make their pool higher qual­ity once word got out.

We can think of some forms of haz­ing, or other bad ex­pe­riences for win­ners of com­pe­ti­tions, partly as a way to dis­cour­age weak can­di­dates from ap­ply­ing, and also partly as an ad­di­tional test to drive them out.

Ideally we also re­duce risk taken.

A candidate has uncertainty about how strong they are, and about how much they would benefit from the prize. If being a stronger candidate is correlated with benefiting from winning, a correct strategy becomes to take less or no risk. If taking a big risk causes me to win when I would otherwise lose, I won a prize I don't want. If taking a big risk causes me to lose, I lost a prize I did want. That pushes me heavily towards lowering my willingness to take risk, which in turn lowers the competition level and encourages me to take less risk still. Excellent.

#### VII. Tak­ing Ex­tra Risk is Hard

Avoid­ing risk is also hard.

In the real world, there is a ‘nat­u­ral’ amount of risk in any ac­tivity. One is con­tin­u­ously offered op­tions with vary­ing risk lev­els.

Some of these choices are big, some small. Some­times the risky play is ‘bet­ter’ in an ex­pected value sense, some­times worse.

True max-min strategies decline even small risks that would cancel out over time. This is expensive.

If one wants to max­i­mize risk at all costs, one ends up do­ing the more risky thing ev­ery time and takes bad gam­bles. This is also ex­pen­sive.

It is a hard prob­lem to get the best out­come given one’s de­sired level of risk, or to max­i­mize the chance of ex­ceed­ing some perfor­mance thresh­old, even with no op­po­nent. In games with an op­po­nent who wants to beat you and thus has the op­po­site in­cen­tives of yours (think foot­ball) it gets harder still. Real world perfor­mances are no­to­ri­ously ter­rible.

There are two ba­sic types of situ­a­tions with re­spect to risk.

Type one is where adding risk is ex­pen­sive. There is a nat­u­ral best route to work or line of play. There are other strate­gies that over­all are worse, but have big­ger up­side, such as tak­ing on par­tic­u­lar down­side tail risks in ex­change for tiny pay­offs, or hop­ing for a lucky re­sult. In the driv­ing ex­am­ple, one might take an on av­er­age slower route that has vari­able amounts of traf­fic, or one might drive faster and risk an ac­ci­dent or speed­ing ticket.

Available risk is limited. If I am two hours away by car, I might be able to do something reckless and maybe get there in an hour and forty-five minutes, but if I have to get there in an hour, it's not going to happen.

I can hope to ever over­come only a limited skill bar­rier. If we are rac­ing in the In­di­anapo­lis 500, I might try to win the race by skip­ping a pit stop, or pass­ing more ag­gres­sively to make up ground, or choos­ing a car that is slightly faster but has more en­g­ine trou­ble. But if my car com­bined with my driv­ing skill is sub­stan­tially slower than yours (where sub­stan­tially means a minute over sev­eral hours) and your car doesn’t crash or die, I will never beat you.

If I had taken the math Olympiad exam (the USAMO) an­other hun­dred times, I might have got­ten a non-zero score some­times, but I was never get­ting onto the team. Pe­riod.

In these situ­a­tions, re­duc­ing risk be­yond the ‘nat­u­ral’ level may not even be pos­si­ble. If it is, it will be in­creas­ingly ex­pen­sive.

Type two is where gi­ant risks are the de­fault, then sac­ri­fices are made to con­tain those risks. Gam­blers who do not pay at­ten­tion to risk will always go broke. To be a win­ning gam­bler, one can ei­ther be lucky and re­tain large risk, or one can be skil­led and pay a lot of at­ten­tion to con­tain­ing risk. In the long term, con­tain­ing risk, in­clud­ing con­tain­ing risk by ceas­ing to play at all, is the only op­tion.

Com­peti­tors in type two situ­a­tions must be eval­u­ated ex­plic­itly on their risk man­age­ment, or on very long term re­sults, or any eval­u­a­tion is worth­less. If you are test­ing for good gam­blers and only have one day, you pay some at­ten­tion to re­sults but more at­ten­tion to the logic be­hind choices and siz­ing. Tests that do oth­er­wise get es­sen­tially ran­dom re­sults, and fol­low the pat­tern where re­duc­ing the ap­pli­cant pool im­proves the qual­ity of the win­ners.

Another note is that the risks com­peti­tors take can be cor­re­lated across com­peti­tors in many situ­a­tions. If you need a suffi­ciently high rank rather than a high raw score, those who take risks should seek to take un­cor­re­lated risks. Thus, in stock mar­ket or gam­bling com­pe­ti­tions, the pri­mary skill of­ten is in do­ing some­thing no one else would think to do, rather than in pick­ing a high ex­pected value choice. Some­times that’s what real risk means.

#### VIII. Cen­tral Responses

There are also four ad­di­tional re­sponses by those run­ning the com­pe­ti­tion, that are worth con­sid­er­ing.

The first response is to observe a competitor's level of risk-taking and test optimization, and penalize too much (or too little). This is often quite easy. Everyone knows what a safe answer to 'what is your greatest weakness' looks like, bet sizing in simulations is transparent, and so on. If you respond to things going badly early on by taking a lot of risk, rather than being responsible, will you do that with the company's money?

A good ad­mis­sions officer at a col­lege mostly knows in­stantly which es­says had pro­fes­sional help and which re­sumes are based on statis­ti­cal anal­y­sis, ver­sus who lived their best life and then ap­plied to col­lege.

A good com­pe­ti­tion de­sign gives you the op­por­tu­nity to mea­sure these con­sid­er­a­tions.

Such con­tests should be anti-in­duc­tive, if done right, with the re­ally sneaky play­ers play­ing on higher meta lev­els. Like ev­ery­thing else.

The sec­ond re­sponse is to vary the num­ber of win­ners based on how well com­peti­tors do. This is the de­fault.

If I in­ter­view three job ap­pli­cants and all of them show up hung over, I need to be pretty des­per­ate to take the one who was less hung over, rather than call in more can­di­dates to­mor­row. If I find three great can­di­dates for one job, I’ll do my best to find ways to hire all three.

Another vari­a­tion is that I have an in­sider I know well as the de­fault win­ner, and the ap­pli­ca­tion pro­cess is to see if I can do bet­ter than that, and to keep the in­sider and the com­pany hon­est, so again it’s mostly about cross­ing a bar.

The third re­sponse is that of­ten there isn’t even a ‘batch’ of ap­pli­ca­tions. There is only a se­ries of per­ma­nent yes/​no de­ci­sions un­til the po­si­tion is filled. This is the clas­sic prob­lem of find­ing a spouse or a sec­re­tary, where you can’t eas­ily go back once you re­ject some­one. Once you have a sense of the dis­tri­bu­tion of op­tions, you’re effec­tively look­ing for ‘good enough’ at ev­ery step, and that re­quire­ment doesn’t move much un­til time starts run­ning out.

Thus, most contests that care mostly about finding a worthy winner are closer to threshold requirements than they look. This makes it very difficult to create a concession equilibrium. If you show up and aren't good enough to beat continuing to search, your chances are very, very bad. If you show up and are good enough to beat continuing to search, your chances are very good. The right strategy becomes either to aim at this threshold, or if the field is large you might need to aim higher. You can never keep the field small enough to keep the low-skill players honest.

The fourth re­sponse is to pun­ish suffi­ciently poor perfor­mance. This can be as mild as in-the-mo­ment so­cial em­bar­rass­ment – Si­mon mock­ing as­pirants in Amer­i­can Idol. It can be as se­ri­ous as ‘you’re fired,’ ei­ther from the same com­pany (you re­vealed you’re not good enough for your cur­rent job, or your up­side is limited), or from an­other com­pany (how dare you try to jump ship!). In fic­tion a failed ap­pli­ca­tion can be lethal. Even mild re­tal­i­a­tion is very effec­tive in im­prov­ing av­er­age qual­ity (and limit­ing the size) of the tal­ent pool.

#### IX. Prac­ti­cal Conclusions

We don’t purely want the best per­son for the job. We want a se­lec­tion pro­cess that bal­ances search costs, for all con­cerned, with find­ing the best per­son and per­haps get­ting your ap­pli­cants to im­prove their skill.

A weaker ver­sion of the pa­per’s core take-away heuris­tic seems to hold up un­der more anal­y­sis: There is a limit to how far ex­pand­ing a search helps you at all, even be­fore costs.

Rule 1: Pool qual­ity on the mar­gin usu­ally mat­ters more than quan­tity.

Bad applicants who can make it through are worse than they appear. Expanding the pool's quantity at the expense of average quality, once your supply of candidates isn't woefully inadequate, is usually a bad move.

Rule 2: Once your ap­pli­ca­tion pool prob­a­bly in­cludes enough iden­ti­fi­able top-qual­ity can­di­dates to fill all your slots, up to your abil­ity to differ­en­ti­ate, stop look­ing.

A larger pool will make your search more ex­pen­sive and difficult for both you and them, add more re­gret be­cause choices are bad, and won’t make you more likely to choose wisely.

Note that this is a later stop­ping point than the pa­per recom­mends. The pa­per says you should stop be­fore you fill all your slots, such that weak ap­pli­cants are en­couraged not to rep­re­sent them­selves as strong can­di­dates.

Also note that this rule has two ad­di­tional re­quire­ments. It re­quires the good can­di­dates be iden­ti­fi­able, since if some of them will blow it or you’ll blow notic­ing them, that doesn’t help you. It also re­quires that there not be out­liers wait­ing to be dis­cov­ered, that you would rec­og­nize if you saw them.

Another similar heuristic that is also good: make the competition just intense enough that worthy candidates are worried they won't get the job. Then stop.

Rule 3: Weak can­di­dates must ei­ther be driven away, or re­warded for re­veal­ing them­selves. If weak can­di­dates can suc­cess­fully fake be­ing strong, it is worth a lot to en­sure that this strat­egy is pun­ished.

Good pun­ish­ments in­clude ap­pli­ca­tion fees, giv­ing up other op­por­tu­ni­ties or jobs, long or stress­ful com­pe­ti­tions, and pun­ish­ments for failure rang­ing from mild in-the-room so­cial dis­ap­proval or be­ing made to feel dumb, up to ma­jor re­tal­i­a­tion.

Another great punishment is to give smaller rewards to success by a low-skilled person. If their prize is something they can't use – they'll flunk out, or get fired quickly, or similar – then they will be less inclined to apply.

Re­ward for par­ti­ci­pa­tion is prob­a­bil­ity of suc­cess times re­ward for suc­cess, while cost is mostly fixed. Tilt this enough and your bad-ap­pli­cant prob­lem clears up.
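The tilt described here is simple arithmetic. A minimal sketch, with made-up numbers for the probabilities, rewards, and fees:

```python
# Hypothetical numbers: a weak candidate deciding whether to fake being strong.
# Expected value of entering = P(success) * reward for success - fixed cost.
def participation_ev(p_success, reward, cost):
    return p_success * reward - cost

# Before tilting: faking is worth it for the weak candidate (EV ~ +50).
assert participation_ev(0.10, 1000, 50) > 0

# After tilting: shrink the reward a low-skill winner can actually capture
# (they'll be fired quickly) and raise the cost of applying (EV ~ -50).
assert participation_ev(0.10, 300, 80) < 0
```

Nothing about the candidate's win probability changed; only the reward-to-cost ratio did, and that alone flips the decision.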

Fail to tilt this enough, and you have a big lemon prob­lem on mul­ti­ple lev­els. Weak com­peti­tors will choose your com­pe­ti­tion over oth­ers, giv­ing strong ap­pli­cants less rea­son to bother both in terms of chance of win­ning, and de­sire to win. Who wants to win only to be among a bunch of fak­ers who got lucky? That’s no fun and it’s no good for your rep­u­ta­tion ei­ther.

It will be difficult to punish weak candidates specifically for faking being strong, as opposed to punishing them in general. But if you can do it, that's great.

The flip side is that we can re­ward them for be­ing hon­est. That will of­ten be eas­ier.

Prevent­ing a re­bel­lion of the less skil­led is a con­straint on mechanism de­sign. We must ei­ther ap­pease them, or wipe them out.

Rule 4: Suffi­ciently hard, high stakes com­pe­ti­tions that are vuln­er­a­ble to gam­ing and/​or re­source in­vest­ment are highly toxic re­source mon­sters.

This is get­ting away from the pa­per’s points, since the pa­per doesn’t deal with re­source costs to par­ti­ci­pa­tion or search, but it seems quite im­por­tant.

In some cases, we want these highly toxic resource monsters. We like that every member of the local sports team puts the rest of their life mostly on hold and focuses on winning sporting events. The test is exactly what we want them to excel at. We also get to use the trick of testing them in discrete steps, via different games and portions of games, to prevent 'risk' from playing too large a role.
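The discrete-steps trick can be illustrated with a toy simulation (my own sketch, with arbitrary numbers, not from the paper): a weak "gambler" who swings for the fences beats a steady strong player on any single round fairly often, but almost never over a long series of rounds.

```python
import random

def gambler_round():
    # Weak player takes maximum risk: usually scores 0, sometimes scores big.
    return 10 if random.random() < 0.3 else 0

def steady_round():
    # Strong player reliably scores 5 every round.
    return 5

def gambler_win_rate(n_rounds, trials=20_000):
    wins = 0
    for _ in range(trials):
        g = sum(gambler_round() for _ in range(n_rounds))
        s = steady_round() * n_rounds
        wins += g > s
    return wins / trials

random.seed(0)
print(gambler_win_rate(1))    # single round: gambler wins ~30% of the time
print(gambler_win_rate(25))   # 25 rounds: gambler needs >12 hits out of 25; rare
```

Splitting one contest into many graded pieces averages out the variance, so the risk strategy stops paying.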

In most cases, where the match be­tween test prepa­ra­tion, suc­cess­ful test strate­gies and de­sired skills is not so good, this highly toxic re­source mon­ster is very, very bad.

Consider school, or more generally childhood. The more we reward good performance on a test, and punish failure, the more resources are eaten alive by the test. In the extreme, nearly all of a child's experiences and resources, and even those of their parents, are consumed. From discussions I've had, much of high school in China comes remarkably close to this, as everything is dropped for years to cram for a life-changing college entrance exam.

Rule 5: Re­wards must be able to step out­side of a strict scor­ing mechanism.

Any scoring mechanism is vulnerable to gaming, to risk taking, and to Goodhart's Law. To keep everyone's motivation, potentially their entire life and being, from being subverted, we need to reward and punish from the outside, looking in on what is happening. This has to carry enough weight to be competitive with the prizes themselves.

Con­sider this metaphor.

If the real value of many jour­neys is the friends you made along the way, that can be true in both di­rec­tions. Often one’s friends, ex­pe­riences and les­sons end up dwarfing in im­por­tance the prize or mo­ti­va­tion one started out with; fre­quently we need a McGuffin and re­stric­tions that breed cre­ativity and fo­cus to al­low co­or­di­na­tion, more than any prize.

It also works the other way. The value of your friends can be that they mo­ti­vate and help you to be wor­thy of friend­ship, to do and ac­com­plish things. The rea­son we took the jour­ney the right way was so that we would make friends along it. This pre­vents us from fal­ling to Good­hart’s Law. We don’t nar­row in on check­ing off a box. Even in a pure com­pe­ti­tion, like a Magic tour­na­ment, we know the style points mat­ter, and we know that it mat­ters whether we think the style points mat­ter, and so on.

The ex­is­tence of the so­cial, of var­i­ous lev­els and lay­ers, the abil­ity to step out­side the game, and the worry about un­known un­knowns, is what guards sys­tems from break­down un­der the pres­sure of met­rics. Given any util­ity func­tion we know about, how­ever well de­signed, and suffi­cient op­ti­miza­tion pres­sure, things end badly. You need to pre­serve the value of un­known un­knowns.

This leads us to:

Rule 6: Too much knowl­edge by po­ten­tial com­peti­tors can be very bad.

The more com­peti­tors do the ‘nat­u­ral’ thing, that max­i­mizes their ex­pected out­put, the bet­ter off we usu­ally are. The less they know about how they are be­ing eval­u­ated, on what lev­els, with what thresh­old of suc­cess, the less they can game the sys­tem, and the less suc­cess de­pends on gam­ing skill or luck.

All the truly per­verse out­comes came from sce­nar­ios where com­peti­tors knew they were des­per­a­does, and tak­ing huge risks was not ac­tu­ally risky for them.

Hav­ing a high thresh­old is only bad if com­peti­tors know about it. If they don’t know, it can’t hurt you. If they sus­pect a high thresh­old, but they don’t know, that miti­gates a lot of the dam­age. In many cases, the com­peti­tor is bet­ter served by play­ing to suc­ceed in the wor­lds where the thresh­old is low, and ac­cept los­ing when the thresh­old is un­ex­pect­edly high, which means do­ing ex­actly what you want. More un­cer­tainty also makes the choices of oth­ers less cer­tain, which makes situ­a­tions harder to game effec­tively.
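The point about unknown thresholds can be made concrete with a toy expected-value calculation (my own illustration, with made-up numbers): safe play scores 5 for sure; risky play scores 10 with probability 0.3, else 0.

```python
P_RISKY_HIT = 0.3  # chance the risky strategy produces its big score of 10

def pass_prob(strategy, threshold):
    """Probability a candidate's score clears the given threshold."""
    if strategy == "safe":
        return 1.0 if threshold <= 5 else 0.0
    return P_RISKY_HIT  # risky: only the score of 10 clears either threshold

# Known high threshold (10): risk taking is forced on the candidate.
assert pass_prob("risky", 10) > pass_prob("safe", 10)

# Unknown threshold, 50/50 between low (5) and high (10): safe play
# passes 0.5 * 1.0 + 0.5 * 0.0 = 0.5, versus risky's 0.3, so the
# candidate plays it straight - exactly what the evaluator wants.
ev_safe = 0.5 * pass_prob("safe", 5) + 0.5 * pass_prob("safe", 10)
ev_risky = 0.5 * pass_prob("risky", 5) + 0.5 * pass_prob("risky", 10)
assert ev_safe > ev_risky
```

Mere uncertainty about the threshold, with no change to the threshold itself, is enough to make honest play the best response.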

Power hides in­for­ma­tion. Power does not re­veal its in­ten­tions. This is known, and the dy­nam­ics ex­plored here are part of why. You want peo­ple op­ti­miz­ing for things you won’t even be aware of, or don’t care about, but which they think you might be aware of and care about. You want to avoid them try­ing too hard to game the things you do look at, which would also be bad. You make those in your power worry at ev­ery step that if they try any­thing, or fail in any way, it could be what costs them. You cause peo­ple to want to curry fa­vor. You also al­low your­self to al­ter the re­sults, if they’re about to come out ‘wrong’. The more you re­veal about how you work, the less power you have. In this case, the power to find wor­thy win­ners.

This is in ad­di­tion to the fact that some con­sid­er­a­tions that mat­ter are not legally al­lowed to be con­sid­ered, and that law­suits might fly, and other rea­sons why de­ci­sion mak­ers en­sure that no one knows what they were think­ing.

Thus we must work even harder to re­ward those who ex­plain them­selves and thereby help oth­ers, and who re­al­ize that the key hard thing is, as Hag­bard Celine re­minds us, to avoid power.

But still get things done.

• Note: I got about a third through this, and… had a strong sense that this was about some­thing im­por­tant that was worth my time un­der­stand­ing, but some­thing about the de­scrip­tion/​ex­am­ples made that hard to do.

(This does leave the article in a state where, I predict, to understand it I'd have to invest effort, and the invested effort would in fact improve my understanding. But this feels more accidental than optimal.)

I think part of it is just that the names are fairly unintuitive. Concession Equilibria and Challenge Equilibria don't neatly map into whatever they're supposed to be about in my head. I have an easier time understanding jargon if I know why a name was chosen.

• In that par­tic­u­lar case, I would have cho­sen differ­ent names that likely would have res­onated bet­ter, but felt it was im­por­tant not to change the pa­per’s cho­sen la­bels, even though they seemed not great. That might have been an er­ror.

Their ex­pla­na­tion is that the ques­tion is, will the weaker can­di­dates con­cede that they are weaker than strong ones and let the strong ones all win, or will they challenge the stronger can­di­dates.

Sugges­tions for other ways to make this more clear are ap­pre­ci­ated. I’d like to be able to write things like this in a way that peo­ple ac­tu­ally read and benefit from.

• I think simply explaining that in the OP would have helped.

• Dat­a­point: I got the point about challenge equil­ibria be­ing the place where ev­ery­one has to start fight­ing and tak­ing risks. How­ever I thought that ‘con­ces­sion’ referred to the em­ploy­ers mak­ing con­ces­sions to weaker can­di­dates, by hiring some. I sup­pose the pa­per’s ex­pla­na­tion makes more sense.

• I al­most gave up halfway through, for much the same rea­sons, but this some­how felt im­por­tant, the way some se­quences/​codex posts felt im­por­tant at the time, so I pow­ered through. I definitely will need a sec­ond pass on some of the large in­fer­en­tial steps, but over­all this felt long-term valuable.

• Pro­moted to cu­rated: I’ve ap­plied the ideas in this post to a va­ri­ety of do­mains since I first read it, and I think it was quite use­ful in a lot of them (ex­am­ples of ques­tions I was think­ing about: “How much more progress should we ex­pect in Science given that at least 10x more re­sources are available for re­cruit­ing sci­en­tists?” and “How much does the size of the EA and Ra­tion­al­ity com­mu­nity de­ter­mine the qual­ity of peo­ple work­ing at or­ga­ni­za­tions in the com­mu­nity?”).

I do think I had to read this post at least twice to re­ally grasp any of the core points, and am still strug­gling with some of them. I think this was par­tially the re­sult of try­ing to trans­late a math­e­mat­i­cal econ pa­per into a post with­out any equa­tions, which is always a re­ally big challenge, but I also think I could have benefit­ted from a longer ini­tial sec­tion that just sum­ma­rized the econ pa­per, and then a sep­a­rate sec­tion that com­mented on it. As it stood, I think I ended up some­what con­fused about which points were cov­ered in the econ pa­per, and which ones were your points.

But over­all, I think this post changed my mind on some im­por­tant ideas, which is one of the most valuable things a post can do. Thanks a lot for writ­ing it.

• Much ado about noth­ing, I think this is the most quotable thing you’ve ever writ­ten.

Ap­pease or wipe out, per­verse des­per­a­does, etc etc.

Any­ways — ex­cep­tional piece. Feels like clas­si­cal Zvi deep anal­y­sis as ap­plied to high-lev­er­age non-con­structed sce­nar­ios. Or rather, how to turn a draft into con­structed, with­out par­ti­ci­pants know­ing. One mar­vels over what type of win rate would be pos­si­ble if this can be suc­cess­fully ex­e­cuted....

• Yeah. To be our best selves, we need the right amount of challenge, not too lit­tle and not too much. I would also add that we need the right fluc­tu­a­tion of challenge: it shouldn’t be too steady over time, or too rare and spiky. Other­wise you get dis­tor­tions in be­hav­ior: cram­ming for an exam, or cheat­ing, or break­ing down from stress. More­over, the op­ti­mal amount and fluc­tu­a­tion of challenge is differ­ent for ev­ery per­son and ev­ery task. I no­ticed that when try­ing to teach peo­ple stuff, and then started ap­ply­ing it to my­self as well.

• Fi­nance is full of such hid­den risks. Start a fund, take in­sane risks, maybe gen­er­ate out­size re­turns ⇒ profit by grow­ing funds un­der man­age­ment and take a % of that.

If not, try, try again. Google “in­cu­ba­tor funds”. Taleb’s Fooled by Ran­dom­ness has many ex­am­ples.

But if you are taking risks, won't people see it and shun you? Probably not. It is very hard to see risk after the event. It is not too hard to "stuff the risk into the tails". There are even consultants who will help you do this.

Even with­out cheat­ing, when a test is very stringent, then an alarm­ing frac­tion of the ap­par­ent top perform­ers may have just had a lucky day.
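That last claim is easy to check with a quick simulation (my own sketch, with arbitrary parameters): let performance be skill plus an equal dose of luck, select the top 100 performers out of 100,000, and ask how many are actually in the top 100 by skill.

```python
import random

random.seed(1)
N, TOP = 100_000, 100
# Each person is (skill, luck), both standard normal draws.
people = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

# Top performers by observed score (skill + luck) vs. top by true skill.
by_perf = sorted(range(N), key=lambda i: people[i][0] + people[i][1])[-TOP:]
by_skill = sorted(range(N), key=lambda i: people[i][0])[-TOP:]

overlap = len(set(by_perf) & set(by_skill))
print(overlap)  # far fewer than 100: most apparent top performers got lucky
```

The more stringent the cutoff, the further out in the tail you select, and the more of the selected owe their position to a lucky draw rather than skill.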

• I’ll echo the other com­menters in say­ing this was in­ter­est­ing and valuable, but also (per­haps nec­es­sar­ily) left me to cross some sig­nifi­cant in­fer­en­tial gaps. The biggest for me were in go­ing from game-de­scrip­tions to equil­ibria. Maybe this is just a thing that can’t be made in­tu­itive to peo­ple who haven’t solved it out? But I think that, e.g., graphs of the kinds of dis­tri­bu­tions you get in differ­ent cases would have helped me, at least.

I also had to think for a bit about what as­sump­tions you were mak­ing here:

A more rigor­ous or multi-step pro­cess could have only done so much. To get bet­ter in­for­ma­tion, they would have had to add a differ­ent kind of test. That would risk in­tro­duc­ing bad noise.

A very naive model says ad­di­tional tests → un­cor­re­lated noise → less noise in the av­er­age.

More re­al­is­ti­cally, we can as­sume that some di­men­sions of qual­ity are eas­ier to Good­hart than oth­ers, and you don’t know which are which be­fore­hand. But then, how do you know your ini­tial choice of test isn’t Good­hart-y? And even if the Good­hart noise is much larger than the true vari­a­tion in skill, it seems like you can ag­gre­gate scores in a way that would al­low you to make use of the in­for­ma­tion from the differ­ent tests with­out be­ing bam­boo­zled. (Depend­ing on your use-case, you could take the av­er­age of a con­cave func­tion of the scores, or use quan­tiles, or take the min score, etc.)

In re­al­ity, though, you usu­ally have some idea what di­men­sions are im­por­tant for the job. Maybe it’s some­thing like PCA, with the noise/​sig­nal ra­tio of di­men­sions de­creas­ing as you go down the list of com­po­nents. Then that de­crease, plus marginal costs of more tests, means that there is some nat­u­ral stop­ping point. I guess that makes sense, but it took a bit for me to get there. Is that what you were think­ing?

• For sec­tion III. it would be re­ally helpful to con­cretely work through what hap­pens in the ex­am­ples of di­vorce, nu­clear war, gov­ern­ment de­fault, etc. What’s a plau­si­ble thought pro­cess of the agents in­volved?

My current model is something like "my marriage is worse than I find tolerable, so I have nothing to lose. Now that divorce is legal, I might as well gamble my savings in the casino. If I win we could move to a better home and maybe save the relationship; if I lose we'll get divorced."

Peo­ple who have noth­ing to lose start tak­ing risks which fill up the merely pos­si­bly bad out­comes un­til they start mat­ter­ing.

• First, I did not get through the entire post. That said, here are some thoughts that occurred to me.

• Under what conditions does this generalize? I was trying to apply it to my world. If I am hiring, part of the problem is not about hard skills but soft skills, and those will differ a bit based on what team the new person will be part of.

• "Good game theory always wins" seems to have been directed at the candidates, but what happens when the game master (for lack of a better term) has poor game theory, or doesn't think such behavior is good? Again, from a corporate hiring standpoint that might be a real situation. In terms of consumers shopping around it might also apply (the average advertiser is a better game theorist than the average consumer). Do the conclusions still hold here?

• Closely re­lated to the first bul­let, just what mar­ket set­tings were con­sid­ered for the anal­y­sis.

• I recall an old paper that asked if duopoly was more competitive than the standard atomistic competition in the Econ 101 pure competition model. That was largely driven by search costs and asymmetric information problems. Would the theory here be complementary?

My gut re­ac­tion is there is some value and truth here but that it should not be taken too se­ri­ously. Con­sider it an area of con­sid­er­a­tion and el­e­ment of a solu­tion rather than a solu­tion to any prob­lem of get­ting the best out of the messy so­cial in­sti­tu­tions that me­di­ate our ac­tivi­ties and greatly in­fluence the col­lec­tive/​ag­gre­gate re­sults.

• Small edit to my own comment. I neglected to point out that my comments and assessment were really about what I understood to be the position of the paper under review, and not about the analysis per se.