Goodhart Taxonomy

Goodhart’s Law states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” However, this is not a single phenomenon. I propose that there are (at least) four different mechanisms through which proxy measures break when you optimize for them.

The four types are Regressional, Causal, Extremal, and Adversarial. In this post, I will go into detail about these four different Goodhart effects using mathematical abstractions as well as examples involving humans and/or AI. I will also talk about how you can mitigate each effect.

Throughout the post, I will use V to refer to the true goal and use U to refer to a proxy for that goal which was observed to correlate with V and which is being optimized in some way.


Quick Reference

  • Regressional Goodhart—When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.

    • Model: When U is equal to V + X, where X is some noise, a point with a large U value will likely have a large V value, but also a large X value. Thus, when U is large, you can expect V to be predictably smaller than U.

    • Example: height is correlated with basketball ability, and does actually directly help, but the best player is only 6′3″, and a random 7′ person in their 20s would probably not be as good.

  • Causal Goodhart—When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.

    • Model: If V causes U (or if V and U are both caused by some third thing), then a correlation between U and V may be observed. However, when you intervene to increase U through some mechanism that does not involve V, you will fail to also increase V.

    • Example: someone who wishes to be taller might observe that height is correlated with basketball skill and decide to start practicing basketball.

  • Extremal Goodhart—Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.

    • Model: Patterns tend to break at simple joints. One simple subset of worlds is those worlds in which U is very large. Thus, a strong correlation between U and V observed for naturally occurring U values may not transfer to worlds in which U is very large. Further, since there may be relatively few naturally occurring worlds in which U is very large, extremely large U may coincide with small V values without breaking the statistical correlation.

    • Example: the tallest person on record, Robert Wadlow, was 8′11″ (2.72m). He grew to that height because of a pituitary disorder, and he would have struggled to play basketball because he “required leg braces to walk and had little feeling in his legs and feet.”

  • Adversarial Goodhart—When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.

    • Model: Consider an agent A with some different goal W. Since they depend on common resources, W and V are naturally opposed. If you optimize U as a proxy for V, and A knows this, A is incentivized to make large U values coincide with large W values, thus stopping them from coinciding with large V values.

    • Example: aspiring NBA players might just lie about their height.


Regressional Goodhart

When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.

Abstract Model

When U is equal to V + X, where X is some noise, a point with a large U value will likely have a large V value, but also a large X value. Thus, when U is large, you can expect V to be predictably smaller than U.

The above description applies when U is meant to be an estimate of V. A similar effect can be seen when U is only meant to be correlated with V, by looking at percentiles. When a sample is chosen which is a typical member of the top p percent of all U values, it will have a lower V value than a typical member of the top p percent of all V values. As a special case, when you select the highest U value, you will often not select the highest V value.
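
To make this concrete, here is a minimal simulation sketch (mine, not the original post's), with V and the noise X drawn as independent standard normals: the option with the largest proxy U = V + X has a V that is predictably smaller than its U, and is usually not the option with the highest V.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_options = 10_000, 100

V = rng.normal(size=(n_trials, n_options))   # true goal values
X = rng.normal(size=(n_trials, n_options))   # independent noise
U = V + X                                    # proxy = goal + noise

rows = np.arange(n_trials)
picked = U.argmax(axis=1)                    # select the option with the largest proxy value

print("mean U of the selected option:", U[rows, picked].mean())  # around 3.5
print("mean V of the selected option:", V[rows, picked].mean())  # predictably smaller, around 1.8
print("mean of the best available V: ", V.max(axis=1).mean())    # around 2.5
```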

Examples

Examples of Regressional Goodhart are everywhere. Every time someone does something that is anything other than the thing that maximizes their goal, you could view it as them optimizing some kind of proxy, where the action that maximizes the proxy is not the same as the action that maximizes the goal.

Regression to the Mean, Winner’s Curse, and Optimizer’s Curse are all examples of Regressional Goodhart, as is the Tails Come Apart phenomenon.

Relationship with Other Goodhart Phenomena

Regressional Goodhart is by far the most benign of the four Goodhart effects. It is also the hardest to avoid, as it shows up every time the proxy and the goal are not exactly the same.

Mitigation

When facing only Regressional Goodhart, you still want to choose the option with the largest proxy value. While the proxy will be an overestimate, it will still be better in expectation than options with a smaller proxy value. If you have control over what proxies to use, you can mitigate Regressional Goodhart by choosing proxies that are more tightly correlated with your goal.

If you are not just trying to pick the best option, but also trying to have an accurate picture of what the true value will be, Regressional Goodhart may cause you to overestimate the value. If you know the exact relationship between the proxy and the goal, you can account for this by just calculating the expected goal value for a given proxy value. If you have access to a second proxy with an error independent from the error in the first proxy, you can use the first proxy to optimize, and the second proxy to get an accurate expectation of the true value. (This is what happens when you set aside some training data to use for testing.)
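
As a rough sketch of the second-proxy idea (the setup and numbers are my own): optimize on one noisy measurement of V, then read off an independent second measurement of the chosen option, analogous to evaluating on held-out test data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_options = 10_000, 100

V  = rng.normal(size=(n_trials, n_options))      # true values (hidden in practice)
U1 = V + rng.normal(size=(n_trials, n_options))  # proxy used to optimize
U2 = V + rng.normal(size=(n_trials, n_options))  # second proxy with independent error

rows = np.arange(n_trials)
picked = U1.argmax(axis=1)                       # optimize on the first proxy only

print("mean U1 of picked options:", U1[rows, picked].mean())  # inflated by selection
print("mean U2 of picked options:", U2[rows, picked].mean())  # tracks the true value
print("mean V  of picked options:", V[rows, picked].mean())
```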


Causal Goodhart

When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.

Abstract Model

If V causes U (or if V and U are both caused by some third thing), then a correlation between U and V may be observed. However, when you intervene to increase U through some mechanism that does not involve V, you will fail to also increase V.
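
A small simulation (a hypothetical setup of mine, with a common cause C that the model above leaves abstract) makes the difference between conditioning and intervening concrete: worlds where U happens to be large also tend to have large V, but forcing U to be large through a mechanism that bypasses C leaves V unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Observational worlds: a common cause C drives both the proxy U and the goal V.
C = rng.normal(size=n)
U_obs = C + 0.5 * rng.normal(size=n)
V_obs = C + 0.5 * rng.normal(size=n)

print("corr(U, V) in observed worlds:", np.corrcoef(U_obs, V_obs)[0, 1])  # strongly positive
print("mean V when U happens to be large:", V_obs[U_obs > 2.0].mean())    # large

# Intervention: force U to be large through a mechanism that does not involve C (or V).
U_do = np.full(n, 3.0)                   # U set directly, by fiat
V_do = C + 0.5 * rng.normal(size=n)      # V's mechanism is untouched by the intervention
print("mean U after intervention:", U_do.mean())
print("mean V after intervention:", V_do.mean())                          # about 0, no gain
```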

Examples

Humans often avoid naive Causal Goodhart errors, and most examples I can think of sound obnoxious (like eating caviar to become rich). One possible example is a human who avoids doctor visits because not being told about health problems is a proxy for being healthy. (I do not know enough about humans to know if Causal Goodhart is actually what is going on here.)

I also cannot think of a good AI example. Most AI systems are not acting in the kind of environment where Causal Goodhart would be a problem, and when they are acting in that kind of environment, Causal Goodhart errors are easily avoided.

Most of the time the phrase “Correlation does not imply causation” is used, it is pointing out that a proposed policy might be subject to Causal Goodhart.

Relationship with Other Goodhart Phenomena

You can tell the difference between Causal Goodhart and the other three types because Causal Goodhart goes away when you just sample a world with a large proxy value, rather than intervene to make the proxy large.

Mitigation

One way to avoid Causal Goodhart is to only sample from or choose between worlds according to their proxy values, rather than causing the proxy. This clearly cannot be done in all situations, but it is useful to note that there is a class of problems for which Causal Goodhart cannot cause problems. For example, consider choosing between algorithms based on how well they do on some test inputs, where your goal is to choose an algorithm that performs well on random inputs. The fact that you choose an algorithm does not affect its performance, so you don’t have to worry about Causal Goodhart.

In cases where you actually change the proxy value, you can try to infer the causal structure of the variables using statistical methods, and check that the proxy actually causes the goal before you intervene on the proxy.


Extremal Goodhart

Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.

Abstract Model

Patterns tend to break at simple joints. One simple subset of worlds is those worlds in which U is very large. Thus, a strong correlation between U and V observed for naturally occurring U values may not transfer to worlds in which U is very large. Further, since there may be relatively few naturally occurring worlds in which U is very large, extremely large U may coincide with small V values without breaking the statistical correlation.

Examples

Humans evolved to like sugars, because sugars were correlated in the ancestral environment (which had fewer sugars) with nutrition and survival. Humans then optimize for sugars, have way too much, and become less healthy.

As an abstract mathematical example, let U and V be two correlated dimensions in a multivariate normal distribution, but we cut off the normal distribution to only include the ball of points in which U² + V² < N for some large N. This example represents a correlation between U and V in naturally occurring points, but also a boundary around what types of points are feasible that need not respect this correlation. Imagine you were to sample n points and take the one with the largest U value. As you increase n, at first, this optimization pressure lets you find better and better points for both U and V, but as you increase n to infinity, eventually you sample so many points that you will find a point near (√N, 0). When enough optimization pressure was applied, the correlation between U and V stopped mattering, and instead the boundary of what kinds of points were possible at all decided what kind of point was selected.
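
Here is a sketch of that example (my own parameter choices: correlation 0.8 and a ball of radius 3): sample correlated (U, V) points restricted to the ball and keep the one with the largest U. With mild optimization pressure the selected point is good on both axes; with heavy pressure it slides toward the boundary, where V must be small.

```python
import numpy as np

rng = np.random.default_rng(3)
RADIUS_SQ = 9.0                               # feasible worlds satisfy U**2 + V**2 < 9
COV = [[1.0, 0.8], [0.8, 1.0]]                # U and V are correlated in ordinary worlds

def sample_feasible(k):
    """Draw k correlated (U, V) points, rejecting any that fall outside the ball."""
    points = np.empty((0, 2))
    while len(points) < k:
        batch = rng.multivariate_normal([0.0, 0.0], COV, size=2 * k)
        points = np.vstack([points, batch[(batch ** 2).sum(axis=1) < RADIUS_SQ]])
    return points[:k]

for n in (10, 1_000, 100_000):
    pts = sample_feasible(n)
    u_best, v_best = pts[pts[:, 0].argmax()]  # keep the sampled point with the largest U
    print(f"n = {n:>7}: selected U = {u_best:5.2f}, its V = {v_best:5.2f}")

# With small n, U and V rise together; with large n, the winner sits near the
# boundary of the ball, where V is forced to be small.
```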

Many examples of machine learning algorithms performing badly because of overfitting are a special case of Extremal Goodhart.
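
As one standard illustration (my example, not the post's): fitting a high-degree polynomial drives training error, the proxy, to essentially zero, while error on fresh data, the goal, typically gets worse.

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy_quadratic(x):
    return x ** 2 + 0.3 * rng.normal(size=x.shape)

x_train = np.linspace(-1, 1, 15)
y_train = noisy_quadratic(x_train)
x_test = np.linspace(-1, 1, 200)            # fresh data standing in for the true goal
y_test = noisy_quadratic(x_test)

for degree in (2, 14):
    coeffs = np.polyfit(x_train, y_train, degree)   # optimize the proxy: training error
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:>2}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
# The degree-14 fit typically drives the proxy near zero while doing worse on the goal.
```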

Relationship with Other Goodhart Phenomena

Extremal Goodhart differs from Regressional Goodhart in that Extremal Goodhart goes away in simple examples like correlated dimensions in a multivariate normal distribution, but Regressional Goodhart does not.

Mitigation

Quantilization and Regularization are both useful for mitigating Extremal Goodhart effects. In general, Extremal Goodhart can be mitigated by choosing an option with a high proxy value, but not so high as to take you to a domain drastically different from the one in which the proxy was learned.
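
Here is a toy quantilizer sketch (the broken-extreme setup is invented for illustration): rather than taking the argmax of the proxy, sample uniformly from the top few percent of options. This usually keeps most of the proxy value while rarely landing in the extreme region where the proxy and the goal come apart.

```python
import numpy as np

rng = np.random.default_rng(5)
n_actions = 100_000

# Hypothetical actions: the proxy U tracks the true value V for ordinary actions,
# but the relationship breaks down for the most extreme proxy values.
U = rng.normal(size=n_actions)
V = np.where(U < 3.0, U + 0.3 * rng.normal(size=n_actions), -10.0)

def argmax_pick(proxy):
    """Take the action with the highest proxy value."""
    return int(proxy.argmax())

def quantilizer_pick(proxy, q=0.05):
    """Pick uniformly at random among the top q fraction of actions by proxy value."""
    cutoff = np.quantile(proxy, 1.0 - q)
    return int(rng.choice(np.flatnonzero(proxy >= cutoff)))

for name, idx in (("argmax", argmax_pick(U)), ("quantilizer", quantilizer_pick(U))):
    print(f"{name:>11}: proxy U = {U[idx]:5.2f}, true V = {V[idx]:6.2f}")
# The argmax lands in the broken extreme region; the quantilizer usually does not.
```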


Adversarial Goodhart

When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.

Abstract Model

Consider an agent A with some different goal W. Since they depend on common resources, W and V are naturally opposed. If you optimize U as a proxy for V, and A knows this, A is incentivized to make large U values coincide with large W values, thus stopping them from coinciding with large V values.

Examples

When you use a metric to choose between people, but then those people learn what metric you use and game that metric, this is an example of Adversarial Goodhart.
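
A toy version of metric gaming (all numbers invented, echoing the lying-about-height example from the quick reference): a selector picks the candidate with the best reported score, and once a few candidates inflate their reports, the pick is effectively a random liar, so its true quality is typically no better than chance.

```python
import numpy as np

rng = np.random.default_rng(6)
n_candidates = 1_000

skill = rng.normal(size=n_candidates)                        # V: what the selector cares about
height = 0.7 * skill + 0.7 * rng.normal(size=n_candidates)   # correlated trait used as proxy U

# Honest reports: selecting on the proxy also selects for skill.
honest_pick = height.argmax()

# A few candidates game the metric (their goal W is simply to get picked).
reported = height.copy()
liars = rng.random(n_candidates) < 0.05
reported[liars] = 10.0                                       # wildly inflated reports
gamed_pick = reported.argmax()                               # effectively a random liar

print("skill of the pick with honest reports:", skill[honest_pick])
print("skill of the pick with gamed reports: ", skill[gamed_pick])
```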

Adversarial Goodhart is the mechanism behind a superintelligent AI making a Treacherous Turn. Here, V is doing what the humans want forever. U is doing what the humans want in the training cases where the AI does not have enough power to take over, and W is whatever the AI wants to do with the universe.

Adversarial Goodhart is also behind the malignancy of the universal prior, where you want to predict well forever (V), so hypotheses might predict well for a while (U), so that they can manipulate the world with their future predictions (W).

Relationship with Other Goodhart Phenomena

Adversarial Goodhart is the primary mechanism behind the original Goodhart’s Law.

Extremal Goodhart can happen even without any adversaries in the environment. However, Adversarial Goodhart may take advantage of Extremal Goodhart, as an adversary can more easily manipulate a small number of worlds with extreme proxy values than it can manipulate all of the worlds.

Mitigation

Successfully avoiding Adversarial Goodhart problems is very difficult in theory, and we understand very little about how to do this. In the case of non-superintelligent adversaries, you may be able to avoid Adversarial Goodhart by keeping your proxies secret (for example, not telling your employees what metrics you are using to evaluate them). However, this is unlikely to scale to dealing with superintelligent adversaries.

One technique that might help in mitigating Adversarial Goodhart is to choose a proxy that is so simple, and to optimize it so hard, that adversaries have no or minimal control over the world which maximizes that proxy. (I want to emphasize that this is not a good plan for avoiding Adversarial Goodhart; it is just all I have.)

For example, say you have a complicated goal that includes wanting to go to Mars. If you use a complicated search process to find a plan that is likely to get you to Mars, adversaries in your search process may suggest a plan that involves building a superintelligence that gets you to Mars, but also kills you.

On the other hand, if you use the proxy of getting to Mars as fast as possible and optimize very hard, then (maybe) adversaries can’t add baggage to a proposed plan without being out-selected by a plan without that baggage. Building a superintelligence maybe takes more time than just having the plan tell you how to build a rocket quickly. (Note that the plan will likely include things like acceleration that humans can’t handle and nanobots that don’t turn off, so Extremal Goodhart will still kill you.)