Another view of quantilizers: avoiding Goodhart’s Law

Goodhart’s law states:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

One way of framing this is that, when you are solving some optimization problem, a metric that is correlated with a desired objective will often stop being correlated with the objective when you look at the extreme values of the metric. For example, although the number of paperclips a paperclip factory produces tends to be correlated with how useful the factory is for its owner’s values, a paperclip factory that produces an extremely high number of paperclips is likely to be quite bad for its owner’s values.

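Let’s try to formalize this. Suppose you are finding some $x \in \mathcal{X}$ that optimizes some unknown objective function $U : \mathcal{X} \to \mathbb{R}$, and you have some estimate $V : \mathcal{X} \to \mathbb{R}$ which you believe to approximate $U$. Specifically, you have a guarantee that, for some base distribution $\gamma \in \Delta\mathcal{X}$, $V$ does not incorrectly estimate $U$ much on average:

$$\mathbb{E}_{X \sim \gamma}\big[\,|U(X) - V(X)|\,\big] \le \epsilon$$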

We might suppose that we only want to take actions if our expected $U$ is above zero; otherwise, it would be better to do nothing.

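Given this, how do you pick an $x$ that guarantees a good objective value for every objective function $U$ consistent with this guarantee? Naively, you might pick $x^* := \arg\max_{x \in \mathcal{X}} V(x)$; however, if $x^*$ has low probability under $\gamma$, then $V(x^*)$ can be much higher than $U(x^*)$ without causing $V$ to overestimate $U$ much on average, because an error at a rarely sampled action contributes little to $\mathbb{E}_{X \sim \gamma}[|U(X) - V(X)|]$.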
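The following minimal Python sketch makes this concrete on a hypothetical finite action space. The action names, the numbers in `gamma`, `V`, and `eps`, and both helper functions are illustrative assumptions, not part of the original argument. The adversary agrees with $V$ everywhere except at $x^*$, where it spends the whole error budget by lowering $U(x^*)$ by $\epsilon / \gamma(x^*)$.

```python
# Hypothetical toy example: finite action space, base distribution gamma,
# proxy utility V, and error budget eps (all values are illustrative).
gamma = {"a": 0.5, "b": 0.3, "c": 0.19, "d": 0.01}
V = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0}
eps = 0.5


def avg_abs_error(gamma, U, V):
    """E_{X ~ gamma}[|U(X) - V(X)|] on a finite action space."""
    return sum(gamma[x] * abs(U[x] - V[x]) for x in gamma)


def adversarial_U(gamma, V, target, eps):
    """Hypothetical adversary: agree with V everywhere except at `target`,
    where U is lowered by eps / gamma[target], using up the whole budget."""
    U = dict(V)
    U[target] = V[target] - eps / gamma[target]
    return U


x_star = max(V, key=V.get)           # naive choice: argmax of the proxy V
U = adversarial_U(gamma, V, x_star, eps)

print(x_star, V[x_star], U[x_star])  # d 10.0 -40.0
print(avg_abs_error(gamma, U, V))    # 0.5, so the guarantee still holds
```

Because $\gamma(x^*) = 0.01$ here, an error of 50 at $x^*$ costs only 0.5 of average error, so the naive choice has true value $-40$ despite a proxy value of $10$.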

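If $U$ is chosen adversarially, the optimization problem to solve is:

$$\max_{p \in [0,1],\ \pi \in \Delta\mathcal{X}} \ \min_{\substack{U : \mathcal{X} \to \mathbb{R} \\ \mathbb{E}_{X \sim \gamma}[|U(X) - V(X)|] \le \epsilon}} p\, \mathbb{E}_{X \sim \pi}[U(X)]$$

where $p$ is the probability that the agent takes an action at all, and $\pi$ is the action distribution if it takes an action. Equivalently, since the most adversarial $U$ values will never be above $V$ (setting $U(x) > V(x)$ would spend error budget without lowering the agent's payoff):

$$\max_{p \in [0,1],\ \pi \in \Delta\mathcal{X}} \ \min_{\substack{U : \mathcal{X} \to \mathbb{R},\ U \le V \\ \mathbb{E}_{X \sim \gamma}[V(X) - U(X)] \le \epsilon}} p\, \mathbb{E}_{X \sim \pi}[U(X)]$$

Define $c(x) := V(x) - U(x)$:

$$\max_{p \in [0,1],\ \pi \in \Delta\mathcal{X}} \ p \left( \mathbb{E}_{X \sim \pi}[V(X)] - \max_{\substack{c : \mathcal{X} \to \mathbb{R}_{\ge 0} \\ \mathbb{E}_{X \sim \gamma}[c(X)] \le \epsilon}} \mathbb{E}_{X \sim \pi}[c(X)] \right)$$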
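On a finite action space where $\gamma(x) > 0$ for every action (an assumption made here to keep things simple), the inner maximization has a closed form. The set $\{c \ge 0 : \mathbb{E}_{X \sim \gamma}[c(X)] \le \epsilon\}$ is a polytope whose vertices are $c = 0$ and, for each action $x$, the function that puts $\epsilon / \gamma(x)$ on $x$ and zero elsewhere. A linear objective is maximized at a vertex, so the inner term equals $\epsilon \max_x \pi(x) / \gamma(x)$. The sketch below evaluates the resulting worst-case value of a policy $(p, \pi)$; the function name and the toy numbers are hypothetical.

```python
def worst_case_value(gamma, V, pi, eps, p=1.0):
    """Worst-case p * E_pi[U] over all U with E_gamma[|U - V|] <= eps.

    Assumes a finite action space with gamma[x] > 0 for every action, so the
    adversary's best response puts its whole budget on the action that
    maximizes pi(x) / gamma(x).
    """
    expected_V = sum(prob * V[x] for x, prob in pi.items())
    max_ratio = max(prob / gamma[x] for x, prob in pi.items() if prob > 0)
    return p * (expected_V - eps * max_ratio)


# Hypothetical toy example (illustrative values only).
gamma = {"a": 0.5, "b": 0.3, "c": 0.19, "d": 0.01}
V = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0}
eps = 0.5

argmax_policy = {"d": 1.0}  # always take the argmax of V
base_policy = dict(gamma)   # just sample from the base distribution

print(round(worst_case_value(gamma, V, argmax_policy, eps), 2))  # -40.0
print(round(worst_case_value(gamma, V, base_policy, eps), 2))    # 1.27
```

Under these assumptions, always taking the argmax of $V$ gives the adversary a leverage of $1/\gamma(x^*) = 100$, while simply sampling from $\gamma$ caps the adversary's term at $\epsilon$.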

In fact, when $p = 1$, the solution to this optimization problem is a $q$-quantilizer with utility function $V$ and base distribution $\gamma$, for some $q \in (0, 1]$. The proof can be found in the “Optimality of quantilizers under the cost constraint” section of the post about quantilizers. Since the objective is linear in $p$, $p$ will be set to 1 if and only if this quantilizer is guaranteed positive expected utility, and to 0 (doing nothing) otherwise.
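A minimal sketch of the resulting decision procedure, again on a hypothetical finite action space with $\gamma(x) > 0$ everywhere. A $q$-quantilizer samples from $\gamma$ restricted to the top $q$ fraction of $\gamma$-mass ranked by $V$ (splitting the boundary action if needed), so $\pi(x)/\gamma(x) \le 1/q$. Between consecutive cumulative $\gamma$-masses (taken in decreasing order of $V$) the worst-case value has the form $a + b/q$, which is monotone in $q$, so the sketch only tries $q$ at those cumulative masses. It searches over quantilizers only, relying on the optimality result cited above; the function names `quantilize` and `choose_policy` and the toy numbers are illustrative assumptions.

```python
# Hypothetical toy example (illustrative values only).
gamma = {"a": 0.5, "b": 0.3, "c": 0.19, "d": 0.01}
V = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0}
eps = 0.5


def worst_case_value(gamma, V, pi, eps, p=1.0):
    """Worst-case p * E_pi[U] over U with E_gamma[|U - V|] <= eps (assumes gamma > 0)."""
    expected_V = sum(prob * V[x] for x, prob in pi.items())
    max_ratio = max(prob / gamma[x] for x, prob in pi.items() if prob > 0)
    return p * (expected_V - eps * max_ratio)


def quantilize(gamma, V, q):
    """q-quantilizer: gamma restricted to its top-q mass ranked by V, renormalized.

    The action straddling the q boundary contributes only the part of its mass
    that fits, so pi(x) / gamma(x) <= 1 / q for every action.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    pi, remaining = {}, q
    for x in sorted(gamma, key=V.get, reverse=True):
        take = min(gamma[x], remaining)
        if take > 0:
            pi[x] = take / q
            remaining -= take
    return pi


def choose_policy(gamma, V, eps):
    """Pick the quantilizer with the best worst-case value, then set p.

    Candidate q values are the cumulative gamma-masses in decreasing order of V.
    """
    candidates, total = [], 0.0
    for x in sorted(gamma, key=V.get, reverse=True):
        total += gamma[x]
        candidates.append(min(total, 1.0))
    q = max(
        candidates,
        key=lambda cand: worst_case_value(gamma, V, quantilize(gamma, V, cand), eps),
    )
    pi = quantilize(gamma, V, q)
    value = worst_case_value(gamma, V, pi, eps)
    p = 1.0 if value > 0 else 0.0  # act only if the guaranteed value is positive
    return p, q, pi, value


p, q, pi, value = choose_policy(gamma, V, eps)
print(p, round(q, 3), round(value, 2))  # 1.0 0.5 1.54
```

In this toy example the best quantilizer keeps the top half of the base distribution ($q = 0.5$), which in the worst case beats both the naive argmax and plain sampling from $\gamma$. Raising `eps` far enough (for instance to `10.0`) makes every candidate's worst-case value negative, so the sketch returns $p = 0$.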

This provides another view of what quantilizers are doing. In effect, they treat the “utility function” $V$ as an estimate of the true utility function $U$ that tends to be accurate on average across the base distribution $\gamma$, and they conservatively optimize $U$ under adversarial uncertainty about what the true utility function $U$ is.