Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 1)

This is the first post in a small sequence I’m writing on “Optimizing and Goodhart Effects—Clarifying Thoughts.” (I have re-organized to make part 2, “Revisiting What Optimization Means,” separate.)

Related to: How does Gradient Descent Interact with Goodhart?, Constructing Goodhart, Selection vs Control

Next Posts: Revisiting What Optimization Means with Selection vs. Control, then Applying Overoptimization to Selection vs. Control

Introduction

Goodhart’s law comes in a few flavors, as originally pointed out by Scott, and formalized a bit more in our joint paper. While discussing that paper, and afterwards, we struggled with something Abram Demski recently clarified: the difference between selection and control. This matters for formalizing what happens, especially when asking how Goodhart occurs in specific types of optimizers, as Scott asked recently.

Epistemic Status: This is for de-confusing myself, and has been helpful for that. I’m fairly confident I understand the content written so far, but I’m unclear how useful it is for others, or how clearly it comes across. I think there’s more to say after this post, and this will have a few more parts if people are interested. (I spent a month getting to this point, and decided to post and get feedback rather than finish a book first.)

In the first half of the post, I’ll review Abram’s selection/control distinction and suggest how it relates to actual design. I’ll also argue that there is a bit of a continuum between the two cases, and that we should add an additional extreme case to the typology: direct solution. The second section will revisit what optimization means, and try to note a few different things that could happen and go wrong with Goodhart-like overoptimization.

The third section will talk about Goodhart in this context using the new understanding, trying to more fully explain why Goodhart effects in selection and control fundamentally differ. After this, Part 4 will revisit mesa-optimizers.

Thoughts on how selection and control are used in tandem

In this section, I’ll discuss the two types of optimizers Abram described, selection and control, and introduce a third, simpler optimizer: direct solution. I’m also going to mention where embedded agents are different, because that’s closely related to selection versus control, and talk about where mesa-optimizers exist.

Starting with the (heavily overused) example of rockets, I want to revisit Abram’s categorization of algorithmic optimization versus control. There are several stages involved in getting rockets to go where we want. The first is to design the rocket, which involves optimization (I’ll discuss it in two stages); the second is to test, which involves optimization and control in tandem; and the third is to actually guide the rocket we built in flight, which is purely control.

Initially, designing a rocket is pure optimization. We might start by building simplified mathematical models to figure out the basic design constraints—if a rocket is bringing people to the moon, we may decide the goal is a rocket and a lander, rather than a single composite. We may decide that certain classes of trajectories / flight paths are going to be used. This is all a set of mathematical exercises, and probably involves only multiply differentiable models that can be directly solved to find an optimum. This is in many ways a third category of “optimizing” in Abram’s model, because there is not even a need to look over the search space. I’ll call this direct solution, since we just pick the optimum based on the setup.
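To make this concrete, here is a minimal sketch of direct solution, assuming an invented cost model (the trade-off and the numbers are mine, purely for illustration): the model is smooth and simple enough that we set the derivative to zero and solve, with no search over candidate designs at all.

```python
import sympy as sp

burn_time = sp.symbols("burn_time", positive=True)

# Invented trade-off: longer burns waste fuel fighting gravity (linear
# term), shorter burns require heavier engines (inverse term).
cost = 3 * burn_time + 48 / burn_time

# Direct solution: solve d(cost)/d(burn_time) = 0 for the optimum,
# rather than evaluating any candidate points.
print(sp.solve(sp.diff(cost, burn_time), burn_time))  # [4]
```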

After getting a bit closer to actual design, we need to simulate rocket designs and paths, and optimize the simulated solution. This lets you do clever things like build a rocket with a sufficient but not excessive amount of fuel (hopefully with a margin of error). If we’re smart, we optimize with several intended uses and variable factors in mind, to make sure our design is sufficiently robust. (If we’re not careful enough to include all relevant factors, we ignore some factor that turns out to matter, like the relationship between the temperature of the O-rings and their brittleness, and our design fails in those conditions.) This is all optimizing over a search space. The cost of the search is still comparatively low, though not as low as direct solution, and we may use gradient descent, genetic algorithms, simulated annealing, or other strategies. The commonality between these solutions is that they simulate points in the search space, perhaps along with the gradients at those points.
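As a toy illustration of this kind of selection, here is a sketch using random search, the simplest of these strategies; simulate_margin is a made-up stand-in for an expensive rocket simulation, not a real model:

```python
import random

def simulate_margin(fuel_kg, dry_mass_kg):
    # Stand-in for an expensive simulation: diminishing returns on
    # fuel, penalized by total launch mass. Entirely invented.
    return fuel_kg ** 0.5 - 0.01 * (fuel_kg + dry_mass_kg)

# Selection: sample candidate designs, simulate each, keep the best.
best_design, best_score = None, float("-inf")
for _ in range(1000):
    candidate = (random.uniform(100, 5000), random.uniform(50, 500))
    score = simulate_margin(*candidate)
    if score > best_score:
        best_design, best_score = candidate, score

print(best_design, best_score)
```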

After we settle on a design, we build an actual rocket, and then we test it. This moves back and forth between the very high cost approach of building physical objects and testing them—often to destruction—and simulation. After each test, we probably re-run the simulation to make sure any modifications are still near the optimum we found, or we refine the simulations to re-optimize and pick the next design to build.

Lastly, we build a final design, and launch the rocket. The control system is certainly a mesa-optimizer with regards to the rocket design process. For a rocket, this control is closer to direct optimization than simulation, because the cost of evaluation needs to be low enough for real-time control. The mesa-optimizer would, in this case, use simplified physics to fire the main and guidance rockets to stay near the pre-chosen path. It’s probably not allowed to pick a new path—it can’t decide that the better solution is to orbit twice instead of once before landing. (Humans may decide this, then hand the mesa-optimizer new parameters.) We tightly constrain the mesa-optimizer, since in a certain sense it’s dumber than the design optimizer that chose what to optimize for.
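A minimal sketch of such a tightly constrained controller, assuming a simple proportional rule (the function names, the gain, and the clamping are all hypothetical, chosen only to show a controller that corrects deviations from a fixed path rather than choosing a new one):

```python
def gimbal_correction(t, measured_altitude, planned_altitude, gain=0.4):
    """Steer back toward the pre-chosen path using a cheap rule that is
    fast enough for real-time control."""
    error = planned_altitude(t) - measured_altitude
    # Clamp the output so the controller stays inside its narrow mandate.
    return max(-1.0, min(1.0, gain * error))

# Example: 20 units below the planned path -> maximum corrective input.
print(gimbal_correction(10.0, 980.0, lambda t: 1000.0))  # 1.0
```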

For a more complex system, we may need a complex mesa-optimizer to guide the already designed system. Even for a more complex rocket, we may allow the mesa-optimizer to modify the model used for optimizing, at least in minor ways—it may dynamically evaluate factors like the rocket efficiency, and decide that it’s getting 98% of the expected thrust, so it will plan to use that modified parameter in the system model used to mesa-optimize. Giving a mesa-optimizer more control is dangerous, but perhaps necessary to allow it to navigate a complex system.
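A sketch of that kind of limited model modification, assuming an invented AdaptiveThrustModel class (the smoothing rule and all names are mine, for illustration): the controller re-estimates engine efficiency from observations and plans with the corrected parameter, without being allowed to change anything else.

```python
class AdaptiveThrustModel:
    def __init__(self, nominal_thrust):
        self.nominal_thrust = nominal_thrust
        self.efficiency = 1.0  # start by trusting the design-time model

    def observe(self, measured_thrust, smoothing=0.9):
        # Exponentially smoothed estimate of observed / expected thrust.
        ratio = measured_thrust / self.nominal_thrust
        self.efficiency = smoothing * self.efficiency + (1 - smoothing) * ratio

    def expected_thrust(self):
        return self.efficiency * self.nominal_thrust

engine = AdaptiveThrustModel(nominal_thrust=100.0)
engine.observe(measured_thrust=98.0)   # e.g., seeing ~98% of expected thrust
print(engine.expected_thrust())        # estimate drifts from 100.0 toward 98.0
```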

Now that we’ve de-confused why optimization is split between selection and control, I can introduce part 2: What does optimization mean?