Bottle Caps Aren’t Optimisers


Crossposted from my blog.

One thing I worry about sometimes is people writing code with optimisers in it, without realising that that's what they were doing. An example of this: suppose you were doing deep reinforcement learning, doing optimisation to select a controller (that is, a neural network that takes a percept and returns an action) that generated high reward in some environment. Alas, unknown to you, this controller actually did optimisation itself to select actions that score well according to some metric that so far has been closely related to your reward function. In such a scenario, I'd be wary about your deploying that controller, since the controller itself is doing optimisation which might steer the world into a weird and unwelcome place.

In order to avoid such scenarios, it would be nice if one could look at an algorithm and determine if it was doing optimisation. Ideally, this would involve an objective definition of optimisation that could be checked from the source code of the algorithm, rather than something like "an optimiser is a system whose behaviour can't usefully be predicted mechanically, but can be predicted by assuming it near-optimises some objective function", since such a definition breaks down when you have the algorithm's source code and can compute its behaviour mechanically.

You might think about optimisation as follows: a system is optimising some objective function to the extent that that objective function attains much higher values than would be attained if the system didn't exist, or were doing some other random thing. This type of definition includes those put forward by Yudkowsky and Oesterheld. However, I think there are crucial counterexamples to this style of definition.
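To make this style of definition concrete, here is a toy sketch in Python. Everything in it is invented for illustration: a simple one-dimensional "environment", a thermostat-like policy standing in for the system, and a random policy standing in for "doing some other random thing". The definition then scores the system by how much higher the objective ends up with it than with the random baseline.

```python
import random

def objective(final_state):
    # Toy objective: how close the final state is to zero.
    return -abs(final_state)

def run(policy, state=10, steps=20):
    # Repeatedly apply the policy's action to the state.
    for _ in range(steps):
        state += policy(state)
    return state

def thermostat(state):
    # A system that nudges the state one unit towards zero.
    return -1 if state > 0 else (1 if state < 0 else 0)

def random_policy(state):
    # The "some other random thing" baseline.
    return random.choice([-1, 0, 1])

random.seed(0)
with_system = objective(run(thermostat))
baseline = sum(objective(run(random_policy)) for _ in range(100)) / 100

# Under the counterfactual definition, the thermostat counts as
# optimising the objective to the extent with_system exceeds baseline.
print(with_system, baseline)
```

Note that this test only looks at outcomes, not at the system's internals, which is exactly why the counterexamples below can pass it.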

Firstly, consider a lid screwed onto a bottle of water. If not for this lid, or if the lid had a hole in it or were looser, the water would likely exit the bottle via evaporation or being knocked over, but with the lid, the water stays in the bottle much more reliably than otherwise. As a result, you might think that the lid is optimising the water remaining inside the bottle. However, I claim that this is not the case: the lid is just a rigid object designed by some optimiser that wanted water to remain inside the bottle.

This isn’t an in­cred­ibly com­pel­ling coun­terex­am­ple, since it doesn’t qual­ify as an op­ti­miser ac­cord­ing to Yud­kowsky’s defi­ni­tion: it can be more sim­ply de­scribed as a rigid ob­ject of a cer­tain shape than an op­ti­miser, so it isn’t an op­ti­miser. I am some­what un­com­fortable with this move (surely sys­tems that are sub-op­ti­mal in com­pli­cated ways that are eas­ily pre­dictable by their source code should still count as op­ti­misers?), but it’s worth com­ing up with an­other coun­terex­am­ple to which this ob­jec­tion won’t ap­ply.

Secondly, consider my liver. It's a complex physical system that's hard to describe, but if it were absent or behaved very differently, my body wouldn't work, I wouldn't remain alive, and I wouldn't be able to make any money, meaning that my bank account balance would be significantly lower than it is. In fact, subject to the constraint that the rest of my body works in the way that it actually works, it's hard to imagine what my liver could do which would result in a much higher bank balance. Nevertheless, it seems wrong to say that my liver is optimising my bank balance, and more right to say that it "detoxifies various metabolites, synthesizes proteins, and produces biochemicals necessary for digestion"—even though that gives a less precise account of the liver's behaviour.

In fact, my liver's behaviour has something to do with optimising my income: it was created by evolution, which was sort of an optimisation process for agents that reproduce a lot, which has a lot to do with me having a lot of money in my bank account. It also sort of optimises some aspects of my digestion, which is a necessary sub-process of me getting a lot of money in my bank account. This explains the link between my liver function and my income without having to treat my liver as a bank account funds maximiser.

What’s a bet­ter the­ory of op­ti­mi­sa­tion that doesn’t fall prey to these coun­terex­am­ples? I don’t know. That be­ing said, I think that they should in­volve the in­ter­nal de­tails of the al­gorithms im­ple­mented by those phys­i­cal sys­tems. For in­stance, I think of gra­di­ent as­cent as an op­ti­mi­sa­tion al­gorithm be­cause I can tell that at each iter­a­tion, it im­proves on its ob­jec­tive func­tion a bit. Ideally, with such a defi­ni­tion you could de­cide whether an al­gorithm was do­ing op­ti­mi­sa­tion with­out hav­ing to run it and see its be­havi­our, since one of the whole points of a defi­ni­tion of op­ti­mi­sa­tion is to help you avoid run­ning sys­tems that do it.

Thanks to Abram Demski, who came up with the bottle-cap example in a conversation about this idea.