The Blue-Minimizing Robot

Imagine a robot with a turret-mounted camera and laser. Each moment, it is programmed to move forward a certain distance and perform a sweep with its camera. As it sweeps, the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area in the camera image, and then continues on its way.
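The program just described can be sketched as a simple loop. This is a minimal sketch, not anything from the original; the threshold value and the shape of the image data are made up for illustration:

```python
BLUE_THRESHOLD = 180  # hypothetical cutoff for the blue channel (0-255)

def average_rgb(image):
    """Average (R, G, B) over all pixels; image is a list of RGB tuples."""
    n = len(image)
    r = sum(px[0] for px in image) / n
    g = sum(px[1] for px in image) / n
    b = sum(px[2] for px in image) / n
    return (r, g, b)

def sweep_step(image, fire_laser):
    """One sweep of the camera: if the average blue component passes
    the threshold, stop and fire at the blue area; otherwise move on."""
    _, _, blue = average_rgb(image)
    if blue > BLUE_THRESHOLD:
        fire_laser()
        return True   # stopped and fired
    return False      # continued without firing
```

Note that nothing in this loop represents "blue objects in the world"; the only thing tested is a statistic of the camera pixels.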

Watching the robot’s behavior, we would conclude that this is a robot that destroys blue objects. Maybe it is a surgical robot that destroys cancer cells marked by a blue dye; maybe it was built by the Department of Homeland Security to fight a group of terrorists who wear blue uniforms. Whatever. The point is that we would analyze this robot in terms of its goals, and in those terms we would be tempted to call this robot a blue-minimizer: a machine that exists solely to reduce the number of blue objects in the world.

Suppose the robot had human-level intelligence in some side module, but no access to its own source code; that it could learn about itself only through observing its own actions. The robot might come to the same conclusions we did: that it is a blue-minimizer, set upon a holy quest to rid the world of the scourge of blue objects.

But now stick the robot in a room with a hologram projector. The hologram projector (which is itself gray) projects a hologram of a blue object five meters in front of it. The robot’s camera detects the projector, but its RGB value is harmless and the robot does not fire. Then the robot’s camera detects the blue hologram and zaps it. We arrange for the robot to enter this room several times, and each time it ignores the projector and zaps the hologram, without effect.

Here the robot is failing at its goal of being a blue-minimizer. The right way to reduce the amount of blue in the universe is to destroy the projector; instead its beams flit harmlessly through the hologram.

Again, give the robot human-level intelligence. Teach it exactly what a hologram projector is and how it works. Now what happens? Exactly the same thing—the robot executes its code, which says to scan the room until its camera registers blue, then shoot its laser.

In fact, there are many ways to subvert this robot. What if we put a lens over its camera which inverts the image, so that white appears as black, red as green, blue as yellow, and so on? The robot will not shoot us with its laser to prevent such a violation (unless we happen to be wearing blue clothes when we approach); its entire program was detailed in the first paragraph, and there’s nothing about resisting lens alterations. Nor will the robot correct itself and shoot only at objects that appear yellow—its entire program was detailed in the first paragraph, and there’s nothing about correcting its program for new lenses. The robot will continue to zap objects that register a blue RGB value; but now it’ll be shooting at anything that is yellow.
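The inversion described above is just taking each 8-bit channel to its complement. A toy illustration (my own, assuming standard 0–255 RGB) shows why the unchanged program now fires at yellow: through the lens, blue objects stop registering as blue, and yellow objects start to.

```python
def invert(pixel):
    """An inverting lens: each 8-bit channel maps to its complement."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)

blue = (0, 0, 255)
yellow = (255, 255, 0)

# Through the lens, a blue object loses its blue component...
print(invert(blue))    # (255, 255, 0): reads as yellow to the camera
# ...while a yellow object acquires one, so the blue-threshold
# test in the robot's unchanged program now triggers on it.
print(invert(yellow))  # (0, 0, 255): reads as blue to the camera
```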

The human-level intelligence version of the robot will notice its vision has been inverted. It will know it is shooting yellow objects. It will know it is failing at its original goal of blue-minimization. And maybe if it had previously decided it was on a holy quest to rid the world of blue, it will be deeply horrified and ashamed of its actions. It will wonder why it has suddenly started to deviate from this quest, and why it just can’t work up the will to destroy blue objects anymore.

The robot goes to Quirinus Quirrell, who explains that robots don’t really care about minimizing the color blue. They only care about status and power, and pretend to care about minimizing blue in order to impress potential allies.

The robot goes to Robin Hanson, who explains that there are really multiple agents within the robot. One of them wants to minimize the color blue, the other wants to minimize the color yellow. Maybe the two of them can make peace, and agree to minimize yellow one day and blue the next?

The robot goes to Anna Salamon, who explains that robots are not automatically strategic, and that if it wants to achieve its goal it will have to learn special techniques to keep focus on it.

I think all of these explanations hold part of the puzzle, but that the most fundamental explanation is that the mistake began as soon as we started calling it a “blue-minimizing robot”. This is not because its utility function doesn’t exactly correspond to blue-minimization: even if we try to assign it a ponderous function like “minimize the color represented as blue within your current visual system, except in the case of holograms” it will be a case of overfitting a curve. The robot is not maximizing or minimizing anything. It does exactly what it says in its program: find something that appears blue and shoot it with a laser. If its human handlers (or itself) want to interpret that as goal-directed behavior, well, that’s their problem.

It may be that the robot was created to achieve a specific goal. It may be that the Department of Homeland Security programmed it to attack blue-uniformed terrorists who had no access to hologram projectors or inversion lenses. But to assign the goal of “blue minimization” to the robot is a confusion of levels: this was a goal of the Department of Homeland Security, which became a lost purpose as soon as it was represented in the form of code.

The robot is a behavior-executor, not a utility-maximizer.

In the rest of this sequence, I want to expand upon this idea. I’ll start by discussing some of the foundations of behaviorism, one of the earliest theories to treat people as behavior-executors. I’ll go into some of the implications for the “easy problem” of consciousness and philosophy of mind. I’ll very briefly discuss the philosophical debate around eliminativism and a few eliminativist schools. Then I’ll go into why we feel like we have goals and preferences and what to do about them.