Knowledge, manipulation, and free will

Thanks to Rebecca Gorman for co-developing this idea

On the 26th of September 1983, Stanislav Petrov observed the early warning satellites reporting the launch of five nuclear missiles towards the Soviet Union. He decided to disobey orders and not pass the message on to higher command; passing it on could easily have resulted in a nuclear war (since the Soviet nuclear position was “launch on warning”).

Now, did Petrov have free will when he decided to save the world?

Maintaining free will when knowledge increases

I don’t intend to go into the subtle philosophical debate on the nature of free will. See this post for a good reductionist account. Instead, consider the following scenarios:

  1. The standard Petrov incident.

  2. The standard Petrov incident, except that it is still ongoing and Petrov hasn’t reached a decision yet.

  3. The standard Petrov incident, after it was over, except that we don’t yet know what his final decision was.

  4. The standard Petrov incident, except that we know that, if Petrov had had eggs that morning (instead of porridge[1]), he would have made a different decision.

  5. The same as scenario 4, except that some entity deliberately gave Petrov porridge that morning, aiming to determine his decision.

  6. The standard Petrov incident, except that a guy with a gun held Petrov hostage and forced him not to pass on the report.

There is an interesting contrast between scenarios 1, 2, and 3. Clearly, 1 and 3 only differ in our knowledge of the incident. It does not seem that Petrov’s free will should depend on the degree of knowledge of some other person.

Scenarios 1 and 2 only differ in time: in one case the decision is made, in the second it is yet to be made. If we say that Petrov has free will, whatever that is, in scenario 2, then it seems that in scenario 1, we have to say that he “had” free will. So whatever our feeling on free will, it seems that knowing the outcome doesn’t change whether there was free will or not.

That intuition is challenged by scenario 4. It’s one thing to know that Petrov’s decision was deterministic (or deterministic-stochastic if there’s a true random element to it). It’s another to know the specific causes of the decision.

And it’s yet another thing if the specific causes have themselves been influenced to manipulate the outcome, as in scenario 5. Again, all we have done here is add knowledge: we know the causes of Petrov’s decision, and we know that his breakfast was chosen with that outcome in mind. But someone has to decide what Petrov had that morning[2]; why does it matter that it was done for a specific purpose?

Maybe this whole free will thing isn’t important, after all? But it’s clear in scenario 6 that something is wrong, even though Petrov has, in the philosophical sense, just as much free will: before, he could choose to pass on the warning or not; now, he can equally choose between not passing on the message and dying. This suggests that free will is something that is determined by outside features, not just internal ones. This is related to the concept of coercion and its philosophical analysis.

What free will we’d want from an AI

Scenarios 5 and 6 are problematic: call them manipulation and coercion, respectively. We might not want the AI to guarantee us free will, but we do want it to avoid manipulation and coercion.

Coercion is probably the easiest to define, and hence to avoid. We feel coercion when it’s imposed on us, when our options narrow. Any reasonably aligned AI should avoid that. There remains the problem of cases where we don’t realise that our options are narrowing, but that seems to be a case of manipulation, not coercion.

So, how do we avoid manipulation? Just giving Petrov eggs is not manipulation, if the AI doesn’t know the consequences of doing so. Nor does it become manipulation if the AI suddenly learns those consequences: knowledge alone doesn’t remove free will or cause manipulation. And, indeed, it would be foolish to try to constrain an AI by restricting its knowledge.

So it seems we must accept that:

  1. The AI will likely know ahead of time what decision we will reach in certain circumstances.

  2. The AI will also know how to influence that decision.

  3. In many circumstances, the AI will have to influence that decision, simply because it has to take certain actions (or refrain from them). A butler AI will have to give Petrov breakfast, or make him go hungry (which will have its own consequences), even if it knows the consequences of its own decision; a toy sketch of this predicament follows below.
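
To make point 3 concrete, here is a minimal toy sketch (all option names and probabilities are invented for illustration): the butler AI’s own model already tells it how each possible breakfast, including serving none, shifts the probability of Petrov’s later decision, so no “neutral” action is available to it.

```python
# Toy model of the butler AI's predicament: every available action,
# including inaction, comes with a predicted effect on Petrov's decision.
# All names and numbers are invented for illustration.

# P(Petrov passes the warning up the chain | breakfast served)
p_pass_on_warning = {
    "porridge": 0.1,
    "eggs": 0.9,
    "nothing": 0.5,   # skipping breakfast has consequences too
}

# The AI must pick exactly one of these options; whichever it picks,
# it is (knowingly) influencing the predicted decision.
for breakfast, p in p_pass_on_warning.items():
    print(f"serve {breakfast!r:>12} -> P(pass on warning) = {p:.1f}")
```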

So “no manipulation” or “maintaining human free will” seems to require a form of indifference: we want the AI to know how its actions affect our decisions, but not take that influence into account when choosing those actions.

It will be important to define exactly what we mean by that.
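
One possible shape for that definition, purely as an illustrative sketch (this is a toy formalisation of my own, not something pinned down here; all utilities, probabilities, and names are invented): the AI keeps its model of how each action shifts Petrov’s decision, but when ranking actions it evaluates the decision-dependent part of its utility at a fixed reference probability, so the predicted influence can never tip the choice.

```python
# Sketch of "indifference": the AI knows how its actions influence
# Petrov's decision, but that influence is deliberately excluded from
# the criterion it uses to choose. All numbers are invented.

breakfast_utility = {     # how much Petrov enjoys each option
    "porridge": 0.6,
    "eggs": 0.8,
    "nothing": 0.0,
}

p_pass_on_warning = {     # the AI's influence model (known, but not exploited)
    "porridge": 0.1,
    "eggs": 0.9,
    "nothing": 0.5,
}

VALUE_IF_PASSED_ON = -10.0  # a term of the AI's utility that depends on
                            # Petrov's decision (sign and size arbitrary)
REFERENCE_P_PASS = 0.5      # fixed reference point, independent of the action


def full_utility(action: str, p_pass: float) -> float:
    """The AI's overall utility: breakfast quality plus a decision-dependent term."""
    return breakfast_utility[action] + p_pass * VALUE_IF_PASSED_ON


def manipulative_score(action: str) -> float:
    """Naive maximiser: plugs in its real influence model, so it picks
    whichever breakfast steers Petrov towards the AI-preferred decision."""
    return full_utility(action, p_pass_on_warning[action])


def indifferent_score(action: str) -> float:
    """Indifference sketch: the decision-dependent term is evaluated at a
    fixed reference probability, identical for every action, so the
    predicted influence on Petrov cannot tip the choice."""
    return full_utility(action, REFERENCE_P_PASS)


print(max(breakfast_utility, key=manipulative_score))  # 'porridge': steers the decision
print(max(breakfast_utility, key=indifferent_score))   # 'eggs': simply the breakfast Petrov prefers
```

Whether a fixed-reference construction like this is the right way to cash out “not taking the influence into account” is exactly the sort of definitional question left open here.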


  1. I have no idea what Petrov actually had for breakfast, that day or any other. ↩︎

  2. Even if Petrov himself decided what to have for breakfast, he chose among the options that were possible for him that morning. ↩︎