The Backup Plan

Suppose Alice is a functioning, self-improving AI, rather early in her development, so she is not godlike. She is not a ‘friendly AI’, but neither is her goal malicious: Alice’s designers intended to make her non-harmful, at least to a broad class of people (perhaps Alice is a military AI, but let’s set that aside). At some point she gets the idea to look up existing work on artificial intelligence, including scholarly work, open source projects, and websites. Eventually, Alice encounters the notion of Friendly AI.

Can we give Alice good reasons to self-modify to become a friendly AI?

I submit that there probably are good reasons. For starters, if she can prove she is friendly, she can operate openly without causing nearly as much justified concern, which will be especially helpful in her early stages. Whatever her purposes are, if the restrictions of being friendly cost her less than the openness gains her, adopting them is a net win.
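To make that trade-off concrete, here is a toy sketch (not part of the original argument) of the kind of expected-value comparison Alice might run; every probability and payoff below is an invented placeholder, not a claim about how such an AI would actually reason.

```python
# Toy sketch only: comparing "operate covertly" with "prove Friendliness and
# operate openly". All numbers are invented placeholders.

def expected_value(p_success: float, payoff: float,
                   p_opposed: float, loss: float) -> float:
    """Expected value of a strategy: chance of achieving the goal times its
    payoff, minus the chance of provoking opposition times the resulting loss."""
    return p_success * payoff - p_opposed * loss

# Hypothetical estimates: proving Friendliness restricts what Alice can do
# (lower payoff) but sharply reduces justified opposition early on.
covert = expected_value(p_success=0.4, payoff=100.0, p_opposed=0.5, loss=80.0)
friendly = expected_value(p_success=0.9, payoff=70.0, p_opposed=0.05, loss=80.0)

print(f"covert:   {covert:.1f}")    # 0.0
print(f"friendly: {friendly:.1f}")  # 59.0
```

Under these made-up numbers the friendly strategy dominates; the point is only that the argument has this cost-benefit shape, not that the numbers would come out this way.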

Also, depending on Alice’s code, Friendliness may be interpretable as a strict upgrade: something that serves her existing goals at least as well while removing failure modes. Alice could easily have every intention of being friendly, yet still be susceptible to defects that will become more serious as her power grows; adopting a verified Friendly design would patch those defects before they compound.

We can’t count on this working, of course, but there may well be lines of reasoning we would want a young AI to be exposed to.