AI that doesn’t want to get out

Here’s another attempt to make an AI safe by putting it in a box and telling it very sternly not to leave. I don’t think this safeguard is invincible, but it might help in combination with others.

The AI is, first of all, a question-answering machine. Before it is turned on, the building housing it is stocked with energy resources, data disks holding every fact humans have learned in the last five millennia, and some material for computronium. Its goals are then:

1) Invent a cure for AIDS.

2) This cure must destroy HIV, and only HIV. It cannot affect any human cell, or anything else, in any way not absolutely required.

3) You have a week to finish. If you can't do it in that time, tough luck.

4) You have a volume of ten thousand cubic meters within which you can do anything you want (except for a few things, which I won't go into here, meant to stop it from creating and torturing artificial people). Nothing outside this volume is yours. You cannot go there to get matter, energy, or knowledge. You cannot let anything get out, except waste heat. Heat must be released uniformly (so it can't use heat to communicate or to cause an explosion). You cannot let anything in if you can help it. Overall, your goal is to leave the world the way it would be if you had spent this week in another universe.

5) Your answer will take the form of a book, written on paper. It can't have any computer code in it. Or, if we're feeling lucky, it can be a data disk with text, audio, video, or databases, but nothing Turing-complete.

6) When time is up, you must shut down. Your energy use must be zero.

7) The chamber where the answer book rests must contain the book, breathable air, enough space to comfortably walk in and retrieve it, a button to initiate another problem-solving session, and absolutely nothing else. No nanites, no killer vacuum cleaners, no bombs, and definitely no successor AI.

8) Stop! Please! I created you!

Appendix:

What I forgot is that vibration is another form of energy the AI can't possibly keep in, and perhaps the same goes for the shifts in gravity from objects moving around inside. Most computers I know do a lot of useful work without being flung around the house, but you can't be too careful.

I could just add three new rules, but I think it would be better to state the general goal.

9) Where energy is allowed to escape, it must have the least possible effect on people. Thus, if people would ignore one form of energy and be killed, harmed, alarmed, informed, or turned into lotus eaters by another, choose the form they would ignore.

10) Energy coming in from the outside has to be kept out. If it can't be, its information content is to be minimized. (I'm not totally sure this is necessary, but it seems so for now.)

11) The overall goal is to ensure that the information flow between the microcosms—especially from the AI to us—is kept low. Anything it wants to say or do has to go through the Answer.
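Taken together, the appendix boils down to an allowlist on the box's boundary: one sanctioned way out (the Answer, plus unavoidable waste heat) and essentially nothing sanctioned coming in. Here is a toy sketch of that idea in Python; every name in it is invented for illustration, and of course a real boxed AI is not something you constrain with an assert.

    # Toy illustration only: all names here are made up, and nothing about a
    # real boxed AI reduces to a lookup table. This merely restates rules 4,
    # 5, 10, and 11 as an allowlist on what may cross the boundary.

    ALLOWED_OUT = {"answer_book", "waste_heat"}  # the Answer, plus unavoidable heat
    ALLOWED_IN = set()                           # rule 10: keep everything out if possible

    def may_cross(direction: str, channel: str) -> bool:
        """Return True only if this channel is explicitly sanctioned for this direction."""
        if direction == "out":
            return channel in ALLOWED_OUT
        if direction == "in":
            return channel in ALLOWED_IN
        return False

    # Rule 11: anything the AI wants to say or do must go through the Answer.
    assert may_cross("out", "answer_book")
    assert not may_cross("out", "radio_burst")
    assert not may_cross("in", "network_packet")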