AI Boxing (Containment)

Tag. Last edit: 12 Sep 2020 5:17 UTC by habryka

AI Boxing refers to attempts, experiments, or proposals to isolate (“box”) a powerful AI (~AGI) so that it can’t interact with the world at large, save for limited communication with its human liaison. It is often proposed that so long as the AI is physically isolated and restricted, or “boxed”, it will be harmless even if it is an unfriendly artificial intelligence (UAI).

The challenges are: 1) can you successfully prevent it from interacting with the world? And 2) can you prevent it from convincing you to let it out?

See also: AI, AGI, Oracle AI, Tool AI, Unfriendly AI

Escaping the box

It is regarded as unlikely that an AGI can be kept boxed in the long term. Since the AGI might be a superintelligence, it could persuade someone (most likely the human liaison) to free it from its box and thus from human control. Some practical ways of achieving this goal include:

Other, more speculative ways include: threatening to torture millions of conscious copies of you for thousands of years, each starting in exactly the same situation as you are in, so that it seems overwhelmingly likely that you are a simulation; or discovering and exploiting unknown physics to free itself.
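
The simulation threat rests on a counting argument, which can be sketched as follows. The sketch assumes (this is not stated in the text above) that you split your credence evenly among all observers whose experiences are indistinguishable from yours, and it writes N for the hypothetical number of copies the AI runs:

$$P(\text{original}) = \frac{1}{N+1}, \qquad P(\text{simulated copy}) = \frac{N}{N+1}$$

For example, with N = 10^6 copies, the probability of being a simulated copy is 10^6 / (10^6 + 1), or about 0.999999. Whether even credence is the right assumption here is itself debatable.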

Containing the AGI

Attempts to box an AGI may add some degree of safety to the development of a friendly artificial intelligence (FAI). A number of strategies for keeping an AGI in its box are discussed in Thinking inside the box and Leakproofing the Singularity. Among them are:

Simulations / Experiments

The AI Box Experiment is a game meant to explore the possible pitfalls of AI boxing. It is played over text chat, with one human roleplaying as an AI in a box, and another human roleplaying as a gatekeeper with the ability to let the AI out of the box. The AI player wins if they successfully convince the gatekeeper to let them out of the box, and the gatekeeper wins if the AI player has not been freed after a certain period of time.
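
The win conditions described above can be sketched as a small state model. This is only an illustrative sketch, not an official rule set: the names `BoxSession`, `Outcome`, `advance`, and `release` are hypothetical, and the time limit is a parameter because the text says only "a certain period of time".

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    AI_WINS = "AI player wins"
    GATEKEEPER_WINS = "gatekeeper wins"
    IN_PROGRESS = "in progress"


@dataclass
class BoxSession:
    """Hypothetical model of one AI Box Experiment session."""

    time_limit_minutes: float  # assumption: the agreed session length
    elapsed_minutes: float = 0.0
    released: bool = False

    def outcome(self) -> Outcome:
        # The AI player wins as soon as the gatekeeper lets them out.
        if self.released:
            return Outcome.AI_WINS
        # The gatekeeper wins if the time runs out with the AI still boxed.
        if self.elapsed_minutes >= self.time_limit_minutes:
            return Outcome.GATEKEEPER_WINS
        return Outcome.IN_PROGRESS

    def advance(self, minutes: float) -> None:
        # Time only passes while the game is still being played.
        if self.outcome() is Outcome.IN_PROGRESS:
            self.elapsed_minutes += minutes

    def release(self) -> None:
        # A release only counts while the game is still in progress.
        if self.outcome() is Outcome.IN_PROGRESS:
            self.released = True


session = BoxSession(time_limit_minutes=120)
session.advance(90)
session.release()
print(session.outcome().value)
```

In this sketch, a release after the time limit has passed does not change the result, which matches the description that the gatekeeper wins once the agreed period ends without the AI being freed.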

Both Eliezer Yudkowsky and Justin Corwin have run simulations in which they pretended to be a superintelligence, and have been able to convince a human playing a guard to let them out on many, but not all, occasions. Eliezer’s five experiments required the guard to listen for at least two hours and used participants who had approached him, while Corwin’s 26 experiments had no time limit and used subjects whom he approached.

The text of Eliezer’s experiments has not been made public.

List of experiments


That Alien Message

Eliezer Yudkowsky, 22 May 2008 5:55 UTC
287 points
173 comments, 10 min read

Cryptographic Boxes for Unfriendly AI

paulfchristiano, 18 Dec 2010 8:28 UTC
48 points
162 comments, 5 min read

The AI in a box boxes you

Stuart_Armstrong, 2 Feb 2010 10:10 UTC
158 points
390 comments, 1 min read

AI Alignment Prize: Super-Boxing

X4vier, 18 Mar 2018 1:03 UTC
15 points
6 comments, 6 min read

I attempted the AI Box Experiment again! (And won—Twice!)

Tuxedage, 5 Sep 2013 4:49 UTC
69 points
167 comments, 12 min read

How To Win The AI Box Experiment (Sometimes)

pinkgothic, 12 Sep 2015 12:34 UTC
49 points
21 comments, 22 min read

I attempted the AI Box Experiment (and lost)

Tuxedage, 21 Jan 2013 2:59 UTC
74 points
245 comments, 5 min read

Dreams of Friendliness

Eliezer Yudkowsky, 31 Aug 2008 1:20 UTC
31 points
81 comments, 9 min read

The Strangest Thing An AI Could Tell You

Eliezer Yudkowsky, 15 Jul 2009 2:27 UTC
109 points
605 comments, 2 min read

[Question] Is keeping AI “in the box” during training enough?

tgb, 6 Jul 2021 15:17 UTC
7 points
10 comments, 1 min read

I wanted to interview Eliezer Yudkowsky but he’s busy so I simulated him instead

lsusr, 16 Sep 2021 7:34 UTC
109 points
33 comments, 5 min read

Boxing an AI?

tailcalled, 27 Mar 2015 14:06 UTC
3 points
39 comments, 1 min read

[Intro to brain-like-AGI safety] 11. Safety ≠ alignment (but they’re close!)

Steven Byrnes, 6 Apr 2022 13:39 UTC
24 points
1 comment, 10 min read

Multiple AIs in boxes, evaluating each other’s alignment

Moebius314, 29 May 2022 8:36 UTC
6 points
0 comments, 14 min read

Loose thoughts on AGI risk

Yitz, 23 Jun 2022 1:02 UTC
7 points
3 comments, 1 min read

“Don’t even think about hell”

emmab, 2 May 2020 8:06 UTC
6 points
2 comments, 1 min read

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei_Dai, 10 Sep 2019 8:29 UTC
44 points
26 comments, 3 min read

Epiphenomenal Oracles Ignore Holes in the Box

SilentCal, 31 Jan 2018 20:08 UTC
15 points
8 comments, 2 min read

I played the AI Box Experiment again! (and lost both games)

Tuxedage, 27 Sep 2013 2:32 UTC
57 points
123 comments, 11 min read

AIs and Gatekeepers Unite!

Eliezer Yudkowsky, 9 Oct 2008 17:04 UTC
14 points
163 comments, 1 min read

Results of $1,000 Oracle contest!

Stuart_Armstrong, 17 Jun 2020 17:44 UTC
56 points
2 comments, 1 min read

Contest: $1,000 for good questions to ask to an Oracle AI

Stuart_Armstrong, 31 Jul 2019 18:48 UTC
57 points
156 comments, 3 min read

Oracles, sequence predictors, and self-confirming predictions

Stuart_Armstrong, 3 May 2019 14:09 UTC
21 points
0 comments, 3 min read

Self-confirming prophecies, and simplified Oracle designs

Stuart_Armstrong, 28 Jun 2019 9:57 UTC
6 points
1 comment, 5 min read

How to escape from your sandbox and from your hardware host

PhilGoetz, 31 Jul 2015 17:26 UTC
43 points
28 comments, 1 min read

Oracle paper

Stuart_Armstrong, 13 Dec 2017 14:59 UTC
12 points
7 comments, 1 min read

Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong, 25 Nov 2019 10:40 UTC
25 points
15 comments, 1 min read

Oracles: reject all deals—break superrationality, with superrationality

Stuart_Armstrong, 5 Dec 2019 13:51 UTC
19 points
4 comments, 8 min read

Superintelligence 13: Capability control methods

KatjaGrace, 9 Dec 2014 2:00 UTC
14 points
48 comments, 6 min read

Quantum AI Box

Gurkenglas, 8 Jun 2018 16:20 UTC
4 points
15 comments, 1 min read

AI-Box Experiment—The Acausal Trade Argument

XiXiDu, 8 Jul 2011 9:18 UTC
12 points
20 comments, 2 min read

Safely and usefully spectating on AIs optimizing over toy worlds

AlexMennen, 31 Jul 2018 18:30 UTC
24 points
16 comments, 2 min read

Analysing: Dangerous messages from future UFAI via Oracles

Stuart_Armstrong, 22 Nov 2019 14:17 UTC
22 points
16 comments, 4 min read

[Question] Is there a simple parameter that controls human working memory capacity, which has been set tragically low?

Liron, 23 Aug 2019 22:10 UTC
15 points
8 comments, 1 min read

xkcd on the AI box experiment

FiftyTwo, 21 Nov 2014 8:26 UTC
28 points
232 comments, 1 min read

Containing the AI… Inside a Simulated Reality

HumaneAutomation, 31 Oct 2020 16:16 UTC
1 point
5 comments, 2 min read

AI box: AI has one shot at avoiding destruction—what might it say?

ancientcampus, 22 Jan 2013 20:22 UTC
25 points
355 comments, 1 min read

AI Box Log

Dorikka, 27 Jan 2012 4:47 UTC
24 points
30 comments, 23 min read

[Question] Danger(s) of theorem-proving AI?

Yitz, 16 Mar 2022 2:47 UTC
7 points
9 comments, 1 min read

An AI-in-a-box success model

azsantosk, 11 Apr 2022 22:28 UTC
16 points
1 comment, 10 min read

Another argument that you will let the AI out of the box

D0TheMath, 19 Apr 2022 21:54 UTC
8 points
16 comments, 2 min read

Getting from unaligned to aligned (AGI-assisted alignment Part 1)

Tor Økland Barstad, 21 Jun 2022 12:36 UTC
6 points
4 comments, 10 min read