AI Boxing (Containment)

Tag. Last edit: 4 May 2020 23:52 UTC by Ruby

AI Boxing refers to attempts, experiments, or proposals to isolate (“box”) an unaligned AI so that it can’t interact with the world at large and cause harm. See also: AI

The challenges are: 1) Can you successfully prevent it from interacting with the world? 2) Can you prevent it from convincing you to let it out?

That Alien Message

Eliezer Yudkowsky, 22 May 2008 5:55 UTC

362 points

174 comments, 10 min read, LW link

Cryptographic Boxes for Unfriendly AI

paulfchristiano, 18 Dec 2010 8:28 UTC

71 points

162 comments, 5 min read, LW link

The AI in a box boxes you

Stuart_Armstrong, 2 Feb 2010 10:10 UTC

169 points

389 comments, 1 min read, LW link

How it feels to have your mind hacked by an AI

blaked, 12 Jan 2023 0:33 UTC

357 points

220 comments, 17 min read, LW link

The case for training frontier AIs on Sumerian-only corpus

Alexandre Variengien, Charbel-Raphaël and Jonathan Claybrough

15 Jan 2024 16:40 UTC

130 points

15 comments, 3 min read, LW link

The Strangest Thing An AI Could Tell You

Eliezer Yudkowsky, 15 Jul 2009 2:27 UTC

130 points

614 comments, 2 min read, LW link

I attempted the AI Box Experiment (and lost)

Tuxedage, 21 Jan 2013 2:59 UTC

79 points

245 comments, 5 min read, LW link

[Question] Boxing

Zach Stein-Perlman, 2 Aug 2023 23:38 UTC

6 points

1 comment, 1 min read, LW link

I attempted the AI Box Experiment again! (And won—Twice!)

Tuxedage, 5 Sep 2013 4:49 UTC

76 points

168 comments, 12 min read, LW link

How To Win The AI Box Experiment (Sometimes)

pinkgothic, 12 Sep 2015 12:34 UTC

55 points

21 comments, 22 min read, LW link

[Question] AI Box Experiment: Are people still interested?

Double, 31 Aug 2022 3:04 UTC

30 points

13 comments, 1 min read, LW link

Loose thoughts on AGI risk

Yitz, 23 Jun 2022 1:02 UTC

7 points

3 comments, 1 min read, LW link

AI Alignment Prize: Super-Boxing

X4vier, 18 Mar 2018 1:03 UTC

16 points

6 comments, 6 min read, LW link

LOVE in a simbox is all you need

jacob_cannell, 28 Sep 2022 18:25 UTC

63 points

72 comments, 44 min read, LW link, 1 review

[Question] Is keeping AI “in the box” during training enough?

tgb, 6 Jul 2021 15:17 UTC

7 points

10 comments, 1 min read, LW link

I wanted to interview Eliezer Yudkowsky but he’s busy so I simulated him instead

lsusr, 16 Sep 2021 7:34 UTC

111 points

33 comments, 5 min read, LW link

Dreams of Friendliness

Eliezer Yudkowsky, 31 Aug 2008 1:20 UTC

28 points

81 comments, 9 min read, LW link

My take on Jacob Cannell’s take on AGI safety

Steven Byrnes, 28 Nov 2022 14:01 UTC

71 points

15 comments, 30 min read, LW link, 1 review

Thoughts on “Process-Based Supervision”

Steven Byrnes, 17 Jul 2023 14:08 UTC

74 points

4 comments, 23 min read, LW link

Side-channels: input versus output

davidad, 12 Dec 2022 12:32 UTC

44 points

16 comments, 2 min read, LW link

[Question] Why do so many think deception in AI is important?

Prometheus, 13 Jan 2024 8:14 UTC

23 points

12 comments, 1 min read, LW link

Boxing an AI?

tailcalled, 27 Mar 2015 14:06 UTC

3 points

39 comments, 1 min read, LW link

Multiple AIs in boxes, evaluating each other’s alignment

Moebius314, 29 May 2022 8:36 UTC

8 points

0 comments, 14 min read, LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz, 17 Feb 2023 20:50 UTC

63 points

28 comments, 1 min read, LW link

[Intro to brain-like-AGI safety] 11. Safety ≠ alignment (but they’re close!)

Steven Byrnes, 6 Apr 2022 13:39 UTC

35 points

1 comment, 10 min read, LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King, 15 Mar 2023 0:29 UTC

116 points

22 comments, 2 min read, LW link

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

Peter S. Park, NickyP and Stephen Fowler

10 Aug 2022 18:14 UTC

28 points

30 comments, 11 min read, LW link

Safely and usefully spectating on AIs optimizing over toy worlds

AlexMennen, 31 Jul 2018 18:30 UTC

24 points

16 comments, 2 min read, LW link

Analysing: Dangerous messages from future UFAI via Oracles

Stuart_Armstrong, 22 Nov 2019 14:17 UTC

22 points

16 comments, 4 min read, LW link

[Question] Is there a simple parameter that controls human working memory capacity, which has been set tragically low?

Liron, 23 Aug 2019 22:10 UTC

17 points

8 comments, 1 min read, LW link

Self-shutdown AI

jan betley, 21 Aug 2023 16:48 UTC

13 points

2 comments, 2 min read, LW link

xkcd on the AI box experiment

FiftyTwo, 21 Nov 2014 8:26 UTC

28 points

234 comments, 1 min read, LW link

Containing the AI… Inside a Simulated Reality

HumaneAutomation, 31 Oct 2020 16:16 UTC

1 point

9 comments, 2 min read, LW link

AI box: AI has one shot at avoiding destruction—what might it say?

ancientcampus, 22 Jan 2013 20:22 UTC

25 points

355 comments, 1 min read, LW link

AI Box Log

Dorikka, 27 Jan 2012 4:47 UTC

23 points

30 comments, 23 min read, LW link

[Question] Danger(s) of theorem-proving AI?

Yitz, 16 Mar 2022 2:47 UTC

8 points

8 comments, 1 min read, LW link

An AI-in-a-box success model

azsantosk, 11 Apr 2022 22:28 UTC

16 points

1 comment, 10 min read, LW link

Another argument that you will let the AI out of the box

Garrett Baker, 19 Apr 2022 21:54 UTC

8 points

16 comments, 2 min read, LW link

Pivotal acts using an unaligned AGI?

Simon Fischer, 21 Aug 2022 17:13 UTC

28 points

3 comments, 7 min read, LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad, 21 Jun 2022 12:36 UTC

13 points

7 comments, 9 min read, LW link

Anthropomorphic AI and Sandboxed Virtual Universes

jacob_cannell, 3 Sep 2010 19:02 UTC

4 points

124 comments, 5 min read, LW link

Sandboxing by Physical Simulation?

moridinamael, 1 Aug 2018 0:36 UTC

12 points

4 comments, 1 min read, LW link

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad, 9 Jul 2022 14:42 UTC

15 points

5 comments, 22 min read, LW link

Dissected boxed AI

Nathan1123, 12 Aug 2022 2:37 UTC

−8 points

2 comments, 1 min read, LW link

An Uncanny Prison

Nathan1123, 13 Aug 2022 21:40 UTC

3 points

3 comments, 2 min read, LW link

Gatekeeper Victory: AI Box Reflection

Double and DaemonicSigil

9 Sep 2022 21:38 UTC

6 points

6 comments, 9 min read, LW link

How to Study Unsafe AGI’s safely (and why we might have no choice)

Punoxysm, 7 Mar 2014 7:24 UTC

10 points

47 comments, 5 min read, LW link

Smoke without fire is scary

Adam Jermyn, 4 Oct 2022 21:08 UTC

51 points

22 comments, 4 min read, LW link

Another problem with AI confinement: ordinary CPUs can work as radio transmitters

RomanS, 14 Oct 2022 8:28 UTC

35 points

1 comment, 1 min read, LW link

(news.softpedia.com)

Decision theory does not imply that we get to have nice things

So8res, 18 Oct 2022 3:04 UTC

165 points

58 comments, 26 min read, LW link, 2 reviews

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo, 9 Dec 2022 17:53 UTC

42 points

3 comments, 5 min read, LW link

I’ve updated towards AI boxing being surprisingly easy

Noosphere89, 25 Dec 2022 15:40 UTC

8 points

20 comments, 2 min read, LW link

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve, 25 Dec 2022 20:14 UTC

3 points

6 comments, 1 min read, LW link

Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?

Christopher King, 20 Feb 2023 15:11 UTC

16 points

15 comments, 1 min read, LW link

ChatGPT getting out of the box

qbolec, 16 Mar 2023 13:47 UTC

6 points

3 comments, 1 min read, LW link

Planning to build a cryptographic box with perfect secrecy

Lysandre Terrisse, 31 Dec 2023 9:31 UTC

39 points

6 comments, 11 min read, LW link

An AI, a box, and a threat

jwfiredragon, 7 Mar 2024 6:15 UTC

8 points

0 comments, 6 min read, LW link

Disproving and partially fixing a fully homomorphic encryption scheme with perfect secrecy

Lysandre Terrisse, 26 May 2024 14:56 UTC

16 points

1 comment, 18 min read, LW link

How to safely use an optimizer

Simon Fischer, 28 Mar 2024 16:11 UTC

47 points

21 comments, 7 min read, LW link

Ideas for studies on AGI risk

dr_s, 20 Apr 2023 18:17 UTC

5 points

1 comment, 11 min read, LW link

“Don’t even think about hell”

emmab, 2 May 2020 8:06 UTC

6 points

2 comments, 1 min read, LW link

Information-Theoretic Boxing of Superintelligences

JustinShovelain and Elliot_Mckernon

30 Nov 2023 14:31 UTC

30 points

0 comments, 7 min read, LW link

Protecting against sudden capability jumps during training

nikola, 2 Dec 2023 4:22 UTC

8 points

0 comments, 2 min read, LW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei Dai, 10 Sep 2019 8:29 UTC

48 points

26 comments, 3 min read, LW link

Epiphenomenal Oracles Ignore Holes in the Box

SilentCal, 31 Jan 2018 20:08 UTC

15 points

8 comments, 2 min read, LW link

I played the AI Box Experiment again! (and lost both games)

Tuxedage, 27 Sep 2013 2:32 UTC

60 points

123 comments, 11 min read, LW link

AIs and Gatekeepers Unite!

Eliezer Yudkowsky, 9 Oct 2008 17:04 UTC

14 points

163 comments, 1 min read, LW link

Results of $1,000 Oracle contest!

Stuart_Armstrong, 17 Jun 2020 17:44 UTC

60 points

2 comments, 1 min read, LW link

Contest: $1,000 for good questions to ask to an Oracle AI

Stuart_Armstrong, 31 Jul 2019 18:48 UTC

59 points

154 comments, 3 min read, LW link

Oracles, sequence predictors, and self-confirming predictions

Stuart_Armstrong, 3 May 2019 14:09 UTC

22 points

0 comments, 3 min read, LW link

Self-confirming prophecies, and simplified Oracle designs

Stuart_Armstrong, 28 Jun 2019 9:57 UTC

10 points

1 comment, 5 min read, LW link

How to escape from your sandbox and from your hardware host

PhilGoetz, 31 Jul 2015 17:26 UTC

43 points

28 comments, 1 min read, LW link

Oracle paper

Stuart_Armstrong, 13 Dec 2017 14:59 UTC

12 points

7 comments, 1 min read, LW link

[FICTION] Unboxing Elysium: An AI’S Escape

Super AGI, 10 Jun 2023 4:41 UTC

−14 points

4 comments, 14 min read, LW link

Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong, 25 Nov 2019 10:40 UTC

25 points

15 comments, 1 min read, LW link

Oracles: reject all deals—break superrationality, with superrationality

Stuart_Armstrong, 5 Dec 2019 13:51 UTC

20 points

4 comments, 8 min read, LW link

A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.

AlexFromSafeTransition, 21 Jun 2023 8:08 UTC

2 points

16 comments, 14 min read, LW link

Superintelligence 13: Capability control methods

KatjaGrace, 9 Dec 2014 2:00 UTC

14 points

48 comments, 6 min read, LW link

Quantum AI Box

Gurkenglas, 8 Jun 2018 16:20 UTC

4 points

15 comments, 1 min read, LW link

AI-Box Experiment—The Acausal Trade Argument

XiXiDu, 8 Jul 2011 9:18 UTC

14 points

20 comments, 2 min read, LW link

Ruby 12 Sep 2020 5:11 UTC
2 points
from the original talk page

Talk:AI boxing
If an SF reference is not considered a faux pas, this reminds me of John Barnes ( https://en.wikipedia.org/wiki/John_Barnes_%28author%29 ) “Meme Wars”. The way One True infected humanity is, if possible, an obvious attack vector for a sufficiently powerful AI. -- Resuna (talk) 10:20, 27 November 2014 (AEDT)
