
AI Boxing (Containment)

Last edit: 4 May 2020 23:52 UTC by Ruby

AI Boxing covers attempts, experiments, and proposals to isolate (“box”) an unaligned AI so that it cannot interact with the world at large and cause harm. See also: AI

The challenges are: 1) can you successfully prevent the AI from interacting with the world? And 2) can you prevent it from convincing you to let it out?

How to safely use an optimizer

Simon Fischer, 28 Mar 2024 16:11 UTC
47 points
21 comments, 7 min read, LW link

An AI, a box, and a threat

jwfiredragon, 7 Mar 2024 6:15 UTC
5 points
0 comments, 6 min read, LW link

The case for training frontier AIs on Sumerian-only corpus

15 Jan 2024 16:40 UTC
127 points
14 comments, 3 min read, LW link

[Question] Why do so many think deception in AI is important?

Prometheus, 13 Jan 2024 8:14 UTC
23 points
12 comments, 1 min read, LW link

Planning to build a cryptographic box with perfect secrecy

Lysandre Terrisse, 31 Dec 2023 9:31 UTC
37 points
6 comments, 11 min read, LW link

Protecting against sudden capability jumps during training

nikola, 2 Dec 2023 4:22 UTC
8 points
0 comments, 2 min read, LW link

Information-Theoretic Boxing of Superintelligences

30 Nov 2023 14:31 UTC
30 points
0 comments, 7 min read, LW link

Self-shutdown AI

jan betley, 21 Aug 2023 16:48 UTC
13 points
2 comments, 2 min read, LW link

[Question] Boxing

Zach Stein-Perlman, 2 Aug 2023 23:38 UTC
6 points
1 comment, 1 min read, LW link

Thoughts on “Process-Based Supervision”

Steven Byrnes, 17 Jul 2023 14:08 UTC
74 points
4 comments, 23 min read, LW link

A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.

AlexFromSafeTransition, 21 Jun 2023 8:08 UTC
2 points
16 comments, 14 min read, LW link

[FICTION] Unboxing Elysium: An AI’S Escape

Super AGI, 10 Jun 2023 4:41 UTC
−14 points
4 comments, 14 min read, LW link

Ideas for studies on AGI risk

dr_s, 20 Apr 2023 18:17 UTC
5 points
1 comment, 11 min read, LW link

ChatGPT getting out of the box

qbolec, 16 Mar 2023 13:47 UTC
6 points
3 comments, 1 min read, LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King, 15 Mar 2023 0:29 UTC
116 points
22 comments, 2 min read, LW link

Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?

Christopher King, 20 Feb 2023 15:11 UTC
16 points
15 comments, 1 min read, LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz, 17 Feb 2023 20:50 UTC
63 points
27 comments, 1 min read, LW link

How it feels to have your mind hacked by an AI

blaked, 12 Jan 2023 0:33 UTC
354 points
219 comments, 17 min read, LW link

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve, 25 Dec 2022 20:14 UTC
3 points
6 comments, 1 min read, LW link

I’ve updated towards AI boxing being surprisingly easy

Noosphere89, 25 Dec 2022 15:40 UTC
8 points
20 comments, 2 min read, LW link

Side-channels: input versus output

davidad, 12 Dec 2022 12:32 UTC
44 points
16 comments, 2 min read, LW link

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo, 9 Dec 2022 17:53 UTC
40 points
2 comments, 5 min read, LW link

My take on Jacob Cannell’s take on AGI safety

Steven Byrnes, 28 Nov 2022 14:01 UTC
71 points
15 comments, 30 min read, LW link, 1 review

Decision theory does not imply that we get to have nice things

So8res, 18 Oct 2022 3:04 UTC
165 points
58 comments, 26 min read, LW link, 2 reviews

Another problem with AI confinement: ordinary CPUs can work as radio transmitters

RomanS, 14 Oct 2022 8:28 UTC
35 points
1 comment, 1 min read, LW link
(news.softpedia.com)

Smoke without fire is scary

Adam Jermyn, 4 Oct 2022 21:08 UTC
51 points
22 comments, 4 min read, LW link

LOVE in a simbox is all you need

jacob_cannell, 28 Sep 2022 18:25 UTC
63 points
72 comments, 44 min read, LW link, 1 review

Gatekeeper Victory: AI Box Reflection

9 Sep 2022 21:38 UTC
6 points
6 comments, 9 min read, LW link

[Question] AI Box Experiment: Are people still interested?

Double, 31 Aug 2022 3:04 UTC
30 points
13 comments, 1 min read, LW link

Pivotal acts using an unaligned AGI?

Simon Fischer, 21 Aug 2022 17:13 UTC
28 points
3 comments, 7 min read, LW link

An Uncanny Prison

Nathan1123, 13 Aug 2022 21:40 UTC
3 points
3 comments, 2 min read, LW link

Dissected boxed AI

Nathan1123, 12 Aug 2022 2:37 UTC
−8 points
2 comments, 1 min read, LW link

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

10 Aug 2022 18:14 UTC
28 points
30 comments, 11 min read, LW link

Making it harder for an AGI to "trick" us, with STVs

Tor Økland Barstad, 9 Jul 2022 14:42 UTC
15 points
5 comments, 22 min read, LW link

Loose thoughts on AGI risk

Yitz, 23 Jun 2022 1:02 UTC
7 points
3 comments, 1 min read, LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad, 21 Jun 2022 12:36 UTC
13 points
7 comments, 9 min read, LW link

Multiple AIs in boxes, evaluating each other’s alignment

Moebius314, 29 May 2022 8:36 UTC
8 points
0 comments, 14 min read, LW link

Another argument that you will let the AI out of the box

Garrett Baker, 19 Apr 2022 21:54 UTC
7 points
16 comments, 2 min read, LW link

An AI-in-a-box success model

azsantosk, 11 Apr 2022 22:28 UTC
16 points
1 comment, 10 min read, LW link

[Intro to brain-like-AGI safety] 11. Safety ≠ alignment (but they’re close!)

Steven Byrnes, 6 Apr 2022 13:39 UTC
35 points
1 comment, 10 min read, LW link

[Question] Danger(s) of theorem-proving AI?

Yitz, 16 Mar 2022 2:47 UTC
8 points
8 comments, 1 min read, LW link

I wanted to interview Eliezer Yudkowsky but he’s busy so I simulated him instead

lsusr, 16 Sep 2021 7:34 UTC
111 points
33 comments, 5 min read, LW link

[Question] Is keeping AI “in the box” during training enough?

tgb, 6 Jul 2021 15:17 UTC
7 points
10 comments, 1 min read, LW link

Containing the AI… Inside a Simulated Reality

HumaneAutomation, 31 Oct 2020 16:16 UTC
1 point
9 comments, 2 min read, LW link

Results of $1,000 Oracle contest!

Stuart_Armstrong, 17 Jun 2020 17:44 UTC
60 points
2 comments, 1 min read, LW link

“Don’t even think about hell”

emmab, 2 May 2020 8:06 UTC
6 points
2 comments, 1 min read, LW link

Oracles: reject all deals—break superrationality, with superrationality

Stuart_Armstrong, 5 Dec 2019 13:51 UTC
20 points
4 comments, 8 min read, LW link

Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong, 25 Nov 2019 10:40 UTC
25 points
15 comments, 1 min read, LW link

Analysing: Dangerous messages from future UFAI via Oracles

Stuart_Armstrong, 22 Nov 2019 14:17 UTC
22 points
16 comments, 4 min read, LW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei Dai, 10 Sep 2019 8:29 UTC
48 points
26 comments, 3 min read, LW link

[Question] Is there a simple parameter that controls human working memory capacity, which has been set tragically low?

Liron, 23 Aug 2019 22:10 UTC
17 points
8 comments, 1 min read, LW link

Contest: $1,000 for good questions to ask to an Oracle AI

Stuart_Armstrong, 31 Jul 2019 18:48 UTC
59 points
154 comments, 3 min read, LW link

Self-confirming prophecies, and simplified Oracle designs

Stuart_Armstrong, 28 Jun 2019 9:57 UTC
10 points
1 comment, 5 min read, LW link

Oracles, sequence predictors, and self-confirming predictions

Stuart_Armstrong, 3 May 2019 14:09 UTC
22 points
0 comments, 3 min read, LW link

Sandboxing by Physical Simulation?

moridinamael, 1 Aug 2018 0:36 UTC
12 points
4 comments, 1 min read, LW link

Safely and usefully spectating on AIs optimizing over toy worlds

AlexMennen, 31 Jul 2018 18:30 UTC
24 points
16 comments, 2 min read, LW link

Quantum AI Box

Gurkenglas, 8 Jun 2018 16:20 UTC
4 points
15 comments, 1 min read, LW link

AI Alignment Prize: Super-Boxing

X4vier, 18 Mar 2018 1:03 UTC
16 points
6 comments, 6 min read, LW link

Epiphenomenal Oracles Ignore Holes in the Box

SilentCal, 31 Jan 2018 20:08 UTC
15 points
8 comments, 2 min read, LW link

Oracle paper

Stuart_Armstrong, 13 Dec 2017 14:59 UTC
12 points
7 comments, 1 min read, LW link

How To Win The AI Box Experiment (Sometimes)

pinkgothic, 12 Sep 2015 12:34 UTC
55 points
21 comments, 22 min read, LW link

How to escape from your sandbox and from your hardware host

PhilGoetz, 31 Jul 2015 17:26 UTC
43 points
28 comments, 1 min read, LW link

Boxing an AI?

tailcalled, 27 Mar 2015 14:06 UTC
3 points
39 comments, 1 min read, LW link

Superintelligence 13: Capability control methods

KatjaGrace, 9 Dec 2014 2:00 UTC
14 points
48 comments, 6 min read, LW link

xkcd on the AI box experiment

FiftyTwo, 21 Nov 2014 8:26 UTC
28 points
234 comments, 1 min read, LW link

How to Study Unsafe AGI’s safely (and why we might have no choice)

Punoxysm, 7 Mar 2014 7:24 UTC
10 points
47 comments, 5 min read, LW link

I played the AI Box Experiment again! (and lost both games)

Tuxedage, 27 Sep 2013 2:32 UTC
59 points
123 comments, 11 min read, LW link

I attempted the AI Box Experiment again! (And won—Twice!)

Tuxedage, 5 Sep 2013 4:49 UTC
76 points
168 comments, 12 min read, LW link

AI box: AI has one shot at avoiding destruction—what might it say?

ancientcampus, 22 Jan 2013 20:22 UTC
25 points
355 comments, 1 min read, LW link

I attempted the AI Box Experiment (and lost)

Tuxedage, 21 Jan 2013 2:59 UTC
78 points
245 comments, 5 min read, LW link

AI Box Log

Dorikka, 27 Jan 2012 4:47 UTC
23 points
30 comments, 23 min read, LW link

AI-Box Experiment—The Acausal Trade Argument

XiXiDu, 8 Jul 2011 9:18 UTC
14 points
20 comments, 2 min read, LW link

Cryptographic Boxes for Unfriendly AI

paulfchristiano, 18 Dec 2010 8:28 UTC
70 points
162 comments, 5 min read, LW link

Anthropomorphic AI and Sandboxed Virtual Universes

jacob_cannell, 3 Sep 2010 19:02 UTC
4 points
124 comments, 5 min read, LW link

The AI in a box boxes you

Stuart_Armstrong, 2 Feb 2010 10:10 UTC
169 points
389 comments, 1 min read, LW link

The Strangest Thing An AI Could Tell You

Eliezer Yudkowsky, 15 Jul 2009 2:27 UTC
129 points
613 comments, 2 min read, LW link

AIs and Gatekeepers Unite!

Eliezer Yudkowsky, 9 Oct 2008 17:04 UTC
14 points
163 comments, 1 min read, LW link

Dreams of Friendliness

Eliezer Yudkowsky, 31 Aug 2008 1:20 UTC
26 points
81 comments, 9 min read, LW link

That Alien Message

Eliezer Yudkowsky, 22 May 2008 5:55 UTC
359 points
174 comments, 10 min read, LW link