
AI Boxing (Containment)

Last edit: 4 May 2020 23:52 UTC by Ruby

AI Boxing refers to attempts, experiments, or proposals to isolate ("box") an unaligned AI so that it can't interact with the world at large and cause harm. See also: AI

The challenges are: 1) can you successfully prevent it from interacting with the world? And 2) can you prevent it from convincing you to let it out?

An AI, a box, and a threat

jwfiredragon, 7 Mar 2024 6:15 UTC
5 points
0 comments, 6 min read, LW link

The case for training frontier AIs on Sumerian-only corpus

15 Jan 2024 16:40 UTC
128 points
14 comments, 3 min read, LW link

[Question] Why do so many think deception in AI is important?

Prometheus, 13 Jan 2024 8:14 UTC
23 points
12 comments, 1 min read, LW link

Planning to build a cryptographic box with perfect secrecy

Lysandre Terrisse, 31 Dec 2023 9:31 UTC
37 points
6 comments, 11 min read, LW link

Protecting against sudden capability jumps during training

nikola, 2 Dec 2023 4:22 UTC
8 points
0 comments, 2 min read, LW link

Information-Theoretic Boxing of Superintelligences

30 Nov 2023 14:31 UTC
30 points
0 comments, 7 min read, LW link

Self-shutdown AI

jan betley, 21 Aug 2023 16:48 UTC
12 points
2 comments, 2 min read, LW link

[Question] Boxing

Zach Stein-Perlman, 2 Aug 2023 23:38 UTC
6 points
1 comment, 1 min read, LW link

Thoughts on "Process-Based Supervision"

Steven Byrnes, 17 Jul 2023 14:08 UTC
74 points
4 comments, 23 min read, LW link

A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.

AlexFromSafeTransition, 21 Jun 2023 8:08 UTC
2 points
16 comments, 14 min read, LW link

[FICTION] Unboxing Elysium: An AI'S Escape

Super AGI, 10 Jun 2023 4:41 UTC
−14 points
4 comments, 14 min read, LW link

Ideas for studies on AGI risk

dr_s, 20 Apr 2023 18:17 UTC
5 points
1 comment, 11 min read, LW link

ChatGPT getting out of the box

qbolec, 16 Mar 2023 13:47 UTC
6 points
3 comments, 1 min read, LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King, 15 Mar 2023 0:29 UTC
116 points
22 comments, 2 min read, LW link

Bing finding ways to bypass Microsoft's filters without being asked. Is it reproducible?

Christopher King, 20 Feb 2023 15:11 UTC
16 points
15 comments, 1 min read, LW link

I Am Scared of Posting Negative Takes About Bing's AI

Yitz, 17 Feb 2023 20:50 UTC
63 points
27 comments, 1 min read, LW link

How it feels to have your mind hacked by an AI

blaked, 12 Jan 2023 0:33 UTC
354 points
219 comments, 17 min read, LW link

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve, 25 Dec 2022 20:14 UTC
3 points
6 comments, 1 min read, LW link

I've updated towards AI boxing being surprisingly easy

Noosphere89, 25 Dec 2022 15:40 UTC
8 points
20 comments, 2 min read, LW link

Side-channels: input versus output

davidad, 12 Dec 2022 12:32 UTC
44 points
16 comments, 2 min read, LW link

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo, 9 Dec 2022 17:53 UTC
40 points
2 comments, 5 min read, LW link

My take on Jacob Cannell's take on AGI safety

Steven Byrnes, 28 Nov 2022 14:01 UTC
71 points
15 comments, 30 min read, LW link, 1 review

Decision theory does not imply that we get to have nice things

So8res, 18 Oct 2022 3:04 UTC
168 points
58 comments, 26 min read, LW link, 2 reviews

Another problem with AI confinement: ordinary CPUs can work as radio transmitters

RomanS, 14 Oct 2022 8:28 UTC
35 points
1 comment, 1 min read, LW link
(news.softpedia.com)

Smoke without fire is scary

Adam Jermyn, 4 Oct 2022 21:08 UTC
51 points
22 comments, 4 min read, LW link

LOVE in a simbox is all you need

jacob_cannell, 28 Sep 2022 18:25 UTC
63 points
72 comments, 44 min read, LW link, 1 review

Gatekeeper Victory: AI Box Reflection

9 Sep 2022 21:38 UTC
6 points
6 comments, 9 min read, LW link

[Question] AI Box Experiment: Are people still interested?

Double, 31 Aug 2022 3:04 UTC
30 points
13 comments, 1 min read, LW link

Pivotal acts using an unaligned AGI?

Simon Fischer, 21 Aug 2022 17:13 UTC
26 points
3 comments, 7 min read, LW link

An Uncanny Prison

Nathan1123, 13 Aug 2022 21:40 UTC
3 points
3 comments, 2 min read, LW link

Dissected boxed AI

Nathan1123, 12 Aug 2022 2:37 UTC
−8 points
2 comments, 1 min read, LW link

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

10 Aug 2022 18:14 UTC
27 points
30 comments, 11 min read, LW link

Making it harder for an AGI to "trick" us, with STVs

Tor Økland Barstad, 9 Jul 2022 14:42 UTC
15 points
5 comments, 22 min read, LW link

Loose thoughts on AGI risk

Yitz, 23 Jun 2022 1:02 UTC
7 points
3 comments, 1 min read, LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad, 21 Jun 2022 12:36 UTC
13 points
7 comments, 9 min read, LW link

Multiple AIs in boxes, evaluating each other's alignment

Moebius314, 29 May 2022 8:36 UTC
8 points
0 comments, 14 min read, LW link

Another argument that you will let the AI out of the box

Garrett Baker, 19 Apr 2022 21:54 UTC
7 points
16 comments, 2 min read, LW link

An AI-in-a-box success model

azsantosk, 11 Apr 2022 22:28 UTC
16 points
1 comment, 10 min read, LW link

[Intro to brain-like-AGI safety] 11. Safety ≠ alignment (but they're close!)

Steven Byrnes, 6 Apr 2022 13:39 UTC
35 points
1 comment, 10 min read, LW link

[Question] Danger(s) of theorem-proving AI?

Yitz, 16 Mar 2022 2:47 UTC
8 points
8 comments, 1 min read, LW link

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

lsusr, 16 Sep 2021 7:34 UTC
111 points
33 comments, 5 min read, LW link

[Question] Is keeping AI "in the box" during training enough?

tgb, 6 Jul 2021 15:17 UTC
7 points
10 comments, 1 min read, LW link

Containing the AI… Inside a Simulated Reality

HumaneAutomation, 31 Oct 2020 16:16 UTC
1 point
9 comments, 2 min read, LW link

Results of $1,000 Oracle contest!

Stuart_Armstrong, 17 Jun 2020 17:44 UTC
60 points
2 comments, 1 min read, LW link

“Don’t even think about hell”

emmab, 2 May 2020 8:06 UTC
6 points
2 comments, 1 min read, LW link

Oracles: reject all deals—break superrationality, with superrationality

Stuart_Armstrong, 5 Dec 2019 13:51 UTC
20 points
4 comments, 8 min read, LW link

Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong, 25 Nov 2019 10:40 UTC
25 points
15 comments, 1 min read, LW link

Analysing: Dangerous messages from future UFAI via Oracles

Stuart_Armstrong, 22 Nov 2019 14:17 UTC
22 points
16 comments, 4 min read, LW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei Dai, 10 Sep 2019 8:29 UTC
48 points
26 comments, 3 min read, LW link

[Question] Is there a simple parameter that controls human working memory capacity, which has been set tragically low?

Liron, 23 Aug 2019 22:10 UTC
17 points
8 comments, 1 min read, LW link

Contest: $1,000 for good questions to ask to an Oracle AI

Stuart_Armstrong, 31 Jul 2019 18:48 UTC
59 points
154 comments, 3 min read, LW link

Self-confirming prophecies, and simplified Oracle designs

Stuart_Armstrong, 28 Jun 2019 9:57 UTC
10 points
1 comment, 5 min read, LW link

Oracles, sequence predictors, and self-confirming predictions

Stuart_Armstrong, 3 May 2019 14:09 UTC
22 points
0 comments, 3 min read, LW link

Sandboxing by Physical Simulation?

moridinamael, 1 Aug 2018 0:36 UTC
12 points
4 comments, 1 min read, LW link

Safely and usefully spectating on AIs optimizing over toy worlds

AlexMennen, 31 Jul 2018 18:30 UTC
24 points
16 comments, 2 min read, LW link

Quantum AI Box

Gurkenglas, 8 Jun 2018 16:20 UTC
4 points
15 comments, 1 min read, LW link

AI Alignment Prize: Super-Boxing

X4vier, 18 Mar 2018 1:03 UTC
16 points
6 comments, 6 min read, LW link

Epiphenomenal Oracles Ignore Holes in the Box

SilentCal, 31 Jan 2018 20:08 UTC
15 points
8 comments, 2 min read, LW link

Oracle paper

Stuart_Armstrong, 13 Dec 2017 14:59 UTC
12 points
7 comments, 1 min read, LW link

How To Win The AI Box Experiment (Sometimes)

pinkgothic, 12 Sep 2015 12:34 UTC
52 points
21 comments, 22 min read, LW link

How to escape from your sandbox and from your hardware host

PhilGoetz, 31 Jul 2015 17:26 UTC
43 points
28 comments, 1 min read, LW link

Boxing an AI?

tailcalled, 27 Mar 2015 14:06 UTC
3 points
39 comments, 1 min read, LW link

Superintelligence 13: Capability control methods

KatjaGrace, 9 Dec 2014 2:00 UTC
14 points
48 comments, 6 min read, LW link

xkcd on the AI box experiment

FiftyTwo, 21 Nov 2014 8:26 UTC
28 points
232 comments, 1 min read, LW link

How to Study Unsafe AGI's safely (and why we might have no choice)

Punoxysm, 7 Mar 2014 7:24 UTC
10 points
47 comments, 5 min read, LW link

I played the AI Box Experiment again! (and lost both games)

Tuxedage, 27 Sep 2013 2:32 UTC
59 points
123 comments, 11 min read, LW link

I attempted the AI Box Experiment again! (And won—Twice!)

Tuxedage, 5 Sep 2013 4:49 UTC
76 points
168 comments, 12 min read, LW link

AI box: AI has one shot at avoiding destruction—what might it say?

ancientcampus, 22 Jan 2013 20:22 UTC
25 points
355 comments, 1 min read, LW link

I attempted the AI Box Experiment (and lost)

Tuxedage, 21 Jan 2013 2:59 UTC
78 points
245 comments, 5 min read, LW link

AI Box Log

Dorikka, 27 Jan 2012 4:47 UTC
24 points
30 comments, 23 min read, LW link

AI-Box Experiment—The Acausal Trade Argument

XiXiDu, 8 Jul 2011 9:18 UTC
14 points
20 comments, 2 min read, LW link

Cryptographic Boxes for Unfriendly AI

paulfchristiano, 18 Dec 2010 8:28 UTC
70 points
162 comments, 5 min read, LW link

Anthropomorphic AI and Sandboxed Virtual Universes

jacob_cannell, 3 Sep 2010 19:02 UTC
4 points
124 comments, 5 min read, LW link

The AI in a box boxes you

Stuart_Armstrong, 2 Feb 2010 10:10 UTC
168 points
389 comments, 1 min read, LW link

The Strangest Thing An AI Could Tell You

Eliezer Yudkowsky, 15 Jul 2009 2:27 UTC
129 points
613 comments, 2 min read, LW link

AIs and Gatekeepers Unite!

Eliezer Yudkowsky, 9 Oct 2008 17:04 UTC
14 points
163 comments, 1 min read, LW link

Dreams of Friendliness

Eliezer Yudkowsky, 31 Aug 2008 1:20 UTC
26 points
81 comments, 9 min read, LW link

That Alien Message

Eliezer Yudkowsky, 22 May 2008 5:55 UTC
357 points
174 comments, 10 min read, LW link