
AI Risk

Last edit: 2 Nov 2022 20:27 UTC by brook

AI Risk is the analysis of the risks associated with building powerful AI systems.

Related: AI, Orthogonality thesis, Complexity of value, Goodhart’s law, Paperclip maximiser

Superintelligence FAQ

Scott Alexander · 20 Sep 2016 19:00 UTC
128 points
36 comments · 27 min read · LW link

What failure looks like

paulfchristiano · 17 Mar 2019 20:18 UTC
401 points
54 comments · 8 min read · LW link · 2 reviews

Specification gaming examples in AI

Vika · 3 Apr 2018 12:30 UTC
44 points
9 comments · 1 min read · LW link · 2 reviews

An artificially structured argument for expecting AGI ruin

Rob Bensinger · 7 May 2023 21:52 UTC
91 points
26 comments · 19 min read · LW link

MIRI announces new “Death With Dignity” strategy

Eliezer Yudkowsky · 2 Apr 2022 0:43 UTC
339 points
543 comments · 18 min read · LW link · 1 review

Discussion with Eliezer Yudkowsky on AGI interventions

11 Nov 2021 3:01 UTC
328 points
251 comments · 34 min read · LW link · 1 review

PreDCA: vanessa kosoy’s alignment protocol

Tamsin Leake · 20 Aug 2022 10:03 UTC
50 points
8 comments · 7 min read · LW link
(carado.moe)

“Corrigibility at some small length” by dath ilan

Christopher King · 5 Apr 2023 1:47 UTC
32 points
3 comments · 9 min read · LW link
(www.glowfic.com)

Intuitions about goal-directed behavior

Rohin Shah · 1 Dec 2018 4:25 UTC
54 points
15 comments · 6 min read · LW link

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky · 5 Jun 2022 22:05 UTC
885 points
690 comments · 30 min read · LW link · 3 reviews

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · 22 May 2023 14:31 UTC
154 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
227 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

Where I agree and disagree with Eliezer

paulfchristiano · 19 Jun 2022 19:15 UTC
870 points
219 comments · 18 min read · LW link · 2 reviews

Epistemological Framing for AI Alignment Research

adamShimi · 8 Mar 2021 22:05 UTC
55 points
7 comments · 9 min read · LW link

Open Problems in AI X-Risk [PAIS #5]

10 Jun 2022 2:08 UTC
59 points
6 comments · 36 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · 5 Jan 2024 8:46 UTC
35 points
4 comments · 2 min read · LW link

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

18 Dec 2023 20:35 UTC
160 points
20 comments · 12 min read · LW link

What can the principal-agent literature tell us about AI risk?

apc · 8 Feb 2020 21:28 UTC
104 points
29 comments · 16 min read · LW link

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

22 Nov 2022 18:57 UTC
133 points
98 comments · 24 min read · LW link

A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi · 28 Jul 2018 21:27 UTC
74 points
9 comments · 3 min read · LW link
(github.com)

Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)

Jacy Reese Anthis · 22 Nov 2022 16:50 UTC
93 points
64 comments · 1 min read · LW link
(www.science.org)

Robin Hanson’s latest AI risk position statement

Liron · 3 Mar 2023 14:25 UTC
55 points
17 comments · 1 min read · LW link
(www.overcomingbias.com)

[Question] Will OpenAI’s work unintentionally increase existential risks related to AI?

adamShimi · 11 Aug 2020 18:16 UTC
53 points
55 comments · 1 min read · LW link

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin · 12 Jul 2023 12:12 UTC
103 points
13 comments · 4 min read · LW link

Bing chat is the AI fire alarm

Ratios · 17 Feb 2023 6:51 UTC
112 points
62 comments · 3 min read · LW link

On how various plans miss the hard bits of the alignment challenge

So8res · 12 Jul 2022 2:49 UTC
298 points
88 comments · 29 min read · LW link · 3 reviews

Another (outer) alignment failure story

paulfchristiano · 7 Apr 2021 20:12 UTC
238 points
38 comments · 12 min read · LW link · 1 review

Stampy’s AI Safety Info—New Distillations #1 [March 2023]

markov · 7 Apr 2023 11:06 UTC
42 points
0 comments · 2 min read · LW link
(aisafety.info)

Interpreting the Learning of Deceit

RogerDearnaley · 18 Dec 2023 8:12 UTC
30 points
8 comments · 9 min read · LW link

Developmental Stages of GPTs

orthonormal · 26 Jul 2020 22:03 UTC
140 points
71 comments · 7 min read · LW link · 1 review

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis · 19 Dec 2023 20:09 UTC
67 points
11 comments · 1 min read · LW link

Should we postpone AGI until we reach safety?

otto.barten · 18 Nov 2020 15:43 UTC
27 points
36 comments · 3 min read · LW link

How good is humanity at coordination?

Buck · 21 Jul 2020 20:01 UTC
81 points
44 comments · 3 min read · LW link

Announcing Apollo Research

30 May 2023 16:17 UTC
215 points
11 comments · 8 min read · LW link

DL towards the unaligned Recursive Self-Optimization attractor

jacob_cannell · 18 Dec 2021 2:15 UTC
32 points
22 comments · 4 min read · LW link

Don’t accelerate problems you’re trying to solve

15 Feb 2023 18:11 UTC
100 points
26 comments · 4 min read · LW link

Devil’s Advocate: Adverse Selection Against Conscientiousness

lionhearted (Sebastian Marshall) · 28 May 2023 17:53 UTC
10 points
2 comments · 1 min read · LW link

Being at peace with Doom

Johannes C. Mayer · 9 Apr 2023 14:53 UTC
23 points
11 comments · 4 min read · LW link

My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”

Quintin Pope · 21 Mar 2023 0:06 UTC
362 points
218 comments · 39 min read · LW link

Soft takeoff can still lead to decisive strategic advantage

Daniel Kokotajlo · 23 Aug 2019 16:39 UTC
122 points
47 comments · 8 min read · LW link · 4 reviews

Architects of Our Own Demise: We Should Stop Developing AI

Roko · 26 Oct 2023 0:36 UTC
174 points
74 comments · 3 min read · LW link

[Question] How likely are scenarios where AGI ends up overtly or de facto torturing us? How likely are scenarios where AGI prevents us from committing suicide or dying?

JohnGreer · 28 Mar 2023 18:00 UTC
11 points
4 comments · 1 min read · LW link

The Hidden Complexity of Wishes

Eliezer Yudkowsky · 24 Nov 2007 0:12 UTC
171 points
166 comments · 7 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
300 points
33 comments · 2 min read · LW link

An Appeal to AI Superintelligence: Reasons to Preserve Humanity

James_Miller · 18 Mar 2023 16:22 UTC
30 points
72 comments · 12 min read · LW link

“Endgame safety” for AGI

Steven Byrnes · 24 Jan 2023 14:15 UTC
84 points
10 comments · 6 min read · LW link

Alexander and Yudkowsky on AGI goals

24 Jan 2023 21:09 UTC
174 points
52 comments · 26 min read · LW link

Are minimal circuits deceptive?

evhub · 7 Sep 2019 18:11 UTC
77 points
11 comments · 8 min read · LW link

On Solving Problems Before They Appear: The Weird Epistemologies of Alignment

adamShimi · 11 Oct 2021 8:20 UTC
107 points
10 comments · 15 min read · LW link

Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell?

Karl von Wendt · 25 Jun 2023 16:59 UTC
107 points
52 comments · 7 min read · LW link

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton · 17 Jan 2022 16:49 UTC
65 points
14 comments · 13 min read · LW link

Request to AGI organizations: Share your views on pausing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments · 1 min read · LW link

Intent alignment should not be the goal for AGI x-risk reduction

John Nay · 26 Oct 2022 1:24 UTC
1 point
10 comments · 3 min read · LW link

AI Could Defeat All Of Us Combined

HoldenKarnofsky · 9 Jun 2022 15:50 UTC
170 points
42 comments · 17 min read · LW link
(www.cold-takes.com)

Results from the language model hackathon

Esben Kran · 10 Oct 2022 8:29 UTC
22 points
1 comment · 4 min read · LW link

World-Model Interpretability Is All We Need

Thane Ruthenis · 14 Jan 2023 19:37 UTC
29 points
22 comments · 21 min read · LW link

How we could stumble into AI catastrophe

HoldenKarnofsky · 13 Jan 2023 16:20 UTC
64 points
18 comments · 18 min read · LW link
(www.cold-takes.com)

Announcing AISIC 2022 - the AI Safety Israel Conference, October 19-20

Davidmanheim · 21 Sep 2022 19:32 UTC
13 points
0 comments · 1 min read · LW link

A Case for the Least Forgiving Take On Alignment

Thane Ruthenis · 2 May 2023 21:34 UTC
98 points
82 comments · 22 min read · LW link

Systems that cannot be unsafe cannot be safe

Davidmanheim · 2 May 2023 8:53 UTC
62 points
27 comments · 2 min read · LW link

All AGI Safety questions welcome (especially basic ones) [May 2023]

steven0461 · 8 May 2023 22:30 UTC
33 points
44 comments · 2 min read · LW link

Worrisome misunderstanding of the core issues with AI transition

Roman Leventov · 18 Jan 2024 10:05 UTC
5 points
2 comments · 4 min read · LW link

Contra “Strong Coherence”

DragonGod · 4 Mar 2023 20:05 UTC
39 points
24 comments · 1 min read · LW link

The other side of the tidal wave

KatjaGrace · 3 Nov 2023 5:40 UTC
175 points
77 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

The Plan − 2023 Version

johnswentworth · 29 Dec 2023 23:34 UTC
146 points
39 comments · 31 min read · LW link

DeepMind and Google Brain are merging [Linkpost]

Akash · 20 Apr 2023 18:47 UTC
55 points
5 comments · 1 min read · LW link
(www.deepmind.com)

4 ways to think about democratizing AI [GovAI Linkpost]

Akash · 13 Feb 2023 18:06 UTC
24 points
4 comments · 1 min read · LW link
(www.governance.ai)

[Question] Why are we sure that AI will “want” something?

shminux · 16 Sep 2022 20:35 UTC
31 points
57 comments · 1 min read · LW link

RA Bounty: Looking for feedback on screenplay about AI Risk

Writer · 26 Oct 2023 13:23 UTC
30 points
6 comments · 1 min read · LW link

The problem/solution matrix: Calculating the probability of AI safety “on the back of an envelope”

John_Maxwell · 20 Oct 2019 8:03 UTC
22 points
4 comments · 2 min read · LW link

“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity

Thane Ruthenis · 16 Dec 2023 20:08 UTC
170 points
23 comments · 5 min read · LW link

Activation additions in a small residual network

Garrett Baker · 22 May 2023 20:28 UTC
22 points
4 comments · 3 min read · LW link

What would a compute monitoring plan look like? [Linkpost]

Akash · 26 Mar 2023 19:33 UTC
157 points
9 comments · 4 min read · LW link
(arxiv.org)

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky · 27 Oct 2023 15:19 UTC
200 points
33 comments · 8 min read · LW link

Pausing AI is Positive Expected Value

Liron · 10 Mar 2024 17:10 UTC
7 points
2 comments · 3 min read · LW link
(twitter.com)

Minimum Viable Exterminator

Richard Horvath · 29 May 2023 16:32 UTC
14 points
5 comments · 5 min read · LW link

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs · 15 Mar 2023 17:55 UTC
178 points
41 comments · 1 min read · LW link

[Linkpost] Biden-Harris Executive Order on AI

beren · 30 Oct 2023 15:20 UTC
3 points
0 comments · 1 min read · LW link

[Linkpost] Scott Alexander reacts to OpenAI’s latest post

Akash · 11 Mar 2023 22:24 UTC
27 points
0 comments · 5 min read · LW link
(astralcodexten.substack.com)

AI #1: Sydney and Bing

Zvi · 21 Feb 2023 14:00 UTC
170 points
44 comments · 61 min read · LW link
(thezvi.wordpress.com)

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw · 12 Mar 2023 23:09 UTC
201 points
14 comments · 1 min read · LW link

Request: stop advancing AI capabilities

So8res · 26 May 2023 17:42 UTC
155 points
23 comments · 1 min read · LW link

Talking publicly about AI risk

Jan_Kulveit · 21 Apr 2023 11:28 UTC
173 points
8 comments · 6 min read · LW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis · 20 Dec 2023 20:01 UTC
31 points
23 comments · 10 min read · LW link

My thoughts on OpenAI’s alignment plan

Akash · 30 Dec 2022 19:33 UTC
55 points
3 comments · 20 min read · LW link

AI Safety Seems Hard to Measure

HoldenKarnofsky · 8 Dec 2022 19:50 UTC
71 points
6 comments · 14 min read · LW link
(www.cold-takes.com)

What I Learned Running Refine

adamShimi · 24 Nov 2022 14:49 UTC
107 points
5 comments · 4 min read · LW link

AI Neorealism: a threat model & success criterion for existential safety

davidad · 15 Dec 2022 13:42 UTC
64 points
1 comment · 3 min read · LW link

Talk to me about your summer/career plans

Akash · 31 Jan 2023 18:29 UTC
31 points
3 comments · 2 min read · LW link

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

26 Jan 2023 21:01 UTC
39 points
80 comments · 2 min read · LW link

ea.domains—Domains Free to a Good Home

plex · 12 Jan 2023 13:32 UTC
24 points
0 comments · 1 min read · LW link

AI Safety Info Distillation Fellowship

17 Feb 2023 16:16 UTC
47 points
3 comments · 3 min read · LW link

Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky

bayesed · 20 Feb 2023 16:42 UTC
83 points
54 comments · 1 min read · LW link
(www.youtube.com)

Evil autocomplete: Existential Risk and Next-Token Predictors

Yitz · 28 Feb 2023 8:47 UTC
9 points
3 comments · 5 min read · LW link

Questions about Conjecture’s CoEm proposal

9 Mar 2023 19:32 UTC
51 points
4 comments · 2 min read · LW link

Plan for mediocre alignment of brain-like [model-based RL] AGI

Steven Byrnes · 13 Mar 2023 14:11 UTC
63 points
24 comments · 12 min read · LW link

The Wizard of Oz Problem: How incentives and narratives can skew our perception of AI developments

Akash · 20 Mar 2023 20:44 UTC
16 points
3 comments · 6 min read · LW link

But exactly how complex and fragile?

KatjaGrace · 3 Nov 2019 18:20 UTC
82 points
32 comments · 3 min read · LW link · 1 review
(meteuphoric.com)

Adam Smith Meets AI Doomers

James_Miller · 31 Jan 2024 15:53 UTC
24 points
9 comments · 5 min read · LW link

Hands-On Experience Is Not Magic

Thane Ruthenis · 27 May 2023 16:57 UTC
21 points
14 comments · 5 min read · LW link

A guide to Iterated Amplification & Debate

Rafael Harth · 15 Nov 2020 17:14 UTC
75 points
12 comments · 15 min read · LW link

Three Stories for How AGI Comes Before FAI

John_Maxwell · 17 Sep 2019 23:26 UTC
27 points
5 comments · 6 min read · LW link

Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous)

Akash · 25 Apr 2023 18:49 UTC
27 points
11 comments · 3 min read · LW link
(childrenoficarus.substack.com)

AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control

16 May 2023 15:14 UTC
31 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Why don’t singularitarians bet on the creation of AGI by buying stocks?

John_Maxwell · 11 Mar 2020 16:27 UTC
43 points
20 comments · 4 min read · LW link

Let’s build a fire alarm for AGI

chaosmage · 15 May 2023 9:16 UTC
−2 points
0 comments · 2 min read · LW link

Difficulties in making powerful aligned AI

DanielFilan · 14 May 2023 20:50 UTC
41 points
1 comment · 10 min read · LW link
(danielfilan.com)

A conversation about Katja’s counterarguments to AI risk

18 Oct 2022 18:40 UTC
43 points
9 comments · 33 min read · LW link

The case for removing alignment and ML research from the training dataset

beren · 30 May 2023 20:54 UTC
48 points
8 comments · 5 min read · LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson · 25 Oct 2022 3:54 UTC
15 points
3 comments · 1 min read · LW link

All AGI safety questions welcome (especially basic ones) [Sept 2022]

plex · 8 Sep 2022 11:56 UTC
22 points
48 comments · 2 min read · LW link

“Carefully Bootstrapped Alignment” is organizationally hard

Raemon · 17 Mar 2023 18:00 UTC
258 points
22 comments · 11 min read · LW link

Contra Hanson on AI Risk

Liron · 4 Mar 2023 8:02 UTC
36 points
23 comments · 8 min read · LW link

AI Doom Is Not (Only) Disjunctive

NickGabs · 30 Mar 2023 1:42 UTC
12 points
0 comments · 5 min read · LW link

Detachment vs attachment [AI risk and mental health]

Neil · 15 Jan 2024 0:41 UTC
14 points
4 comments · 3 min read · LW link

[Question] What does it look like for AI to significantly improve human coordination, before superintelligence?

jacobjacob · 15 Jan 2024 19:22 UTC
22 points
2 comments · 1 min read · LW link

Thoughts on refusing harmful requests to large language models

William_S · 19 Jan 2023 19:49 UTC
30 points
4 comments · 2 min read · LW link

Quote quiz: “drifting into dependence”

jasoncrawford · 27 Apr 2023 15:13 UTC
7 points
6 comments · 1 min read · LW link
(rootsofprogress.org)

Against most, but not all, AI risk analogies

Matthew Barnett · 14 Jan 2024 3:36 UTC
62 points
40 comments · 7 min read · LW link

[Question] How much do personal biases in risk assessment affect assessment of AI risks?

Gordon Seidoh Worley · 3 May 2023 6:12 UTC
10 points
8 comments · 1 min read · LW link

(4 min read) An intuitive explanation of the AI influence situation

trevor · 13 Jan 2024 17:34 UTC
12 points
26 comments · 4 min read · LW link

Wizards and prophets of AI [draft for comment]

jasoncrawford · 31 Mar 2023 20:22 UTC
16 points
11 comments · 6 min read · LW link

Why AI Safety is Hard

Simon Möller · 22 Mar 2023 10:44 UTC
3 points
0 comments · 6 min read · LW link

[Question] Why not constrain wetlabs instead of AI?

Lone Pine · 21 Mar 2023 18:02 UTC
15 points
10 comments · 1 min read · LW link

DeepMind alignment team opinions on AGI ruin arguments

Vika · 12 Aug 2022 21:06 UTC
376 points
37 comments · 14 min read · LW link · 1 review

Taboo P(doom)

NathanBarnard · 3 Feb 2023 10:37 UTC
13 points
10 comments · 1 min read · LW link

Robustness to Scaling Down: More Important Than I Thought

adamShimi · 23 Jul 2022 11:40 UTC
37 points
5 comments · 3 min read · LW link

Confusion about neuroscience/cognitive science as a danger for AI Alignment

Samuel Nellessen · 22 Jun 2022 17:59 UTC
2 points
1 comment · 3 min read · LW link
(snellessen.com)

Comparing Four Approaches to Inner Alignment

Lucas Teixeira · 29 Jul 2022 21:06 UTC
35 points
1 comment · 9 min read · LW link

How dangerous is human-level AI?

Alex_Altair · 10 Jun 2022 17:38 UTC
21 points
4 comments · 8 min read · LW link

Epistemological Vigilance for Alignment

adamShimi · 6 Jun 2022 0:27 UTC
65 points
11 comments · 10 min read · LW link

On A List of Lethalities

Zvi · 13 Jun 2022 12:30 UTC
161 points
49 comments · 54 min read · LW link · 1 review
(thezvi.wordpress.com)

Survey: What (de)motivates you about AI risk?

Daniel_Friedrich · 3 Aug 2022 19:17 UTC
1 point
0 comments · 1 min read · LW link
(forms.gle)

Complex Systems for AI Safety [Pragmatic AI Safety #3]

24 May 2022 0:00 UTC
57 points
2 comments · 21 min read · LW link

[Question] Would (myopic) general public good producers significantly accelerate the development of AGI?

mako yass · 2 Mar 2022 23:47 UTC
25 points
10 comments · 1 min read · LW link

Will working here advance AGI? Help us not destroy the world!

Yonatan Cale · 29 May 2022 11:42 UTC
30 points
46 comments · 1 min read · LW link

Reliability, Security, and AI risk: Notes from infosec textbook chapter 1

Akash · 7 Apr 2023 15:47 UTC
34 points
1 comment · 4 min read · LW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda · 25 Dec 2021 23:07 UTC
52 points
3 comments · 28 min read · LW link

Challenges with Breaking into MIRI-Style Research

Chris_Leong · 17 Jan 2022 9:23 UTC
75 points
15 comments · 3 min read · LW link

Distilled—AGI Safety from First Principles

Harrison G · 29 May 2022 0:57 UTC
11 points
1 comment · 14 min read · LW link

The alignment problem from a deep learning perspective

Richard_Ngo · 10 Aug 2022 22:46 UTC
107 points
15 comments · 27 min read · LW link · 1 review

Many AI governance proposals have a tradeoff between usefulness and feasibility

3 Feb 2023 18:49 UTC
22 points
2 comments · 2 min read · LW link

Applications for AI Safety Camp 2022 Now Open!

adamShimi · 17 Nov 2021 21:42 UTC
47 points
3 comments · 1 min read · LW link

Using Brain-Computer Interfaces to get more data for AI alignment

Robbo · 7 Nov 2021 0:00 UTC
43 points
10 comments · 7 min read · LW link

[Question] Does the Structure of an algorithm matter for AI Risk and/or consciousness?

Logan Zoellner · 3 Dec 2021 18:31 UTC
7 points
4 comments · 1 min read · LW link

Is progress in ML-assisted theorem-proving beneficial?

mako yass · 28 Sep 2021 1:54 UTC
11 points
3 comments · 1 min read · LW link

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis · 21 Dec 2023 20:02 UTC
137 points
42 comments · 1 min read · LW link

Interview with Skynet

lsusr · 30 Sep 2021 2:20 UTC
49 points
1 comment · 2 min read · LW link

[Question] Conditional on the first AGI being aligned correctly, is a good outcome even still likely?

iamthouthouarti · 6 Sep 2021 17:30 UTC
2 points
1 comment · 1 min read · LW link

AI Safety “Success Stories”

Wei Dai · 7 Sep 2019 2:54 UTC
116 points
27 comments · 4 min read · LW link · 1 review

Thinking soberly about the context and consequences of Friendly AI

Mitchell_Porter · 16 Oct 2012 4:33 UTC
21 points
39 comments · 1 min read · LW link

Drug addicts and deceptively aligned agents—a comparative analysis

Jan · 5 Nov 2021 21:42 UTC
42 points
2 comments · 12 min read · LW link
(universalprior.substack.com)

Using blinders to help you see things for what they are

Adam Zerner · 11 Nov 2021 7:07 UTC
13 points
2 comments · 2 min read · LW link

Ngo and Yudkowsky on alignment difficulty

15 Nov 2021 20:31 UTC
250 points
148 comments · 99 min read · LW link · 1 review

My Overview of the AI Alignment Landscape: A Bird’s Eye View

Neel Nanda · 15 Dec 2021 23:44 UTC
127 points
9 comments · 15 min read · LW link

Uber Self-Driving Crash

jefftk · 7 Nov 2019 15:00 UTC
110 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala · 2 May 2020 7:35 UTC
43 points
19 comments · 7 min read · LW link
(plato.stanford.edu)

More Is Different for AI

jsteinhardt · 4 Jan 2022 19:30 UTC
139 points
23 comments · 3 min read · LW link · 1 review
(bounded-regret.ghost.io)

Paradigm-building from first principles: Effective altruism, AGI, and alignment

Cameron Berg · 8 Feb 2022 16:12 UTC
26 points
5 comments · 14 min read · LW link

How I Formed My Own Views About AI Safety

Neel Nanda · 27 Feb 2022 18:50 UTC
64 points
6 comments · 13 min read · LW link
(www.neelnanda.io)

[RETRACTED] It’s time for EA leadership to pull the short-timelines fire alarm.

Not Relevant · 8 Apr 2022 16:07 UTC
109 points
163 comments · 4 min read · LW link

Why I’m Worried About AI

peterbarnett · 23 May 2022 21:13 UTC
22 points
2 comments · 12 min read · LW link

I’m trying out “asteroid mindset”

Alex_Altair · 3 Jun 2022 13:35 UTC
90 points
5 comments · 4 min read · LW link

Why I don’t believe in doom

mukashi · 7 Jun 2022 23:49 UTC
6 points
30 comments · 4 min read · LW link

Another plausible scenario of AI risk: AI builds military infrastructure while collaborating with humans, defects later.

avturchin · 10 Jun 2022 17:24 UTC
10 points
2 comments · 1 min read · LW link

Alignment Risk Doesn’t Require Superintelligence

JustisMills · 15 Jun 2022 3:12 UTC
35 points
4 comments · 2 min read · LW link

Are we there yet?

theflowerpot · 20 Jun 2022 11:19 UTC
2 points
2 comments · 1 min read · LW link

[Linkpost] Existential Risk Analysis in Empirical Research Papers

Dan H · 2 Jul 2022 0:09 UTC
40 points
0 comments · 1 min read · LW link
(arxiv.org)

Confusions in My Model of AI Risk

peterbarnett · 7 Jul 2022 1:05 UTC
22 points
9 comments · 5 min read · LW link

Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth · 12 Aug 2022 16:30 UTC
97 points
50 comments · 1 min read · LW link

Thoughts on ‘List of Lethalities’

Alex Lawsen · 17 Aug 2022 18:33 UTC
27 points
0 comments · 10 min read · LW link

What if we approach AI safety like a technical engineering safety problem

zeshen · 20 Aug 2022 10:29 UTC
33 points
4 comments · 7 min read · LW link

How Josiah became an AI safety researcher

Neil Crawford · 6 Sep 2022 17:17 UTC
4 points
0 comments · 1 min read · LW link

Refine’s Third Blog Post Day/Week

adamShimi · 17 Sep 2022 17:03 UTC
18 points
0 comments · 1 min read · LW link

Counterarguments to the basic AI x-risk case

KatjaGrace · 14 Oct 2022 13:00 UTC
368 points
124 comments · 34 min read · LW link · 1 review
(aiimpacts.org)

Learning societal values from law as part of an AGI alignment strategy

John Nay · 21 Oct 2022 2:03 UTC
5 points
18 comments · 54 min read · LW link

Empowerment is (almost) All We Need

jacob_cannell · 23 Oct 2022 21:48 UTC
64 points
44 comments · 17 min read · LW link

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

Robert Miles · 1 Nov 2022 23:23 UTC
68 points
105 comments · 2 min read · LW link

AI Safety Microgrant Round

Chris_Leong · 14 Nov 2022 4:25 UTC
22 points
1 comment · 1 min read · LW link

Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas

Akash · 25 Nov 2022 20:47 UTC
37 points
2 comments · 9 min read · LW link

Top lesson from GPT: we will probably destroy humanity “for the lulz” as soon as we are able.

shminux · 16 Apr 2023 20:27 UTC
65 points
28 comments · 1 min read · LW link

Racing through a minefield: the AI deployment problem

HoldenKarnofsky · 22 Dec 2022 16:10 UTC
38 points
2 comments · 13 min read · LW link
(www.cold-takes.com)

Wentworth and Larsen on buying time

9 Jan 2023 21:31 UTC
73 points
6 comments · 12 min read · LW link

[Linkpost] TIME article: DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution

Akash · 21 Jan 2023 16:51 UTC
58 points
2 comments · 3 min read · LW link
(time.com)

A (EtA: quick) note on terminology: AI Alignment != AI x-safety

David Scott Krueger (formerly: capybaralet) · 8 Feb 2023 22:33 UTC
46 points
20 comments · 1 min read · LW link

Many important technologies start out as science fiction before becoming real

trevor · 10 Feb 2023 9:36 UTC
26 points
2 comments · 2 min read · LW link

The Overton Window widens: Examples of AI risk in the media

Akash · 23 Mar 2023 17:10 UTC
107 points
24 comments · 6 min read · LW link

The Preference Fulfillment Hypothesis

Kaj_Sotala · 26 Feb 2023 10:55 UTC
66 points
62 comments · 11 min read · LW link

AI #2

Zvi · 2 Mar 2023 14:50 UTC
66 points
18 comments · 55 min read · LW link
(thezvi.wordpress.com)

Against ubiquitous alignment taxes

beren · 6 Mar 2023 19:50 UTC
56 points
10 comments · 2 min read · LW link

AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models

9 May 2023 15:26 UTC
28 points
1 comment · 4 min read · LW link
(newsletter.safe.ai)

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King · 15 Mar 2023 0:29 UTC
116 points
22 comments · 2 min read · LW link

Response to Oren Etzioni’s “How to know if artificial intelligence is about to destroy civilization”

Daniel Kokotajlo · 27 Feb 2020 18:10 UTC
27 points
5 comments · 8 min read · LW link

Will Artificial Superintelligence Kill Us?

James_Miller · 23 May 2023 16:27 UTC
33 points
2 comments · 22 min read · LW link

[Linkpost] Transcript of Sam Altman’s Lex Fridman interview

trevor · 19 Mar 2024 1:46 UTC
17 points
1 comment · 69 min read · LW link
(lexfridman.com)

[Question] Should people build productizations of open source AI models?

lc · 2 Nov 2023 1:26 UTC
21 points
0 comments · 1 min read · LW link

AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks

31 Oct 2023 19:34 UTC
35 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

Why and When Interpretability Work is Dangerous

NicholasKross · 28 May 2023 0:27 UTC
20 points
7 comments · 8 min read · LW link
(www.thinkingmuchbetter.com)

Bayeswatch 7: Wildfire

lsusr · 8 Sep 2021 5:35 UTC
48 points
6 comments · 3 min read · LW link

Work on Security Instead of Friendliness?

Wei Dai · 21 Jul 2012 18:28 UTC
65 points
107 comments · 2 min read · LW link

Sensor Exposure can Compromise the Human Brain in the 2020s

trevor · 26 Oct 2023 3:31 UTC
17 points
6 comments · 10 min read · LW link

Brainstorming additional AI risk reduction ideas

John_Maxwell · 14 Jun 2012 7:55 UTC
19 points
37 comments · 1 min read · LW link

AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI

23 May 2023 21:47 UTC
25 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Four lenses on AI risks

jasoncrawford · 28 Mar 2023 21:52 UTC
23 points
5 comments · 3 min read · LW link
(rootsofprogress.org)

Framing approaches to alignment and the hard problem of AI cognition

ryan_greenblatt · 15 Dec 2021 19:06 UTC
16 points
15 comments · 27 min read · LW link

Activation additions in a simple MNIST network

Garrett Baker · 18 May 2023 2:49 UTC
26 points
0 comments · 2 min read · LW link

Chad Jones paper modeling AI and x-risk vs. growth

jasoncrawford · 26 Apr 2023 20:07 UTC
39 points
7 comments · 2 min read · LW link
(web.stanford.edu)

AI Alignment 2018-19 Review

Rohin Shah · 28 Jan 2020 2:19 UTC
126 points
6 comments · 35 min read · LW link

The Fusion Power Generator Scenario

johnswentworth · 8 Aug 2020 18:31 UTC
140 points
29 comments · 3 min read · LW link

Sam Altman and Ezra Klein on the AI Revolution

Zack_M_Davis · 27 Jun 2021 4:53 UTC
38 points
17 comments · 1 min read · LW link
(www.nytimes.com)

[Question] Why don’t quantilizers also cut off the upper end of the distribution?

Alex_Altair · 15 May 2023 1:40 UTC
25 points
2 comments · 1 min read · LW link

Neuroscience and Alignment

Garrett Baker · 18 Mar 2024 21:09 UTC
30 points
8 comments · 2 min read · LW link

Approaches to gradient hacking

adamShimi · 14 Aug 2021 15:16 UTC
16 points
8 comments · 8 min read · LW link

Environmental Structure Can Cause Instrumental Convergence

TurnTrout · 22 Jun 2021 22:26 UTC
71 points
43 comments · 16 min read · LW link
(arxiv.org)

Current AIs Provide Nearly No Data Relevant to AGI Alignment

Thane Ruthenis · 15 Dec 2023 20:16 UTC
110 points
150 comments · 8 min read · LW link

Gaia Network: a practical, incremental pathway to Open Agency Architecture

20 Dec 2023 17:11 UTC
15 points
8 comments · 16 min read · LW link

Alex Turner’s Research, Comprehensive Information Gathering

adamShimi · 23 Jun 2021 9:44 UTC
15 points
3 comments · 3 min read · LW link

[Question] What are good alignment conference papers?

adamShimi · 28 Aug 2021 13:35 UTC
12 points
2 comments · 1 min read · LW link

New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development

Akash · 5 Apr 2023 1:26 UTC
46 points
9 comments · 1 min read · LW link
(today.yougov.com)

Complex Systems are Hard to Control

jsteinhardt · 4 Apr 2023 0:00 UTC
42 points
5 comments · 10 min read · LW link
(bounded-regret.ghost.io)

Less Realistic Tales of Doom

Mark Xu · 6 May 2021 23:01 UTC
113 points
13 comments · 4 min read · LW link

Critiquing “What failure looks like”

Grue_Slinky · 27 Dec 2019 23:59 UTC
35 points
6 comments · 3 min read · LW link

April drafts

AI Impacts · 1 Apr 2021 18:10 UTC
49 points
2 comments · 1 min read · LW link
(aiimpacts.org)

25 Min Talk on MetaEthical.AI with Questions from Stuart Armstrong

June Ku · 29 Apr 2021 15:38 UTC
21 points
7 comments · 1 min read · LW link

Incentives and Selection: A Missing Frame From AI Threat Discussions?

DragonGod · 26 Feb 2023 1:18 UTC
11 points
16 comments · 2 min read · LW link

Full Transcript: Eliezer Yudkowsky on the Bankless podcast

23 Feb 2023 12:34 UTC
138 points
89 comments · 75 min read · LW link

Rogue AGI Embodies Valuable Intellectual Property

3 Jun 2021 20:37 UTC
71 points
9 comments · 3 min read · LW link

Why the technological singularity by AGI may never happen

hippke · 3 Sep 2021 14:19 UTC
5 points
14 comments · 1 min read · LW link

How evals might (or might not) prevent catastrophic risks from AI

Akash · 7 Feb 2023 20:16 UTC
40 points
0 comments · 9 min read · LW link

Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

Rob Bensinger · 12 Dec 2021 2:08 UTC
70 points
35 comments · 7 min read · LW link

AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI

18 Oct 2023 17:06 UTC
14 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

The AI Safety Game (UPDATED)

Daniel Kokotajlo · 5 Dec 2020 10:27 UTC
44 points
10 comments · 3 min read · LW link

Pod­cast: Tam­era Lan­ham on AI risk, threat mod­els, al­ign­ment pro­pos­als, ex­ter­nal­ized rea­son­ing over­sight, and work­ing at Anthropic

Akash20 Dec 2022 21:39 UTC
18 points
2 comments11 min readLW link

AI Takeover Sce­nario with Scaled LLMs

simeon_c16 Apr 2023 23:28 UTC
42 points
15 comments8 min readLW link

[Question] Suggestions of posts on the AF to review

adamShimi16 Feb 2021 12:40 UTC
56 points
20 comments1 min readLW link

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans

Thane Ruthenis17 Dec 2023 20:28 UTC
29 points
7 comments11 min readLW link

Poster Session on AI Safety

Neil Crawford12 Nov 2022 3:50 UTC
7 points
6 comments1 min readLW link

Orthogonality is expensive

beren3 Apr 2023 10:20 UTC
34 points
8 comments3 min readLW link

Financial Times: We must slow down the race to God-like AI

trevor13 Apr 2023 19:55 UTC
103 points
17 comments16 min readLW link
(www.ft.com)

AI alignment as a translation problem

Roman Leventov5 Feb 2024 14:14 UTC
21 points
2 comments3 min readLW link

Stuxnet, not Skynet: Humanity’s disempowerment by AI

Roko4 Nov 2023 22:23 UTC
106 points
23 comments6 min readLW link

Microdooms averted by working on AI Safety

nikola17 Sep 2023 21:46 UTC
30 points
2 comments3 min readLW link
(forum.effectivealtruism.org)

Google’s Ethical AI team and AI Safety

magfrump20 Feb 2021 9:42 UTC
12 points
16 comments7 min readLW link

Information warfare historically revolved around human conduits

trevor28 Aug 2023 18:54 UTC
37 points
7 comments3 min readLW link

AI risk hub in Singapore?

Daniel Kokotajlo29 Oct 2020 11:45 UTC
57 points
18 comments4 min readLW link

[Question] Measure of complexity allowed by the laws of the universe and relative theory?

dr_s7 Sep 2023 12:21 UTC
8 points
22 comments1 min readLW link

Deceptive Alignment

5 Jun 2019 20:16 UTC
117 points
20 comments17 min readLW link

An overview of 11 proposals for building safe advanced AI

evhub29 May 2020 20:38 UTC
205 points
36 comments38 min readLW link2 reviews

The Inner Alignment Problem

4 Jun 2019 1:20 UTC
103 points
17 comments13 min readLW link

Apply to lead a project during the next virtual AI Safety Camp

13 Sep 2023 13:29 UTC
19 points
0 comments5 min readLW link
(aisafety.camp)

Behavioral Sufficient Statistics for Goal-Directedness

adamShimi11 Mar 2021 15:01 UTC
21 points
12 comments9 min readLW link

“Why can’t you just turn it off?”

Roko19 Nov 2023 14:46 UTC
42 points
25 comments1 min readLW link

Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem

Zack_M_Davis17 Sep 2020 2:23 UTC
72 points
12 comments5 min readLW link
(aima.cs.berkeley.edu)

Winners of AI Alignment Awards Research Contest

13 Jul 2023 16:14 UTC
114 points
3 comments12 min readLW link
(alignmentawards.com)

Clarifying “What failure looks like”

Sam Clarke20 Sep 2020 20:40 UTC
95 points
14 comments17 min readLW link

If you wish to make an apple pie, you must first become dictator of the universe

jasoncrawford5 Jul 2023 18:14 UTC
27 points
9 comments13 min readLW link
(rootsofprogress.org)

[Question] First and Last Questions for GPT-5*

Mitchell_Porter24 Nov 2023 5:03 UTC
15 points
5 comments1 min readLW link

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
183 points
42 comments12 min readLW link3 reviews

Risks from Learned Optimization: Conclusion and Related Work

7 Jun 2019 19:53 UTC
82 points
5 comments6 min readLW link

Will GPT-5 be able to self-improve?

Nathan Helm-Burger29 Apr 2023 17:34 UTC
18 points
22 comments3 min readLW link

Conditions for Mesa-Optimization

1 Jun 2019 20:52 UTC
83 points
48 comments12 min readLW link

Risks from AI Overview: Summary

18 Aug 2023 1:21 UTC
25 points
0 comments13 min readLW link
(www.safe.ai)

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
106 points
16 comments5 min readLW link
(arxiv.org)

Ten Levels of AI Alignment Difficulty

Sammy Martin3 Jul 2023 20:20 UTC
104 points
11 comments12 min readLW link

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors27 Sep 2023 21:27 UTC
22 points
12 comments4 min readLW link

Cortés, AI Risk, and the Dynamics of Competing Conquerors

James_Miller2 Jan 2024 16:37 UTC
14 points
2 comments3 min readLW link

An unaligned benchmark

paulfchristiano17 Nov 2018 15:51 UTC
31 points
0 comments9 min readLW link

AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering

4 Oct 2023 17:37 UTC
15 points
2 comments5 min readLW link
(newsletter.safe.ai)

Thoughts on Robin Hanson’s AI Impacts interview

Steven Byrnes24 Nov 2019 1:40 UTC
25 points
3 comments7 min readLW link

Eight Short Studies On Excuses

Scott Alexander20 Apr 2010 23:01 UTC
783 points
250 comments10 min readLW link

AI Safety is Dropping the Ball on Clown Attacks

trevor22 Oct 2023 20:09 UTC
69 points
72 comments34 min readLW link

A case for AI alignment being difficult

jessicata31 Dec 2023 19:55 UTC
95 points
51 comments15 min readLW link
(unstableontology.com)

Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

gwern3 Jul 2023 0:48 UTC
410 points
54 comments7 min readLW link
(www.youtube.com)

Arguments against existential risk from AI, part 2

Nina Rimsky10 Jul 2023 8:25 UTC
6 points
0 comments5 min readLW link
(ninarimsky.substack.com)

The Main Sources of AI Risk?

21 Mar 2019 18:28 UTC
119 points
26 comments2 min readLW link

Clarifying some key hypotheses in AI alignment

15 Aug 2019 21:29 UTC
79 points
12 comments9 min readLW link

Thoughts on sharing information about language model capabilities

paulfchristiano31 Jul 2023 16:04 UTC
191 points
34 comments11 min readLW link

“Taking AI Risk Seriously” (thoughts by Critch)

Raemon29 Jan 2018 9:27 UTC
110 points
68 comments13 min readLW link

Catastrophic Risks from AI #6: Discussion and FAQ

27 Jun 2023 23:23 UTC
24 points
1 comment13 min readLW link
(arxiv.org)

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

Kaj_Sotala12 Feb 2018 12:30 UTC
37 points
4 comments6 min readLW link
(kajsotala.fi)

Non-Adversarial Goodhart and AI Risks

Davidmanheim27 Mar 2018 1:39 UTC
22 points
11 comments6 min readLW link

Six AI Risk/Strategy Ideas

Wei Dai27 Aug 2019 0:40 UTC
64 points
17 comments4 min readLW link1 review

[Question] Did AI pioneers not worry much about AI risks?

lisperati9 Feb 2020 19:58 UTC
42 points
9 comments1 min readLW link

Some disjunctive reasons for urgency on AI risk

Wei Dai15 Feb 2019 20:43 UTC
36 points
24 comments1 min readLW link

Drexler on AI Risk

PeterMcCluskey1 Feb 2019 5:11 UTC
35 points
10 comments9 min readLW link
(www.bayesianinvestor.com)

A shift in arguments for AI risk

Richard_Ngo28 May 2019 13:47 UTC
32 points
7 comments1 min readLW link
(fragile-credences.github.io)

Disentangling arguments for the importance of AI safety

Richard_Ngo21 Jan 2019 12:41 UTC
133 points
23 comments8 min readLW link

My research agenda in agent foundations

Alex_Altair28 Jun 2023 18:00 UTC
70 points
9 comments11 min readLW link

Catastrophic Risks from AI #5: Rogue AIs

27 Jun 2023 22:06 UTC
15 points
0 comments22 min readLW link
(arxiv.org)

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace4 Oct 2019 4:08 UTC
221 points
61 comments15 min readLW link2 reviews

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

Rohin Shah2 Jan 2020 18:20 UTC
36 points
95 comments10 min readLW link
(mailchi.mp)

Catastrophic Risks from AI #4: Organizational Risks

26 Jun 2023 19:36 UTC
23 points
0 comments21 min readLW link
(arxiv.org)

The Shortest Path Between Scylla and Charybdis

Thane Ruthenis18 Dec 2023 20:08 UTC
50 points
8 comments5 min readLW link

The strategy-stealing assumption

paulfchristiano16 Sep 2019 15:23 UTC
86 points
53 comments12 min readLW link3 reviews

[Book Review] “The Alignment Problem” by Brian Christian

lsusr20 Sep 2021 6:36 UTC
70 points
16 comments6 min readLW link

Levels of safety for AI and other technologies

jasoncrawford28 Jun 2023 18:35 UTC
16 points
0 comments2 min readLW link
(rootsofprogress.org)

Catastrophic Risks from AI #1: Introduction

22 Jun 2023 17:09 UTC
40 points
1 comment5 min readLW link
(arxiv.org)

Epistemic Strategies of Selection Theorems

adamShimi18 Oct 2021 8:57 UTC
33 points
1 comment12 min readLW link

Epistemic Strategies of Safety-Capabilities Tradeoffs

adamShimi22 Oct 2021 8:22 UTC
5 points
0 comments6 min readLW link

A plea for solutionism on AI safety

jasoncrawford9 Jun 2023 16:29 UTC
72 points
6 comments6 min readLW link
(rootsofprogress.org)

Announcement: AI alignment prize winners and next round

cousin_it15 Jan 2018 14:33 UTC
80 points
68 comments2 min readLW link

All AGI Safety questions welcome (especially basic ones) [April 2023]

steven04618 Apr 2023 4:21 UTC
57 points
88 comments2 min readLW link

My current uncertainties regarding AI, alignment, and the end of the world

dominicq14 Nov 2021 14:08 UTC
2 points
3 comments2 min readLW link

Paper: Tell, Don’t Show- Declarative facts influence how LLMs generalize

19 Dec 2023 19:14 UTC
45 points
4 comments6 min readLW link
(arxiv.org)

What Failure Looks Like: Distilling the Discussion

Ben Pace29 Jul 2020 21:49 UTC
81 points
14 comments7 min readLW link

Catastrophic Risks from AI #2: Malicious Use

22 Jun 2023 17:10 UTC
38 points
1 comment17 min readLW link
(arxiv.org)

Reply to Holden on ‘Tool AI’

Eliezer Yudkowsky12 Jun 2012 18:00 UTC
152 points
356 comments17 min readLW link

The Control Problem: Unsolved or Unsolvable?

Remmelt2 Jun 2023 15:42 UTC
47 points
46 comments13 min readLW link

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

Kaj_Sotala4 May 2018 8:56 UTC
14 points
1 comment1 min readLW link
(arxiv.org)

n=3 AI Risk Quick Math and Reasoning

lionhearted (Sebastian Marshall)7 Apr 2023 20:27 UTC
6 points
3 comments4 min readLW link

AI Fire Alarm Scenarios

PeterMcCluskey28 Dec 2021 2:20 UTC
10 points
0 comments6 min readLW link
(www.bayesianinvestor.com)

All images from the WaitButWhy sequence on AI

trevor8 Apr 2023 7:36 UTC
72 points
5 comments2 min readLW link

Thoughts on AGI safety from the top

jylin042 Feb 2022 20:06 UTC
36 points
3 comments32 min readLW link

Upgrading the AI Safety Community

16 Dec 2023 15:34 UTC
41 points
9 comments42 min readLW link

Paradigm-building: Introduction

Cameron Berg8 Feb 2022 0:06 UTC
28 points
0 comments2 min readLW link

AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo

DanielFilan31 Mar 2022 5:20 UTC
24 points
1 comment48 min readLW link

It Looks Like You’re Trying To Take Over The World

gwern9 Mar 2022 16:35 UTC
402 points
120 comments1 min readLW link1 review
(www.gwern.net)

“warning about ai doom” is also “announcing capabilities progress to noobs”

the gears to ascension8 Apr 2023 23:42 UTC
16 points
5 comments3 min readLW link

AMA Conjecture, A New Alignment Startup

adamShimi9 Apr 2022 9:43 UTC
47 points
42 comments1 min readLW link

Assessment of AI safety agendas: think about the downside risk

Roman Leventov19 Dec 2023 9:00 UTC
13 points
1 comment1 min readLW link

Perform Tractable Research While Avoiding Capabilities Externalities [Pragmatic AI Safety #4]

30 May 2022 20:25 UTC
51 points
3 comments25 min readLW link

Confused why a “capabilities research is good for alignment progress” position isn’t discussed more

Kaj_Sotala2 Jun 2022 21:41 UTC
129 points
27 comments4 min readLW link

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien6 Feb 2024 19:10 UTC
73 points
12 comments16 min readLW link

AI Safety Newsletter #1 [CAIS Linkpost]

10 Apr 2023 20:18 UTC
45 points
0 comments4 min readLW link
(newsletter.safe.ai)

A Quick Guide to Confronting Doom

Ruby13 Apr 2022 19:30 UTC
240 points
33 comments2 min readLW link

A moral backlash against AI will probably slow down AGI development

geoffreymiller7 Jun 2023 20:39 UTC
47 points
10 comments14 min readLW link

Catastrophic Risks from AI #3: AI Race

23 Jun 2023 19:21 UTC
18 points
9 comments29 min readLW link
(arxiv.org)

Continuity Assumptions

Jan_Kulveit13 Jun 2022 21:31 UTC
35 points
13 comments4 min readLW link

Slow motion videos as AI risk intuition pumps

Andrew_Critch14 Jun 2022 19:31 UTC
237 points
41 comments2 min readLW link1 review

I Think Eliezer Should Go on Glenn Beck

Lao Mein30 Jun 2023 3:12 UTC
25 points
21 comments1 min readLW link

[Question] Has there been any work on attempting to use Pascal’s Mugging to make an AGI behave?

Chris_Leong15 Jun 2022 8:33 UTC
7 points
17 comments1 min readLW link

Where’s the foom?

Fergus Fettes11 Apr 2023 15:50 UTC
34 points
27 comments2 min readLW link

Relaxed adversarial training for inner alignment

evhub10 Sep 2019 23:03 UTC
69 points
27 comments27 min readLW link

The Alignment Problem

lsusr11 Jul 2022 3:03 UTC
46 points
18 comments3 min readLW link

Review of “Fun with +12 OOMs of Compute”

28 Mar 2021 14:55 UTC
63 points
21 comments8 min readLW link1 review

Summary of the Acausal Attack Issue for AIXI

Diffractor13 Dec 2021 8:16 UTC
12 points
6 comments4 min readLW link

[Link] Sarah Constantin: “Why I am Not An AI Doomer”

lbThingrb12 Apr 2023 1:52 UTC
61 points
13 comments1 min readLW link
(sarahconstantin.substack.com)

Shapes of Mind and Pluralism in Alignment

adamShimi13 Aug 2022 10:01 UTC
33 points
2 comments2 min readLW link

[Question] Good taxonomies of all risks (small or large) from AI?

Aryeh Englander5 Mar 2024 18:15 UTC
6 points
1 comment1 min readLW link

Against a General Factor of Doom

Jeffrey Heninger23 Nov 2022 16:50 UTC
61 points
19 comments5 min readLW link1 review
(aiimpacts.org)

No One-Size-Fit-All Epistemic Strategy

adamShimi20 Aug 2022 12:56 UTC
23 points
1 comment2 min readLW link

Beyond Hyperanthropomorphism

PointlessOne21 Aug 2022 17:55 UTC
3 points
17 comments1 min readLW link
(studio.ribbonfarm.com)

AI as Super-Demagogue

RationalDino5 Nov 2023 21:21 UTC
−2 points
9 comments9 min readLW link

AI community building: EliezerKart

Christopher King1 Apr 2023 15:25 UTC
45 points
0 comments2 min readLW link

The 6D effect: When companies take risks, one email can be very powerful.

scasper4 Nov 2023 20:08 UTC
260 points
40 comments3 min readLW link

Is AGI suicidality the golden ray of hope?

Alex Kirko4 Apr 2023 23:29 UTC
−18 points
4 comments1 min readLW link

Winners-take-how-much?

YonatanK29 May 2023 21:56 UTC
1 point
2 comments3 min readLW link

Engaging First Introductions to AI Risk

Rob Bensinger19 Aug 2013 6:26 UTC
31 points
21 comments3 min readLW link

Saying the quiet part out loud: trading off x-risk for personal immortality

disturbance2 Nov 2023 17:43 UTC
82 points
89 comments5 min readLW link

Imagine a world where Microsoft employees used Bing

Christopher King31 Mar 2023 18:36 UTC
6 points
2 comments2 min readLW link

Announcing the London Initiative for Safe AI (LISA)

2 Feb 2024 23:17 UTC
94 points
0 comments9 min readLW link

Proposed Alignment Technique: OSNR (Output Sanitization via Noising and Reconstruction) for Safer Usage of Potentially Misaligned AGI

sudo29 May 2023 1:35 UTC
14 points
9 comments6 min readLW link

GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2

Christopher King31 Mar 2023 17:05 UTC
6 points
4 comments4 min readLW link

Hands of gods

Anders L28 May 2023 15:15 UTC
1 point
0 comments9 min readLW link
(woodfromeden.substack.com)

Focus on existential risk is a distraction from the real issues. A false fallacy

Nik Samoylov30 Oct 2023 23:42 UTC
−19 points
11 comments2 min readLW link

Why Yudkowsky Is Wrong And What He Does Can Be More Dangerous

idontagreewiththat6 Jun 2023 17:59 UTC
−40 points
3 comments3 min readLW link

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program

Christopher King2 Jun 2023 21:54 UTC
7 points
4 comments16 min readLW link

An International Manhattan Project for Artificial Intelligence

Glenn Clayton27 Apr 2023 17:34 UTC
−11 points
2 comments5 min readLW link

Widening Overton Window—Open Thread

Prometheus31 Mar 2023 10:03 UTC
23 points
8 comments1 min readLW link

Charbel-Raphaël and Lucius discuss Interpretability

30 Oct 2023 5:50 UTC
104 points
7 comments21 min readLW link

An LLM-based “exemplary actor”

Roman Leventov29 May 2023 11:12 UTC
16 points
0 comments12 min readLW link

Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”

Matthew Wearden30 Oct 2023 17:27 UTC
5 points
2 comments6 min readLW link
(matthewwearden.co.uk)

What Failure Looks Like is not an existential risk (and alignment is not the solution)

otto.barten2 Feb 2024 18:59 UTC
13 points
12 comments9 min readLW link

Aligning an H-JEPA agent via training on the outputs of an LLM-based “exemplary actor”

Roman Leventov29 May 2023 11:08 UTC
12 points
10 comments30 min readLW link

DeepMind: Model evaluation for extreme risks

Zach Stein-Perlman25 May 2023 3:00 UTC
94 points
11 comments1 min readLW link
(arxiv.org)

[untitled post]

NeuralSystem_e5e127 Apr 2023 17:37 UTC
3 points
0 comments1 min readLW link

1hr talk: Intro to AGI safety

Steven Byrnes18 Jun 2019 21:41 UTC
36 points
4 comments24 min readLW link

Allegory On AI Risk, Game Theory, and Mithril

James_Miller13 Feb 2017 20:41 UTC
45 points
57 comments3 min readLW link

Alignment—Path to AI as ally, not slave nor foe

ozb30 Mar 2023 14:54 UTC
10 points
3 comments2 min readLW link

[Thought Experiment] Tomorrow’s Echo—The future of synthetic companionship.

Vimal Naran26 Oct 2023 17:54 UTC
−7 points
2 comments2 min readLW link

Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c25 Oct 2023 23:46 UTC
114 points
33 comments22 min readLW link
(www.navigatingrisks.ai)

[Question] What if AGI had its own universe to maybe wreck?

mseale26 Oct 2023 17:49 UTC
−1 points
2 comments1 min readLW link

Pausing AI Developments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky

jacquesthibs29 Mar 2023 23:16 UTC
298 points
296 comments3 min readLW link
(time.com)

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

StellaAthena25 Jan 2024 19:17 UTC
94 points
14 comments1 min readLW link
(www.rand.org)

I made a P(doom) calculator for convenient Fermi estimation

Nicholas Kruus27 Dec 2023 18:22 UTC
1 point
0 comments5 min readLW link

2019 AI Alignment Literature Review and Charity Comparison

Larks19 Dec 2019 3:00 UTC
130 points
18 comments62 min readLW link

Announcing #AISummitTalks featuring Professor Stuart Russell and many others

otto.barten24 Oct 2023 10:11 UTC
17 points
1 comment1 min readLW link

“Sorcerer’s Apprentice” from Fantasia as an analogy for alignment

awg29 Mar 2023 18:21 UTC
7 points
4 comments1 min readLW link
(video.disney.com)

Halloween Problem

Saint Blasphemer24 Oct 2023 16:46 UTC
−10 points
1 comment1 min readLW link

I made AI Risk Propaganda

monkymind29 Mar 2023 14:26 UTC
−3 points
0 comments1 min readLW link

Yoshua Bengio: How Rogue AIs may Arise

harfe23 May 2023 18:28 UTC
92 points
12 comments18 min readLW link
(yoshuabengio.org)

Freedom Is All We Need

Leo Glisic27 Apr 2023 0:09 UTC
−1 points
8 comments10 min readLW link

Critique my Model: The EV of AGI to Selfish Individuals

ozziegooen8 Apr 2018 20:04 UTC
19 points
9 comments4 min readLW link

Takeaways from safety by default interviews

3 Apr 2020 17:20 UTC
28 points
2 comments13 min readLW link
(aiimpacts.org)

“Unintentional AI safety research”: Why not systematically mine AI technical research for safety purposes?

ghostwheel29 Mar 2023 15:56 UTC
27 points
3 comments6 min readLW link

Announcing Convergence Analysis: An Institute for AI Scenario & Governance Research

7 Mar 2024 21:37 UTC
22 points
1 comment4 min readLW link

More Thoughts on the Human-AGI War

Seth Ahrenbach27 Dec 2023 1:03 UTC
−3 points
4 comments7 min readLW link

Post series on “Liability Law for reducing Existential Risk from AI”

Nora_Ammann29 Feb 2024 4:39 UTC
42 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

Briefly how I’ve updated since ChatGPT

rime25 Apr 2023 14:47 UTC
48 points
2 comments2 min readLW link

The Friendly AI Game

bentarm15 Mar 2011 16:45 UTC
50 points
178 comments1 min readLW link

My Assessment of the Chinese AI Safety Community

Lao Mein25 Apr 2023 4:21 UTC
244 points
93 comments3 min readLW link

Making Nanobots isn’t a one-shot process, even for an artificial superintelligance

dankrad25 Apr 2023 0:39 UTC
20 points
13 comments6 min readLW link

A Proposal for AI Alignment: Using Directly Opposing Models

Arne B27 Apr 2023 18:05 UTC
0 points
5 comments3 min readLW link

A response to Conjecture’s CoEm proposal

Kristian Freed24 Apr 2023 17:23 UTC
7 points
0 comments4 min readLW link

I had a chat with GPT-4 on the future of AI and AI safety

Kristian Freed28 Mar 2023 17:47 UTC
1 point
0 comments8 min readLW link

[untitled post]

20 May 2023 3:08 UTC
1 point
0 comments1 min readLW link

A concise sum-up of the basic argument for AI doom

Mergimio H. Doefevmil24 Apr 2023 17:37 UTC
11 points
6 comments2 min readLW link

Adapting to Change: Overcoming Chronostasis in AI Language Models

RationalMindset28 Mar 2023 14:32 UTC
−1 points
0 comments6 min readLW link

OpenAI Credit Account (2510$)

Emirhan BULUT21 Jan 2024 2:32 UTC
1 point
0 comments1 min readLW link

[Question] Investigating Alternative Futures: Human and Superintelligence Interaction Scenarios

Hiroshi Yamakawa27 Dec 2023 18:19 UTC
−4 points
0 comments17 min readLW link

Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*

Christopher King16 May 2023 15:18 UTC
22 points
6 comments2 min readLW link

Confusions and updates on STEM AI

Eleni Angelou19 May 2023 21:34 UTC
21 points
0 comments3 min readLW link

A&I (Rihanna ‘S&M’ parody lyrics)

nahoj21 May 2023 22:34 UTC
−3 points
0 comments2 min readLW link

GPT as an “Intelligence Forklift.”

boazbarak19 May 2023 21:15 UTC
46 points
27 comments3 min readLW link

We Shouldn’t Expect AI to Ever be Fully Rational

OneManyNone18 May 2023 17:09 UTC
19 points
31 comments6 min readLW link

[Crosspost] A recent write-up of the case for AI (existential) risk

Timsey18 May 2023 13:13 UTC
6 points
0 comments19 min readLW link

The Polarity Problem [Draft]

23 May 2023 21:05 UTC
24 points
3 comments44 min readLW link

Paths to failure

25 Apr 2023 8:03 UTC
29 points
1 comment8 min readLW link

A flaw in the A.G.I. Ruin Argument

Cole Wyeth19 May 2023 19:40 UTC
1 point
6 comments3 min readLW link
(colewyeth.com)

Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk

Joe Brenton16 May 2023 11:57 UTC
6 points
4 comments1 min readLW link

Q&A with Jürgen Schmidhuber on risks from AI

XiXiDu15 Jun 2011 15:51 UTC
61 points
45 comments4 min readLW link

[Question] What should an Einstein-like figure in Machine Learning do?

Razied5 Aug 2020 23:52 UTC
7 points
4 comments1 min readLW link

Field-Building and Deep Models

Ben Pace13 Jan 2018 21:16 UTC
21 points
12 comments4 min readLW link

Half-baked alignment idea

ozb28 Mar 2023 17:47 UTC
6 points
27 comments1 min readLW link

AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop

_will_16 May 2023 18:06 UTC
11 points
4 comments8 min readLW link

[Question] What projects and efforts are there to promote AI safety research?

Christopher King24 May 2023 0:33 UTC
4 points
0 comments1 min readLW link

A great talk for AI noobs (according to an AI noob)

dov23 Apr 2023 5:34 UTC
10 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

A rejection of the Orthogonality Thesis

ArisC24 May 2023 16:37 UTC
−2 points
11 comments2 min readLW link
(medium.com)

Brainstorming: Slow Takeoff

David Piepgrass23 Jan 2024 6:58 UTC
2 points
0 comments51 min readLW link

The Security Mindset, S-Risk and Publishing Prosaic Alignment Research

lukemarks22 Apr 2023 14:36 UTC
39 points
7 comments6 min readLW link

‘Dumb’ AI observes and manipulates controllers

Stuart_Armstrong13 Jan 2015 13:35 UTC
52 points
19 comments2 min readLW link

Coordination by common knowledge to prevent uncontrollable AI

Karl von Wendt14 May 2023 13:37 UTC
10 points
2 comments9 min readLW link

PCAST Working Group on Generative AI Invites Public Input

Christopher King13 May 2023 22:49 UTC
7 points
0 comments1 min readLW link
(terrytao.wordpress.com)

Two ideas for alignment, perpetual mutual distrust and induction

APaleBlueDot25 May 2023 0:56 UTC
1 point
2 comments4 min readLW link

Book review: Architects of Intelligence by Martin Ford (2018)

Ofer11 Aug 2020 17:30 UTC
15 points
0 comments2 min readLW link

G.K. Chesterton On AI Risk

Scott Alexander1 Apr 2017 19:00 UTC
17 points
0 comments7 min readLW link

Qualitative Strategies of Friendliness

Eliezer Yudkowsky30 Aug 2008 2:12 UTC
30 points
56 comments12 min readLW link

Dreams of Friendliness

Eliezer Yudkowsky31 Aug 2008 1:20 UTC
26 points
81 comments9 min readLW link

Conceptual issues in AI safety: the paradigmatic gap

vedevazz24 Jun 2018 15:09 UTC
33 points
0 comments1 min readLW link
(www.foldl.me)

On unfixably unsafe AGI architectures

Steven Byrnes19 Feb 2020 21:16 UTC
33 points
8 comments5 min readLW link

A toy model of the treacherous turn

Stuart_Armstrong8 Jan 2016 12:58 UTC
42 points
13 comments6 min readLW link

The way AGI wins could look very stupid

Christopher King12 May 2023 16:34 UTC
42 points
22 comments1 min readLW link

Notes on “the hot mess theory of AI misalignment”

JakubK21 Apr 2023 10:07 UTC
13 points
0 comments5 min readLW link
(sohl-dickstein.github.io)

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan25 May 2023 16:30 UTC
2 points
0 comments25 min readLW link

What can we learn from Lex Fridman’s interview with Sam Altman?

Karl von Wendt27 Mar 2023 6:27 UTC
56 points
22 comments9 min readLW link

The Evil AI Overlord List

Stuart_Armstrong20 Nov 2012 17:02 UTC
44 points
80 comments1 min readLW link

AI safety advocates should consider providing gentle pushback following the events at OpenAI

civilsociety22 Dec 2023 18:55 UTC
16 points
5 comments3 min readLW link

Formulating the AI Doom Argument for Analytic Philosophers

JonathanErhardt12 May 2023 7:54 UTC
13 points
0 comments2 min readLW link

Un-unpluggability—can’t we just unplug it?

Oliver Sourbut15 May 2023 13:23 UTC
26 points
10 comments12 min readLW link
(www.oliversourbut.net)

[Question] Why is violence against AI labs a taboo?

ArisC26 May 2023 8:00 UTC
−21 points
63 comments1 min readLW link

Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

Christopher King20 Apr 2023 19:57 UTC
2 points
7 comments3 min readLW link

[Question] What’s your viewpoint on the likelihood of GPT-5 being able to autonomously create, train, and implement an AI superior to GPT-5?

Super AGI26 May 2023 1:43 UTC
7 points
15 comments1 min readLW link

What I would like the SIAI to publish

XiXiDu1 Nov 2010 14:07 UTC
36 points
225 comments3 min readLW link

AI X-risk is a possible solution to the Fermi Paradox

magic9mushroom30 May 2023 17:42 UTC
11 points
20 comments2 min readLW link

[Question] Term/Category for AI with Neutral Impact?

isomic11 May 2023 22:00 UTC
6 points
1 comment1 min readLW link

[Question] How Politics interacts with AI?

qbolec26 Mar 2023 9:53 UTC
−18 points
4 comments1 min readLW link

Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al.

Violet Hour11 May 2023 18:06 UTC
20 points
2 comments13 min readLW link

Separating the “control problem” from the “alignment problem”

Yi-Yang11 May 2023 9:41 UTC
12 points
1 comment4 min readLW link

Ideas for studies on AGI risk

dr_s20 Apr 2023 18:17 UTC
5 points
1 comment11 min readLW link

A more grounded idea of AI risk

Iknownothing11 May 2023 9:48 UTC
3 points
4 comments1 min readLW link

Without a trajectory change, the development of AGI is likely to go badly

Max H29 May 2023 23:42 UTC
16 points
2 comments13 min readLW link

On the Impossibility of Intelligent Paperclip Maximizers

Michael Simkin29 May 2023 16:55 UTC
−21 points
5 comments4 min readLW link

[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo Putera11 May 2023 4:16 UTC
11 points
1 comment3 min readLW link

Stability AI releases StableLM, an open-source ChatGPT counterpart

Ozyrus20 Apr 2023 6:04 UTC
11 points
3 comments1 min readLW link
(github.com)

ChatGPT Plugins—The Beginning of the End

Bary Levy25 Mar 2023 11:45 UTC
15 points
4 comments1 min readLW link

Evaluating the feasibility of SI’s plan

JoshuaFox10 Jan 2013 8:17 UTC
39 points
187 comments4 min readLW link

The Underreaction to OpenAI

Sherrinford18 Jan 2024 22:08 UTC
19 points
0 comments6 min readLW link

[Question] AI interpretability could be harmful?

Roman Leventov10 May 2023 20:43 UTC
13 points
2 comments1 min readLW link

[Linkpost] Mark Zuckerberg confronted about Meta’s Llama 2 AI’s ability to give users detailed guidance on making anthrax—Business Insider

mic26 Sep 2023 12:05 UTC
18 points
11 comments2 min readLW link
(www.businessinsider.com)

The mind-killer

Paul Crowley2 May 2009 16:49 UTC
29 points
160 comments2 min readLW link

“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation

titotal29 Sep 2023 14:01 UTC
145 points
81 comments1 min readLW link
(titotal.substack.com)

I designed an AI safety course (for a philosophy department)

Eleni Angelou23 Sep 2023 22:03 UTC
37 points
15 comments2 min readLW link

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné23 Sep 2023 16:21 UTC
29 points
8 comments5 min readLW link

Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Erich_Grunewald8 Oct 2023 12:14 UTC
12 points
6 comments2 min readLW link
(arxiv.org)

Paper: Identifying the Risks of LM Agents with an LM-Emulated Sandbox—University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases!

Singularian25019 Oct 2023 0:00 UTC
5 points
0 comments1 min readLW link

A thought experiment to help persuade skeptics that power-seeking AI is plausible

jacobcd5225 Nov 2023 23:26 UTC
1 point
4 comments5 min readLW link

Ideation and Trajectory Modelling in Language Models

NickyP5 Oct 2023 19:21 UTC
15 points
2 comments10 min readLW link

The Gradient – The Artificiality of Alignment

mic8 Oct 2023 4:06 UTC
12 points
1 comment5 min readLW link
(thegradient.pub)

[Question] Should I do it?

MrLight19 Nov 2020 1:08 UTC
−3 points
16 comments2 min readLW link

Become a PIBBSS Research Affiliate

10 Oct 2023 7:41 UTC
24 points
6 comments6 min readLW link

Rationalising humans: another mugging, but not Pascal’s

Stuart_Armstrong14 Nov 2017 15:46 UTC
7 points
1 comment3 min readLW link

Machine learning could be fundamentally unexplainable

George3d616 Dec 2020 13:32 UTC
26 points
15 comments15 min readLW link
(cerebralab.com)

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
140 points
28 comments14 min readLW link

Back to the Past to the Future

Prometheus18 Oct 2023 16:51 UTC
5 points
0 comments1 min readLW link

[Question] What do you make of AGI:un­al­igned::space­ships:not enough food?

Ronny Fernandez22 Feb 2020 14:14 UTC
4 points
3 comments1 min readLW link

Tax­on­omy of AI-risk counterarguments

Odd anon16 Oct 2023 0:12 UTC
61 points
13 comments8 min readLW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
28 points
3 comments8 min readLW link

AISU 2021

Linda Linsefors30 Jan 2021 17:40 UTC
28 points
2 comments1 min readLW link

Nonperson Predicates

Eliezer Yudkowsky27 Dec 2008 1:47 UTC
62 points
177 comments6 min readLW link

Formal Solution to the Inner Alignment Problem

michaelcohen18 Feb 2021 14:51 UTC
49 points
123 comments2 min readLW link

[Question] What are the biggest current impacts of AI?

Sam Clarke7 Mar 2021 21:44 UTC
15 points
5 comments1 min readLW link

[Question] Is a Self-Iterating AGI Vulnerable to Thompson-style Trojans?

sxae25 Mar 2021 14:46 UTC
15 points
6 comments3 min readLW link

AI oracles on blockchain

Caravaggio6 Apr 2021 20:13 UTC
5 points
0 comments3 min readLW link

What if AGI is near?

Wulky Wilkinsen14 Apr 2021 0:05 UTC
11 points
5 comments1 min readLW link

[Question] Is there anything that can stop AGI development in the near term?

Wulky Wilkinsen22 Apr 2021 20:37 UTC
5 points
5 comments1 min readLW link

[Question] [timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes

Quinn4 May 2021 19:10 UTC
6 points
2 comments1 min readLW link

AI Safety Research Project Ideas

Owain_Evans21 May 2021 13:39 UTC
58 points
2 comments3 min readLW link

Survey on AI existential risk scenarios

8 Jun 2021 17:12 UTC
63 points
11 comments7 min readLW link

[Question] What are some claims or opinions about multi-multi delegation you’ve seen in the memeplex that you think deserve scrutiny?

Quinn27 Jun 2021 17:44 UTC
17 points
6 comments2 min readLW link

Mauhn Releases AI Safety Documentation

Berg Severens3 Jul 2021 21:23 UTC
4 points
0 comments1 min readLW link

A gentle apocalypse

pchvykov16 Aug 2021 5:03 UTC
3 points
5 comments3 min readLW link

Could you have stopped Chernobyl?

Carlos Ramirez27 Aug 2021 1:48 UTC
29 points
17 comments8 min readLW link

The Governance Problem and the “Pretty Good” X-Risk

Zach Stein-Perlman29 Aug 2021 18:00 UTC
5 points
2 comments11 min readLW link

Distinguishing AI takeover scenarios

8 Sep 2021 16:19 UTC
72 points
11 comments14 min readLW link

The alignment problem in different capability regimes

Buck9 Sep 2021 19:46 UTC
88 points
12 comments5 min readLW link

How truthful is GPT-3? A benchmark for language models

Owain_Evans16 Sep 2021 10:09 UTC
58 points
24 comments6 min readLW link

Investigating AI Takeover Scenarios

Sammy Martin17 Sep 2021 18:47 UTC
27 points
1 comment27 min readLW link

AI takeoff story: a continuation of progress by other means

Edouard Harris27 Sep 2021 15:55 UTC
76 points
13 comments10 min readLW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Ben Smith29 Sep 2021 17:09 UTC
31 points
7 comments10 min readLW link

The Dark Side of Cognition Hypothesis

Cameron Berg3 Oct 2021 20:10 UTC
19 points
1 comment16 min readLW link

[Question] Whom Do You Trust?

JackOfAllTrades26 Feb 2024 19:38 UTC
1 point
0 comments1 min readLW link

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

Owain_Evans22 Oct 2021 16:23 UTC
31 points
15 comments1 min readLW link

Truthful and honest AI

29 Oct 2021 7:28 UTC
42 points
1 comment13 min readLW link

What is the most evil AI that we could build, today?

ThomasJ1 Nov 2021 19:58 UTC
−2 points
14 comments1 min readLW link

What are red flags for Neural Network suffering?

Marius Hobbhahn8 Nov 2021 12:51 UTC
29 points
15 comments12 min readLW link

Hardcode the AGI to need our approval indefinitely?

MichaelStJules11 Nov 2021 7:04 UTC
2 points
2 comments1 min readLW link

What would we do if alignment were futile?

Grant Demaree14 Nov 2021 8:09 UTC
75 points
39 comments3 min readLW link

Two Stupid AI Alignment Ideas

aphyer16 Nov 2021 16:13 UTC
24 points
3 comments4 min readLW link

Super intelligent AIs that don’t require alignment

Yair Halberstadt16 Nov 2021 19:55 UTC
10 points
2 comments6 min readLW link

AI Tracker: monitoring current and near-future risks from superscale models

23 Nov 2021 19:16 UTC
64 points
13 comments3 min readLW link
(aitracker.org)

HIRING: Inform and shape a new project on AI safety at Partnership on AI

Madhulika Srikumar24 Nov 2021 8:27 UTC
6 points
0 comments1 min readLW link

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn29 Nov 2021 15:18 UTC
16 points
5 comments7 min readLW link

Modeling Failure Modes of High-Level Machine Intelligence

6 Dec 2021 13:54 UTC
54 points
1 comment12 min readLW link

Universality and the “Filter”

maggiehayes16 Dec 2021 0:47 UTC
10 points
2 comments11 min readLW link

Reviews of “Is power-seeking AI an existential risk?”

Joe Carlsmith16 Dec 2021 20:48 UTC
79 points
20 comments1 min readLW link

2+2: Ontological Framework

Lyrialtus1 Feb 2022 1:07 UTC
−15 points
2 comments12 min readLW link

Can the laws of physics/nature prevent hell?

superads916 Feb 2022 20:39 UTC
−5 points
8 comments2 min readLW link

How harmful are improvements in AI? + Poll

15 Feb 2022 18:16 UTC
15 points
4 comments8 min readLW link

Preserving and continuing alignment research through a severe global catastrophe

A_donor6 Mar 2022 18:43 UTC
39 points
11 comments5 min readLW link

Ask AI companies about what they are doing for AI safety?

mic9 Mar 2022 15:14 UTC
51 points
0 comments2 min readLW link

Is There a Valley of Bad Civilizational Adequacy?

lbThingrb11 Mar 2022 19:49 UTC
13 points
1 comment2 min readLW link

[Question] Danger(s) of theorem-proving AI?

Yitz16 Mar 2022 2:47 UTC
8 points
8 comments1 min readLW link

We Are Conjecture, A New Alignment Research Startup

Connor Leahy8 Apr 2022 11:40 UTC
197 points
25 comments4 min readLW link

Is technical AI alignment research a net positive?

cranberry_bear12 Apr 2022 13:07 UTC
6 points
2 comments2 min readLW link

The Peerless

Tamsin Leake13 Apr 2022 1:07 UTC
18 points
2 comments1 min readLW link
(carado.moe)

[Question] Can someone explain to me why MIRI is so pessimistic of our chances of survival?

iamthouthouarti14 Apr 2022 20:28 UTC
10 points
7 comments1 min readLW link

[Question] Convince me that humanity *isn’t* doomed by AGI

Yitz15 Apr 2022 17:26 UTC
61 points
49 comments1 min readLW link

Reflections on My Own Missing Mood

Lone Pine21 Apr 2022 16:19 UTC
52 points
25 comments5 min readLW link

Code Generation as an AI risk setting

Not Relevant17 Apr 2022 22:27 UTC
91 points
16 comments2 min readLW link

[Question] What is being improved in recursive self improvement?

Lone Pine25 Apr 2022 18:30 UTC
7 points
6 comments1 min readLW link

AI Alternative Futures: Scenario Mapping Artificial Intelligence Risk—Request for Participation (*Closed*)

Kakili27 Apr 2022 22:07 UTC
10 points
2 comments8 min readLW link

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Joe Carlsmith8 May 2022 3:50 UTC
20 points
1 comment29 min readLW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy12 May 2022 20:01 UTC
53 points
0 comments59 min readLW link

Agency As a Natural Abstraction

Thane Ruthenis13 May 2022 18:02 UTC
55 points
9 comments13 min readLW link

[Link post] Promising Paths to Alignment—Connor Leahy | Talk

frances_lorenz14 May 2022 16:01 UTC
34 points
0 comments1 min readLW link

DeepMind’s generalist AI, Gato: A non-technical explainer

16 May 2022 21:21 UTC
63 points
6 comments6 min readLW link

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework

17 May 2022 15:26 UTC
26 points
0 comments3 min readLW link

Why I’m Optimistic About Near-Term AI Risk

harsimony15 May 2022 23:05 UTC
57 points
27 comments1 min readLW link

Pivotal acts using an unaligned AGI?

Simon Fischer21 Aug 2022 17:13 UTC
26 points
3 comments7 min readLW link

Reshaping the AI Industry

Thane Ruthenis29 May 2022 22:54 UTC
147 points
35 comments21 min readLW link

Explaining inner alignment to myself

Jeremy Gillen24 May 2022 23:10 UTC
9 points
2 comments10 min readLW link

A Story of AI Risk: InstructGPT-N

peterbarnett26 May 2022 23:22 UTC
24 points
0 comments8 min readLW link

We will be around in 30 years

mukashi7 Jun 2022 3:47 UTC
12 points
205 comments2 min readLW link

Research Questions from Stained Glass Windows

StefanHex8 Jun 2022 12:38 UTC
4 points
0 comments2 min readLW link

Towards Gears-Level Understanding of Agency

Thane Ruthenis16 Jun 2022 22:00 UTC
23 points
4 comments18 min readLW link

A plausible story about AI risk.

DeLesley Hutchins10 Jun 2022 2:08 UTC
14 points
2 comments4 min readLW link

Summary of “AGI Ruin: A List of Lethalities”

Stephen McAleese10 Jun 2022 22:35 UTC
43 points
2 comments8 min readLW link

Poorly-Aimed Death Rays

Thane Ruthenis11 Jun 2022 18:29 UTC
48 points
5 comments4 min readLW link

Contra EY: Can AGI destroy us without trial & error?

Nikita Sokolsky13 Jun 2022 18:26 UTC
136 points
72 comments15 min readLW link

A Modest Pivotal Act

anonymousaisafety13 Jun 2022 19:24 UTC
−16 points
1 comment5 min readLW link

[Question] AI misalignment risk from GPT-like systems?

fiso6419 Jun 2022 17:35 UTC
10 points
8 comments1 min readLW link

Causal confusion as an argument against the scaling hypothesis

20 Jun 2022 10:54 UTC
85 points
30 comments18 min readLW link

[LQ] Some Thoughts on Messaging Around AI Risk

DragonGod25 Jun 2022 13:53 UTC
5 points
3 comments6 min readLW link

All AGI safety questions welcome (especially basic ones) [July 2022]

16 Jul 2022 12:57 UTC
84 points
132 comments3 min readLW link

Reframing the AI Risk

Thane Ruthenis1 Jul 2022 18:44 UTC
26 points
7 comments6 min readLW link

Follow along with Columbia EA’s Advanced AI Safety Fellowship!

RohanS2 Jul 2022 17:45 UTC
3 points
0 comments2 min readLW link
(forum.effectivealtruism.org)

Can we achieve AGI Alignment by balancing multiple human objectives?

Ben Smith3 Jul 2022 2:51 UTC
11 points
1 comment4 min readLW link

New US Senate Bill on X-Risk Mitigation [Linkpost]

Evan R. Murphy4 Jul 2022 1:25 UTC
35 points
12 comments1 min readLW link
(www.hsgac.senate.gov)

My Most Likely Reason to Die Young is AI X-Risk

AISafetyIsNotLongtermist4 Jul 2022 17:08 UTC
61 points
24 comments4 min readLW link
(forum.effectivealtruism.org)

Please help us communicate AI xrisk. It could save the world.

otto.barten4 Jul 2022 21:47 UTC
4 points
7 comments2 min readLW link

When is it appropriate to use statistical models and probabilities for decision making ?

Younes Kamel5 Jul 2022 12:34 UTC
10 points
7 comments4 min readLW link
(youneskamel.substack.com)

Acceptability Verification: A Research Agenda

12 Jul 2022 20:11 UTC
50 points
0 comments1 min readLW link
(docs.google.com)

Goal Alignment Is Robust To the Sharp Left Turn

Thane Ruthenis13 Jul 2022 20:23 UTC
47 points
16 comments4 min readLW link

Conditioning Generative Models for Alignment

Jozdien18 Jul 2022 7:11 UTC
58 points
8 comments20 min readLW link

A Critique of AI Alignment Pessimism

ExCeph19 Jul 2022 2:28 UTC
9 points
1 comment9 min readLW link

What Environment Properties Select Agents For World-Modeling?

Thane Ruthenis23 Jul 2022 19:27 UTC
24 points
1 comment12 min readLW link

Alignment being impossible might be better than it being really difficult

Martín Soto25 Jul 2022 23:57 UTC
13 points
2 comments2 min readLW link

AGI ruin scenarios are likely (and disjunctive)

So8res27 Jul 2022 3:21 UTC
170 points
38 comments6 min readLW link

[Question] How likely do you think worse-than-extinction type fates to be?

span11 Aug 2022 4:08 UTC
3 points
3 comments1 min readLW link

Three pillars for avoiding AGI catastrophe: Technical alignment, deployment decisions, and coordination

Alex Lintz3 Aug 2022 23:15 UTC
22 points
0 comments12 min readLW link

Convergence Towards World-Models: A Gears-Level Model

Thane Ruthenis4 Aug 2022 23:31 UTC
38 points
1 comment13 min readLW link

Complexity No Bar to AI (Or, why Computational Complexity matters less than you think for real life problems)

Noosphere897 Aug 2022 19:55 UTC
17 points
14 comments3 min readLW link
(www.gwern.net)

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth10 Aug 2022 16:08 UTC
173 points
33 comments3 min readLW link1 review

Anti-squatted AI x-risk domains index

plex12 Aug 2022 12:01 UTC
56 points
6 comments1 min readLW link

Infant AI Scenario

Nathan112312 Aug 2022 21:20 UTC
1 point
0 comments3 min readLW link

The Dumbest Possible Gets There First

Artaxerxes13 Aug 2022 10:20 UTC
43 points
7 comments2 min readLW link

Interpretability Tools Are an Attack Channel

Thane Ruthenis17 Aug 2022 18:47 UTC
42 points
14 comments1 min readLW link

Alignment’s phlogiston

Eleni Angelou18 Aug 2022 22:27 UTC
10 points
2 comments2 min readLW link

Benchmarking Proposals on Risk Scenarios

Paul Bricman20 Aug 2022 10:01 UTC
25 points
2 comments14 min readLW link

What’s the Least Impressive Thing GPT-4 Won’t be Able to Do

Algon20 Aug 2022 19:48 UTC
80 points
125 comments1 min readLW link

My Plan to Build Aligned Superintelligence

apollonianblues21 Aug 2022 13:16 UTC
18 points
7 comments8 min readLW link

The Alignment Problem Needs More Positive Fiction

Netcentrica21 Aug 2022 22:01 UTC
5 points
2 comments5 min readLW link

It Looks Like You’re Trying To Take Over The Narrative

George3d624 Aug 2022 13:36 UTC
3 points
20 comments9 min readLW link
(www.epistem.ink)

AI Risk in Terms of Unstable Nuclear Software

Thane Ruthenis26 Aug 2022 18:49 UTC
30 points
1 comment6 min readLW link

Taking the parameters which seem to matter and rotating them until they don’t

Garrett Baker26 Aug 2022 18:26 UTC
120 points
48 comments1 min readLW link

Annual AGI Benchmarking Event

Lawrence Phillips27 Aug 2022 0:06 UTC
24 points
3 comments2 min readLW link
(www.metaculus.com)

Help Understanding Preferences And Evil

Netcentrica27 Aug 2022 3:42 UTC
6 points
7 comments2 min readLW link

[Question] What would you expect a massive multimodal online federated learner to be capable of?

Aryeh Englander27 Aug 2022 17:31 UTC
13 points
4 comments1 min readLW link

Are Generative World Models a Mesa-Optimization Risk?

Thane Ruthenis29 Aug 2022 18:37 UTC
13 points
2 comments3 min readLW link

Three scenarios of pseudo-alignment

Eleni Angelou3 Sep 2022 12:47 UTC
9 points
0 comments3 min readLW link

Worlds Where Iterative Design Fails

johnswentworth30 Aug 2022 20:48 UTC
189 points
30 comments10 min readLW link1 review

Alignment is hard. Communicating that, might be harder

Eleni Angelou1 Sep 2022 16:57 UTC
7 points
8 comments3 min readLW link

Agency engineering: is AI-alignment “to human intent” enough?

catubc2 Sep 2022 18:14 UTC
9 points
10 comments6 min readLW link

Sticky goals: a concrete experiment for understanding deceptive alignment

evhub2 Sep 2022 21:57 UTC
39 points
13 comments3 min readLW link

A Game About AI Alignment (& Meta-Ethics): What Are the Must Haves?

JonathanErhardt5 Sep 2022 7:55 UTC
18 points
15 comments2 min readLW link

Community Building for Graduate Students: A Targeted Approach

Neil Crawford6 Sep 2022 17:17 UTC
6 points
0 comments4 min readLW link

It’s (not) how you use it

Eleni Angelou7 Sep 2022 17:15 UTC
8 points
1 comment2 min readLW link

Oversight Leagues: The Training Game as a Feature

Paul Bricman9 Sep 2022 10:08 UTC
20 points
6 comments10 min readLW link

Ideological Inference Engines: Making Deontology Differentiable*

Paul Bricman12 Sep 2022 12:00 UTC
6 points
0 comments14 min readLW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

11 Sep 2022 10:57 UTC
46 points
13 comments30 min readLW link

AI Safety field-building projects I’d like to see

Akash11 Sep 2022 23:43 UTC
44 points
7 comments6 min readLW link

[Linkpost] A survey on over 300 works about interpretability in deep networks

scasper12 Sep 2022 19:07 UTC
97 points
7 comments2 min readLW link
(arxiv.org)

Representational Tethers: Tying AI Latents To Human Ones

Paul Bricman16 Sep 2022 14:45 UTC
30 points
0 comments16 min readLW link

Risk aversion and GPT-3

hatta_afiq13 Sep 2022 20:50 UTC
1 point
0 comments1 min readLW link

[Question] Would a Misaligned SSI Really Kill Us All?

DragonGod14 Sep 2022 12:15 UTC
6 points
7 comments6 min readLW link

Emily Brontë on: Psychology Required for Serious™ AGI Safety Research

robertzk14 Sep 2022 14:47 UTC
2 points
0 comments1 min readLW link

Precise P(doom) isn’t very important for prioritization or strategy

harsimony14 Sep 2022 17:19 UTC
14 points
6 comments1 min readLW link

Responding to ‘Beyond Hyperanthropomorphism’

ukc1001414 Sep 2022 20:37 UTC
8 points
0 comments16 min readLW link

Capability and Agency as Cornerstones of AI risk — My current model

wilm15 Sep 2022 8:25 UTC
10 points
4 comments12 min readLW link

Understanding Conjecture: Notes from Connor Leahy interview

Akash15 Sep 2022 18:37 UTC
106 points
23 comments15 min readLW link

[Question] Updates on FLI’s Value Alignment Map?

Fer32dwt34r3dfsz17 Sep 2022 22:27 UTC
17 points
4 comments1 min readLW link

Summaries: Alignment Fundamentals Curriculum

Leon Lang18 Sep 2022 13:08 UTC
44 points
3 comments1 min readLW link
(docs.google.com)

Leveraging Legal Informatics to Align AI

John Nay18 Sep 2022 20:39 UTC
11 points
0 comments3 min readLW link
(forum.effectivealtruism.org)

How to Train Your AGI Dragon

Oren Montano21 Sep 2022 22:28 UTC
−1 points
3 comments5 min readLW link

AI Risk Intro 2: Solving The Problem

22 Sep 2022 13:55 UTC
22 points
0 comments27 min readLW link

Interlude: But Who Optimizes The Optimizer?

Paul Bricman23 Sep 2022 15:30 UTC
15 points
0 comments10 min readLW link

[Question] Why Do AI researchers Rate the Probability of Doom So Low?

Aorou24 Sep 2022 2:33 UTC
7 points
6 comments3 min readLW link

On Generality

Oren Montano26 Sep 2022 4:06 UTC
2 points
0 comments5 min readLW link

Oren’s Field Guide of Bad AGI Outcomes

Oren Montano26 Sep 2022 4:06 UTC
0 points
0 comments1 min readLW link

(Structural) Stability of Coupled Optimizers

Paul Bricman30 Sep 2022 11:28 UTC
25 points
0 comments10 min readLW link

Distribution Shifts and The Importance of AI Safety

Leon Lang29 Sep 2022 22:38 UTC
17 points
2 comments12 min readLW link

Eli’s review of “Is power-seeking AI an existential risk?”

elifland30 Sep 2022 12:21 UTC
67 points
0 comments3 min readLW link
(docs.google.com)

[Question] Any further work on AI Safety Success Stories?

Krieger2 Oct 2022 9:53 UTC
8 points
6 comments1 min readLW link

Announcing the AI Safety Nudge Competition to Help Beat Procrastination

Marc Carauleanu1 Oct 2022 1:49 UTC
10 points
0 comments1 min readLW link

Boolean Primitives for Coupled Optimizers

Paul Bricman7 Oct 2022 18:02 UTC
9 points
0 comments8 min readLW link

Generative, Episodic Objectives for Safe AI

Michael Glass5 Oct 2022 23:18 UTC
11 points
3 comments8 min readLW link

[Linkpost] “Blueprint for an AI Bill of Rights”—Office of Science and Technology Policy, USA (2022)

Fer32dwt34r3dfsz5 Oct 2022 16:42 UTC
9 points
4 comments2 min readLW link
(www.whitehouse.gov)

What does it mean for an AGI to be ‘safe’?

So8res7 Oct 2022 4:13 UTC
74 points
29 comments3 min readLW link

Possible miracles

9 Oct 2022 18:17 UTC
64 points
33 comments8 min readLW link

Instrumental convergence in single-agent systems

12 Oct 2022 12:24 UTC
31 points
4 comments8 min readLW link
(www.gladstone.ai)

Cataloguing Priors in Theory and Practice

Paul Bricman13 Oct 2022 12:36 UTC
13 points
8 comments7 min readLW link

Misalignment-by-default in multi-agent systems

13 Oct 2022 15:38 UTC
19 points
8 comments20 min readLW link
(www.gladstone.ai)

Instrumental convergence: scale and physical interactions

14 Oct 2022 15:50 UTC
15 points
0 comments17 min readLW link
(www.gladstone.ai)

Power-Seeking AI and Existential Risk

Antonio Franca11 Oct 2022 22:50 UTC
6 points
0 comments9 min readLW link

Niceness is unnatural

So8res13 Oct 2022 1:30 UTC
121 points
20 comments8 min readLW link1 review

Greed Is the Root of This Evil

Thane Ruthenis13 Oct 2022 20:40 UTC
18 points
7 comments8 min readLW link

Provably Honest—A First Step

Srijanak De5 Nov 2022 19:18 UTC
10 points
2 comments8 min readLW link

[Question] How easy is it to supervise processes vs outcomes?

Noosphere8918 Oct 2022 17:48 UTC
3 points
0 comments1 min readLW link

Response to Katja Grace’s AI x-risk counterarguments

19 Oct 2022 1:17 UTC
76 points
18 comments15 min readLW link

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris24 Oct 2022 20:03 UTC
27 points
0 comments1 min readLW link
(github.com)

Worldview iPeople—Future Fund’s AI Worldview Prize

Toni MUENDEL28 Oct 2022 1:53 UTC
−22 points
4 comments9 min readLW link

AI Researchers On AI Risk

Scott Alexander22 May 2015 11:16 UTC
19 points
0 comments16 min readLW link

AI as a Civilizational Risk Part 2/6: Behavioral Modification

PashaKamyshev30 Oct 2022 16:57 UTC
9 points
0 comments10 min readLW link

AI as a Civilizational Risk Part 3/6: Anti-economy and Signal Pollution

PashaKamyshev31 Oct 2022 17:03 UTC
7 points
4 comments14 min readLW link

AI as a Civilizational Risk Part 4/6: Bioweapons and Philosophy of Modification

PashaKamyshev1 Nov 2022 20:50 UTC
7 points
1 comment8 min readLW link

AI as a Civilizational Risk Part 5/6: Relationship between C-risk and X-risk

PashaKamyshev3 Nov 2022 2:19 UTC
2 points
0 comments7 min readLW link

AI as a Civilizational Risk Part 6/6: What can be done

PashaKamyshev3 Nov 2022 19:48 UTC
2 points
4 comments4 min readLW link

Am I secretly excited for AI getting weird?

porby29 Oct 2022 22:16 UTC
114 points
4 comments4 min readLW link

My (naive) take on Risks from Learned Optimization

Artyom Karpov31 Oct 2022 10:59 UTC
7 points
0 comments5 min readLW link

Clarifying AI X-risk

1 Nov 2022 11:03 UTC
127 points
24 comments4 min readLW link1 review

Threat Model Literature Review

1 Nov 2022 11:03 UTC
74 points
4 comments25 min readLW link

a casual intro to AI doom and alignment

Tamsin Leake1 Nov 2022 16:38 UTC
18 points
0 comments4 min readLW link
(carado.moe)

Why do we post our AI safety plans on the Internet?

Peter S. Park3 Nov 2022 16:02 UTC
4 points
4 comments11 min readLW link

My summary of “Pragmatic AI Safety”

Eleni Angelou5 Nov 2022 12:54 UTC
3 points
0 comments5 min readLW link

4 Key Assumptions in AI Safety

Prometheus7 Nov 2022 10:50 UTC
20 points
5 comments7 min readLW link

Loss of control of AI is not a likely source of AI x-risk

squek7 Nov 2022 18:44 UTC
−6 points
0 comments5 min readLW link

AI Safety Unconference NeurIPS 2022

Orpheus7 Nov 2022 15:39 UTC
25 points
0 comments1 min readLW link
(aisafetyevents.org)

Value Formation: An Overarching Model

Thane Ruthenis15 Nov 2022 17:16 UTC
34 points
20 comments34 min readLW link

Is AI Gain-of-Function research a thing?

MadHatter12 Nov 2022 2:33 UTC
9 points
2 comments2 min readLW link

The limited upside of interpretability

Peter S. Park15 Nov 2022 18:46 UTC
13 points
11 comments1 min readLW link

Conjecture: a retrospective after 8 months of work

23 Nov 2022 17:10 UTC
185 points
9 comments8 min readLW link

Conjecture Second Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments1 min readLW link

Corrigibility Via Thought-Process Deference

Thane Ruthenis24 Nov 2022 17:06 UTC
17 points
5 comments9 min readLW link

Discussing how to align Transformative AI if it’s developed very soon

elifland28 Nov 2022 16:17 UTC
37 points
2 comments28 min readLW link

[Question] Is there any policy for a fair treatment of AIs whose friendliness is in doubt?

nahoj18 Nov 2022 19:01 UTC
15 points
9 comments1 min readLW link

[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj3 Dec 2022 20:32 UTC
1 point
8 comments1 min readLW link

AI can exploit safety plans posted on the Internet

Peter S. Park4 Dec 2022 12:17 UTC
−15 points
4 comments1 min readLW link

Race to the Top: Benchmarks for AI Safety

Isabella Duan4 Dec 2022 18:48 UTC
28 points
6 comments1 min readLW link

[Question] Who are some prominent reasonable people who are confident that AI won’t kill everyone?

Optimization Process5 Dec 2022 9:12 UTC
72 points
54 comments1 min readLW link

Aligned Behavior is not Evidence of Alignment Past a Certain Level of Intelligence

Ronny Fernandez5 Dec 2022 15:19 UTC
19 points
5 comments7 min readLW link

Foresight for AGI Safety Strategy: Mitigating Risks and Identifying Golden Opportunities

jacquesthibs5 Dec 2022 16:09 UTC
28 points
6 comments8 min readLW link

AI Safety in a Vulnerable World: Requesting Feedback on Preliminary Thoughts

Jordan Arel6 Dec 2022 22:35 UTC
4 points
2 comments3 min readLW link

Fear mitigated the nuclear threat, can it do the same to AGI risks?

Igor Ivanov9 Dec 2022 10:04 UTC
6 points
8 comments5 min readLW link

Reflections on the PIBBSS Fellowship 2022

11 Dec 2022 21:53 UTC
32 points
0 comments18 min readLW link

[Question] Best introductory overviews of AGI safety?

JakubK13 Dec 2022 19:01 UTC
21 points
9 comments2 min readLW link
(forum.effectivealtruism.org)

Computational signatures of psychopathy

Cameron Berg19 Dec 2022 17:01 UTC
28 points
3 comments20 min readLW link

Why I think that teaching philosophy is high impact

Eleni Angelou19 Dec 2022 3:11 UTC
5 points
0 comments2 min readLW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois19 Dec 2022 22:42 UTC
5 points
6 comments1 min readLW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

19 Dec 2022 21:31 UTC
63 points
28 comments10 min readLW link

New AI risk intro from Vox [link post]

JakubK21 Dec 2022 6:00 UTC
5 points
1 comment2 min readLW link
(www.vox.com)

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve25 Dec 2022 20:14 UTC
3 points
6 comments1 min readLW link

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Thane Ruthenis25 Dec 2022 16:50 UTC
30 points
38 comments9 min readLW link

Safety of Self-Assembled Neuromorphic Hardware

Can Rager26 Dec 2022 18:51 UTC
15 points
2 comments10 min readLW link
(forum.effectivealtruism.org)

Introduction: Bias in Evaluating AGI X-Risks

27 Dec 2022 10:27 UTC
1 point
0 comments3 min readLW link

Mere exposure effect: Bias in Evaluating AGI X-Risks

27 Dec 2022 14:05 UTC
0 points
2 comments1 min readLW link

Bandwagon effect: Bias in Evaluating AGI X-Risks

28 Dec 2022 7:54 UTC
−1 points
0 comments1 min readLW link

In Defense of Wrapper-Minds

Thane Ruthenis28 Dec 2022 18:28 UTC
23 points
38 comments3 min readLW link

Friendly and Unfriendly AGI are Indistinguishable

ErgoEcho29 Dec 2022 22:13 UTC
−4 points
4 comments4 min readLW link
(neologos.co)

Internal Interfaces Are a High-Priority Interpretability Target

Thane Ruthenis29 Dec 2022 17:49 UTC
26 points
6 comments7 min readLW link

CFP for Rebellion and Disobedience in AI workshop

Ram Rachum29 Dec 2022 16:08 UTC
15 points
0 comments1 min readLW link

Reactive devaluation: Bias in Evaluating AGI X-Risks

30 Dec 2022 9:02 UTC
−15 points
9 comments1 min readLW link

Curse of knowledge and Naive realism: Bias in Evaluating AGI X-Risks

31 Dec 2022 13:33 UTC
−7 points
1 comment1 min readLW link
(www.lesswrong.com)

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers?

simeon_c31 Dec 2022 11:34 UTC
7 points
5 comments1 min readLW link

Challenge to the notion that anything is (maybe) possible with AGI

1 Jan 2023 3:57 UTC
−27 points
4 comments1 min readLW link
(mflb.com)

Summary of 80k’s AI problem profile

JakubK1 Jan 2023 7:30 UTC
7 points
0 comments5 min readLW link
(forum.effectivealtruism.org)

Belief Bias: Bias in Evaluating AGI X-Risks

2 Jan 2023 8:59 UTC
−10 points
1 comment1 min readLW link

Status quo bias; System justification: Bias in Evaluating AGI X-Risks

3 Jan 2023 2:50 UTC
−11 points
0 comments1 min readLW link

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC4 Jan 2023 0:07 UTC
19 points
0 comments8 min readLW link

Illusion of truth effect and Ambiguity effect: Bias in Evaluating AGI X-Risks

Remmelt5 Jan 2023 4:05 UTC
−13 points
2 comments1 min readLW link

AI Safety Camp, Virtual Edition 2023

Linda Linsefors6 Jan 2023 11:09 UTC
40 points
10 comments3 min readLW link
(aisafety.camp)

AI Safety Camp: Machine Learning for Scientific Discovery

Eleni Angelou6 Jan 2023 3:21 UTC
3 points
0 comments1 min readLW link

Anchoring focalism and the Identifiable victim effect: Bias in Evaluating AGI X-Risks

Remmelt7 Jan 2023 9:59 UTC
1 point
2 comments1 min readLW link

Big list of AI safety videos

JakubK9 Jan 2023 6:12 UTC
11 points
2 comments1 min readLW link
(docs.google.com)

The Alignment Problem from a Deep Learning Perspective (major rewrite)

10 Jan 2023 16:06 UTC
83 points
8 comments39 min readLW link
(arxiv.org)

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?

simeon_c13 Jan 2023 6:33 UTC
15 points
1 comment1 min readLW link

Reflections on Trusting Trust & AI

Itay Yona16 Jan 2023 6:36 UTC
10 points
1 comment3 min readLW link
(mentaleap.ai)

OpenAI’s Alignment Plan is not S.M.A.R.T.

Søren Elverlin18 Jan 2023 6:39 UTC
9 points
19 comments4 min readLW link

Gradient Filtering

18 Jan 2023 20:09 UTC
54 points
16 comments13 min readLW link

6-paragraph AI risk intro for MAISI

JakubK19 Jan 2023 9:22 UTC
11 points
0 comments2 min readLW link
(www.maisi.club)

List of technical AI safety exercises and projects

JakubK19 Jan 2023 9:35 UTC
40 points
5 comments1 min readLW link
(docs.google.com)

NYT: Google will “recalibrate” the risk of releasing AI due to competition with OpenAI

Michael Huang22 Jan 2023 8:38 UTC
47 points
2 comments1 min readLW link
(www.nytimes.com)

Next steps after AGISF at UMich

JakubK25 Jan 2023 20:57 UTC
10 points
0 comments5 min readLW link
(docs.google.com)

What is the ground reality of countries taking steps to recalibrate AI development towards Alignment first?

Nebuch29 Jan 2023 13:26 UTC
8 points
6 comments3 min readLW link

Interviews with 97 AI Researchers: Quantitative Analysis

2 Feb 2023 1:01 UTC
23 points
0 comments7 min readLW link

[Question] What qualities does an AGI need to have to realize the risk of false vacuum, without hardcoding physics theories into it?

RationalSieve3 Feb 2023 16:00 UTC
1 point
4 comments1 min readLW link

Monthly Doom Argument Threads? Doom Argument Wiki?

LVSN4 Feb 2023 16:59 UTC
3 points
0 comments1 min readLW link

Second call: CFP for Rebellion and Disobedience in AI workshop

Ram Rachum5 Feb 2023 12:18 UTC
2 points
0 comments2 min readLW link

Early situational awareness and its implications, a story

Jacob Pfau6 Feb 2023 20:45 UTC
29 points
6 comments3 min readLW link

Is this a weak pivotal act: creating nanobots that eat evil AGIs (but nothing else)?

Christopher King10 Feb 2023 19:26 UTC
0 points
3 comments1 min readLW link

The Importance of AI Alignment, explained in 5 points

Daniel_Eth11 Feb 2023 2:56 UTC
30 points
2 comments1 min readLW link

Near-Term Risks of an Obedient Artificial Intelligence

ymeskhout18 Feb 2023 18:30 UTC
20 points
1 comment6 min readLW link

Should we cry “wolf”?

Tapatakt18 Feb 2023 11:24 UTC
24 points
5 comments1 min readLW link

The public supports regulating AI for safety

Zach Stein-Perlman17 Feb 2023 4:10 UTC
114 points
9 comments1 min readLW link
(aiimpacts.org)

Navigating public AI x-risk hype while pursuing technical solutions

Dan Braun19 Feb 2023 12:22 UTC
18 points
0 comments2 min readLW link

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha20 Feb 2023 0:55 UTC
10 points
2 comments18 min readLW link

Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?

Christopher King20 Feb 2023 15:11 UTC
16 points
15 comments1 min readLW link

Deceptive Alignment is <1% Likely by Default

DavidW21 Feb 2023 15:09 UTC
94 points
26 comments14 min readLW link

Announcing aisafety.training

JJ Hepburn21 Jan 2023 1:01 UTC
61 points
4 comments1 min readLW link

Is there an ML agent that abandons its utility function out-of-distribution without losing capabilities?

Christopher King22 Feb 2023 16:49 UTC
1 point
7 comments1 min readLW link

Au­to­mated Sand­wich­ing & Quan­tify­ing Hu­man-LLM Co­op­er­a­tion: ScaleOver­sight hackathon results

23 Feb 2023 10:48 UTC
8 points
0 comments6 min readLW link

Re­search pro­posal: Lev­er­ag­ing Jun­gian archetypes to cre­ate val­ues-based models

MiguelDev5 Mar 2023 17:39 UTC
5 points
2 comments2 min readLW link

Ret­ro­spec­tive on the 2022 Con­jec­ture AI Discussions

Andrea_Miotti24 Feb 2023 22:41 UTC
89 points
5 comments2 min readLW link

Chris­ti­ano (ARC) and GA (Con­jec­ture) Dis­cuss Align­ment Cruxes

24 Feb 2023 23:03 UTC
60 points
7 comments47 min readLW link

[Question] Pink Shog­goths: What does al­ign­ment look like in prac­tice?

Yuli_Ban25 Feb 2023 12:23 UTC
25 points
13 comments11 min readLW link

[Question] Would more model evals teams be good?

Ryan Kidd25 Feb 2023 22:01 UTC
20 points
4 comments1 min readLW link

Cu­ri­os­ity as a Solu­tion to AGI Alignment

Harsha G.26 Feb 2023 23:36 UTC
7 points
7 comments3 min readLW link

Ta­boo “hu­man-level in­tel­li­gence”

Sherrinford26 Feb 2023 20:42 UTC
12 points
7 comments1 min readLW link

The idea of an “al­igned su­per­in­tel­li­gence” seems misguided

ssadler27 Feb 2023 11:19 UTC
6 points
7 comments3 min readLW link
(ssadler.substack.com)

Tran­script: Yud­kowsky on Ban­kless fol­low-up Q&A

vonk28 Feb 2023 3:46 UTC
54 points
40 comments22 min readLW link

The bur­den of knowing

arisAlexis28 Feb 2023 18:40 UTC
5 points
0 comments2 min readLW link

Call for Cruxes by Rhyme, a Longter­mist His­tory Consultancy

Lara1 Mar 2023 18:39 UTC
1 point
0 comments3 min readLW link
(forum.effectivealtruism.org)

Reflec­tion Mechanisms as an Align­ment Tar­get—At­ti­tudes on “near-term” AI

2 Mar 2023 4:29 UTC
20 points
0 comments8 min readLW link

[Question] What are some sources re­lated to big-pic­ture AI strat­egy?

Jacob Watts2 Mar 2023 5:00 UTC
2 points
0 comments1 min readLW link

Con­scious­ness is ir­rele­vant—in­stead solve al­ign­ment by ask­ing this question

Oliver Siegel4 Mar 2023 22:06 UTC
−10 points
6 comments1 min readLW link

Why kill ev­ery­one?

arisAlexis5 Mar 2023 11:53 UTC
−3 points
5 comments2 min readLW link

Is it time to talk about AI dooms­day prep­ping yet?

bokov5 Mar 2023 21:17 UTC
0 points
6 comments1 min readLW link

Who Aligns the Align­ment Re­searchers?

Ben Smith5 Mar 2023 23:22 UTC
40 points
0 comments11 min readLW link

Cap Model Size for AI Safety

research_prime_space6 Mar 2023 1:11 UTC
0 points
4 comments1 min readLW link

In­tro­duc­ing AI Align­ment Inc., a Cal­ifor­nia pub­lic benefit cor­po­ra­tion...

TherapistAI7 Mar 2023 18:47 UTC
1 point
4 comments1 min readLW link

[Question] What‘s in your list of un­solved prob­lems in AI al­ign­ment?

jacquesthibs7 Mar 2023 18:58 UTC
60 points
9 comments1 min readLW link

Pod­cast Tran­script: Daniela and Dario Amodei on Anthropic

remember7 Mar 2023 16:47 UTC
46 points
2 comments79 min readLW link
(futureoflife.org)

Why Un­con­trol­lable AI Looks More Likely Than Ever

8 Mar 2023 15:41 UTC
18 points
0 comments4 min readLW link
(time.com)

Speed run­ning ev­ery­one through the bad al­ign­ment bingo. $5k bounty for a LW con­ver­sa­tional agent

ArthurB9 Mar 2023 9:26 UTC
139 points
32 comments2 min readLW link

An­thropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster9 Mar 2023 17:34 UTC
17 points
1 comment22 min readLW link
(www.anthropic.com)

Value drift threat models

Garrett Baker12 May 2023 23:03 UTC
27 points
4 comments5 min readLW link

Every­thing’s nor­mal un­til it’s not

Eleni Angelou10 Mar 2023 2:02 UTC
7 points
0 comments3 min readLW link

Is AI Safety drop­ping the ball on pri­vacy?

markov13 Sep 2023 13:07 UTC
50 points
17 comments7 min readLW link

Humanity's biggest mistake

RomanS10 Mar 2023 16:30 UTC
0 points
1 comment2 min readLW link

On tak­ing AI risk se­ri­ously

Eleni Angelou13 Mar 2023 5:50 UTC
6 points
0 comments1 min readLW link
(www.nytimes.com)

We don’t trade with ants

KatjaGrace10 Jan 2023 23:50 UTC
264 points
108 comments7 min readLW link
(worldspiritsockpuppet.com)

Linkpost: A tale of 2.5 or­thog­o­nal­ity theses

DavidW13 Mar 2023 14:19 UTC
9 points
3 comments1 min readLW link
(forum.effectivealtruism.org)

Linkpost: A Con­tra AI FOOM Read­ing List

DavidW13 Mar 2023 14:45 UTC
25 points
4 comments1 min readLW link
(magnusvinding.com)

Linkpost: ‘Dis­solv­ing’ AI Risk – Pa­ram­e­ter Uncer­tainty in AI Fu­ture Forecasting

DavidW13 Mar 2023 16:52 UTC
6 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Could Roko’s basilisk acausally bar­gain with a pa­per­clip max­i­mizer?

Christopher King13 Mar 2023 18:21 UTC
1 point
8 comments1 min readLW link

A bet­ter anal­ogy and ex­am­ple for teach­ing AI takeover: the ML Inferno

Christopher King14 Mar 2023 19:14 UTC
18 points
0 comments5 min readLW link

Over­ton’s Basilisk

Alex Beyman15 Mar 2023 21:54 UTC
−20 points
0 comments5 min readLW link

New eco­nomic sys­tem for AI era

ksme sho17 Mar 2023 17:42 UTC
−1 points
1 comment5 min readLW link

Sur­vey on in­ter­me­di­ate goals in AI governance

17 Mar 2023 13:12 UTC
25 points
3 comments1 min readLW link

(re­tired ar­ti­cle) AGI With In­ter­net Ac­cess: Why we won’t stuff the ge­nie back in its bot­tle.

Max TK18 Mar 2023 3:43 UTC
5 points
10 comments4 min readLW link

The Answer

Alex Beyman19 Mar 2023 0:09 UTC
1 point
0 comments4 min readLW link

An Ap­peal to AI Su­per­in­tel­li­gence: Rea­sons Not to Pre­serve (most of) Humanity

Alex Beyman22 Mar 2023 4:09 UTC
−15 points
6 comments19 min readLW link

Hu­man­ity’s Lack of Unity Will Lead to AGI Catastrophe

MiguelDev19 Mar 2023 19:18 UTC
3 points
2 comments4 min readLW link

[Question] Wouldn’t an in­tel­li­gent agent keep us al­ive and help us al­ign it­self to our val­ues in or­der to pre­vent risk ? by Risk I mean ex­per­i­men­ta­tion by try­ing to al­ign po­ten­tially smarter repli­cas?

Terrence Rotoufle21 Mar 2023 17:44 UTC
−3 points
1 comment2 min readLW link

[Question] What does pul­ling the fire alarm look like?

nem20 Mar 2023 21:45 UTC
2 points
0 comments1 min readLW link

Truth­ful AI: Devel­op­ing and gov­ern­ing AI that does not lie

18 Oct 2021 18:37 UTC
81 points
9 comments10 min readLW link

Biose­cu­rity and AI: Risks and Opportunities

Steve Newman27 Feb 2024 18:45 UTC
9 points
1 comment7 min readLW link
(www.safe.ai)

The con­ver­gent dy­namic we missed

Remmelt12 Dec 2023 23:19 UTC
2 points
2 comments1 min readLW link

Cor­po­rate Gover­nance for Fron­tier AI Labs: A Re­search Agenda

Matthew Wearden28 Feb 2024 11:29 UTC
4 points
0 comments16 min readLW link
(matthewwearden.co.uk)

Post se­ries on “Li­a­bil­ity Law for re­duc­ing Ex­is­ten­tial Risk from AI”

Nora_Ammann29 Feb 2024 4:39 UTC
42 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

In­cre­men­tal AI Risks from Proxy-Simulations

kmenou19 Dec 2023 18:56 UTC
2 points
0 comments1 min readLW link
(individual.utoronto.ca)

On the fu­ture of lan­guage models

owencb20 Dec 2023 16:58 UTC
105 points
17 comments1 min readLW link

Open po­si­tions: Re­search An­a­lyst at the AI Stan­dards Lab

22 Dec 2023 16:31 UTC
17 points
0 comments1 min readLW link

“De­stroy hu­man­ity” as an im­me­di­ate subgoal

Seth Ahrenbach22 Dec 2023 18:52 UTC
3 points
13 comments3 min readLW link

AI safety ad­vo­cates should con­sider pro­vid­ing gen­tle push­back fol­low­ing the events at OpenAI

civilsociety22 Dec 2023 18:55 UTC
16 points
5 comments3 min readLW link

[Question] In­ves­ti­gat­ing Alter­na­tive Fu­tures: Hu­man and Su­per­in­tel­li­gence In­ter­ac­tion Scenarios

Hiroshi Yamakawa27 Dec 2023 18:19 UTC
−4 points
0 comments17 min readLW link

More Thoughts on the Hu­man-AGI War

Seth Ahrenbach27 Dec 2023 1:03 UTC
−3 points
4 comments7 min readLW link

I made a P(doom) calcu­la­tor for con­ve­nient Fermi estimation

Nicholas Kruus27 Dec 2023 18:22 UTC
1 point
0 comments5 min readLW link

Plan­ning to build a cryp­to­graphic box with perfect secrecy

Lysandre Terrisse31 Dec 2023 9:31 UTC
37 points
6 comments11 min readLW link

In­ves­ti­gat­ing Alter­na­tive Fu­tures: Hu­man and Su­per­in­tel­li­gence In­ter­ac­tion Scenarios

Hiroshi Yamakawa3 Jan 2024 23:46 UTC
1 point
0 comments17 min readLW link

Does AI care about re­al­ity or just its own per­cep­tion?

RedFishBlueFish5 Jan 2024 4:05 UTC
−5 points
8 comments1 min readLW link

Towards AI Safety In­fras­truc­ture: Talk & Outline

Paul Bricman7 Jan 2024 9:31 UTC
10 points
0 comments2 min readLW link
(www.youtube.com)

AI de­mands un­prece­dented reliability

Jono9 Jan 2024 16:30 UTC
22 points
5 comments2 min readLW link

Sur­vey of 2,778 AI au­thors: six parts in pictures

KatjaGrace6 Jan 2024 4:43 UTC
80 points
1 comment2 min readLW link

[Question] What’s the pro­to­col for if a novice has ML ideas that are un­likely to work, but might im­prove ca­pa­bil­ities if they do work?

drocta9 Jan 2024 22:51 UTC
6 points
2 comments2 min readLW link

Two Tales of AI Takeover: My Doubts

Violet Hour5 Mar 2024 15:51 UTC
26 points
6 comments29 min readLW link

The Un­der­re­ac­tion to OpenAI

Sherrinford18 Jan 2024 22:08 UTC
19 points
0 comments6 min readLW link

Brain­storm­ing: Slow Takeoff

David Piepgrass23 Jan 2024 6:58 UTC
2 points
0 comments51 min readLW link

OpenAI Credit Account ($2,510)

Emirhan BULUT21 Jan 2024 2:32 UTC
1 point
0 comments1 min readLW link

An­nounc­ing Con­ver­gence Anal­y­sis: An In­sti­tute for AI Sce­nario & Gover­nance Research

7 Mar 2024 21:37 UTC
22 points
1 comment4 min readLW link

RAND re­port finds no effect of cur­rent LLMs on vi­a­bil­ity of bioter­ror­ism attacks

StellaAthena25 Jan 2024 19:17 UTC
94 points
14 comments1 min readLW link
(www.rand.org)

What Failure Looks Like is not an ex­is­ten­tial risk (and al­ign­ment is not the solu­tion)

otto.barten2 Feb 2024 18:59 UTC
13 points
12 comments9 min readLW link

An­nounc­ing the Lon­don Ini­ti­a­tive for Safe AI (LISA)

2 Feb 2024 23:17 UTC
94 points
0 comments9 min readLW link

Why I think it’s net harm­ful to do tech­ni­cal safety re­search at AGI labs

Remmelt7 Feb 2024 4:17 UTC
26 points
24 comments1 min readLW link

Sce­nario plan­ning for AI x-risk

Corin Katzke10 Feb 2024 0:14 UTC
20 points
9 comments14 min readLW link
(forum.effectivealtruism.org)

Carl Shul­man On Dwarkesh Pod­cast June 2023

Moonicker11 Feb 2024 21:02 UTC
12 points
0 comments159 min readLW link

Tort Law Can Play an Im­por­tant Role in Miti­gat­ing AI Risk

Gabriel Weil12 Feb 2024 17:17 UTC
37 points
9 comments5 min readLW link

The Astro­nom­i­cal Sacri­fice Dilemma

Matthew McRedmond11 Mar 2024 19:58 UTC
13 points
3 comments4 min readLW link

AI Reg­u­la­tory Land­scape Re­view: In­ci­dent Reporting

11 Mar 2024 21:03 UTC
14 points
0 comments6 min readLW link

Thoughts on the Fea­si­bil­ity of Pro­saic AGI Align­ment?

iamthouthouarti21 Aug 2020 23:25 UTC
8 points
10 comments1 min readLW link

The Shut­down Prob­lem: In­com­plete Prefer­ences as a Solution

EJT23 Feb 2024 16:01 UTC
47 points
6 comments41 min readLW link

Con­trol­ling AGI Risk

TeaSea15 Mar 2024 4:56 UTC
6 points
7 comments4 min readLW link

Open-ended ethics of phenomena (a desideratum with universal morality)

Ryo 8 Nov 2023 20:10 UTC
1 point
0 comments8 min readLW link

Grey Goo Re­quires AI

harsimony15 Jan 2021 4:45 UTC
8 points
11 comments4 min readLW link
(harsimony.wordpress.com)

Ca­pa­bil­ities De­nial: The Danger of Un­der­es­ti­mat­ing AI

Christopher King21 Mar 2023 1:24 UTC
6 points
5 comments3 min readLW link

Ex­plor­ing the Pre­cau­tion­ary Prin­ci­ple in AI Devel­op­ment: His­tor­i­cal Analo­gies and Les­sons Learned

Christopher King21 Mar 2023 3:53 UTC
−1 points
2 comments9 min readLW link

[Question] Em­ployer con­sid­er­ing part­ner­ing with ma­jor AI labs. What to do?

GraduallyMoreAgitated21 Mar 2023 17:43 UTC
37 points
7 comments2 min readLW link

Key Ques­tions for Digi­tal Minds

Jacy Reese Anthis22 Mar 2023 17:13 UTC
22 points
0 comments7 min readLW link
(www.sentienceinstitute.org)

Why We MUST Create an AGI that Disem­pow­ers Hu­man­ity. For Real.

twkaiser22 Mar 2023 23:01 UTC
−17 points
1 comment4 min readLW link

ChatGPT’s “fuzzy al­ign­ment” isn’t ev­i­dence of AGI al­ign­ment: the ba­nana test

Michael Tontchev23 Mar 2023 7:12 UTC
23 points
6 comments4 min readLW link

Limit in­tel­li­gent weapons

Lucas Pfeifer23 Mar 2023 17:54 UTC
−11 points
36 comments1 min readLW link

GPT-4 aligning with acausal decision theory when instructed to play games, but includes a CDT explanation that’s incorrect if they differ

Christopher King23 Mar 2023 16:16 UTC
7 points
4 comments8 min readLW link

con­tinue work­ing on hard al­ign­ment! don’t give up!

Tamsin Leake24 Mar 2023 0:14 UTC
82 points
45 comments1 min readLW link
(carado.moe)

Grind­ing slimes in the dun­geon of AI al­ign­ment research

Max H24 Mar 2023 4:51 UTC
10 points
2 comments4 min readLW link

Does GPT-4 ex­hibit agency when sum­ma­riz­ing ar­ti­cles?

Christopher King24 Mar 2023 15:49 UTC
16 points
2 comments5 min readLW link

More ex­per­i­ments in GPT-4 agency: writ­ing memos

Christopher King24 Mar 2023 17:51 UTC
5 points
2 comments10 min readLW link

ChatGPT Plu­g­ins—The Begin­ning of the End

Bary Levy25 Mar 2023 11:45 UTC
15 points
4 comments1 min readLW link

[Question] How Politics interacts with AI?

qbolec26 Mar 2023 9:53 UTC
−18 points
4 comments1 min readLW link

What can we learn from Lex Frid­man’s in­ter­view with Sam Alt­man?

Karl von Wendt27 Mar 2023 6:27 UTC
56 points
22 comments9 min readLW link

Half-baked al­ign­ment idea

ozb28 Mar 2023 17:47 UTC
6 points
27 comments1 min readLW link

Adapt­ing to Change: Over­com­ing Chronos­ta­sis in AI Lan­guage Models

RationalMindset28 Mar 2023 14:32 UTC
−1 points
0 comments6 min readLW link

I had a chat with GPT-4 on the fu­ture of AI and AI safety

Kristian Freed28 Mar 2023 17:47 UTC
1 point
0 comments8 min readLW link

“Un­in­ten­tional AI safety re­search”: Why not sys­tem­at­i­cally mine AI tech­ni­cal re­search for safety pur­poses?

ghostwheel29 Mar 2023 15:56 UTC
27 points
3 comments6 min readLW link

I made AI Risk Propaganda

monkymind29 Mar 2023 14:26 UTC
−3 points
0 comments1 min readLW link

“Sorcerer’s Ap­pren­tice” from Fan­ta­sia as an anal­ogy for alignment

awg29 Mar 2023 18:21 UTC
7 points
4 comments1 min readLW link
(video.disney.com)

Paus­ing AI Devel­op­ments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky

jacquesthibs29 Mar 2023 23:16 UTC
298 points
296 comments3 min readLW link
(time.com)

Align­ment—Path to AI as ally, not slave nor foe

ozb30 Mar 2023 14:54 UTC
10 points
3 comments2 min readLW link

Wi­den­ing Over­ton Win­dow—Open Thread

Prometheus31 Mar 2023 10:03 UTC
23 points
8 comments1 min readLW link

Why Yud­kowsky Is Wrong And What He Does Can Be More Dangerous

idontagreewiththat6 Jun 2023 17:59 UTC
−40 points
3 comments3 min readLW link

GPT-4 busted? Clear self-in­ter­est when sum­ma­riz­ing ar­ti­cles about it­self vs when ar­ti­cle talks about Claude, LLaMA, or DALL·E 2

Christopher King31 Mar 2023 17:05 UTC
6 points
4 comments4 min readLW link

Imag­ine a world where Microsoft em­ploy­ees used Bing

Christopher King31 Mar 2023 18:36 UTC
6 points
2 comments2 min readLW link

Is AGI suici­dal­ity the golden ray of hope?

Alex Kirko4 Apr 2023 23:29 UTC
−18 points
4 comments1 min readLW link

AI com­mu­nity build­ing: EliezerKart

Christopher King1 Apr 2023 15:25 UTC
45 points
0 comments2 min readLW link

Pes­simism about AI Safety

2 Apr 2023 7:43 UTC
4 points
1 comment25 min readLW link

AI Safety via Luck

Jozdien1 Apr 2023 20:13 UTC
74 points
6 comments11 min readLW link

The AI gov­er­nance gaps in de­vel­op­ing countries

nguyên17 Jun 2023 2:50 UTC
19 points
1 comment14 min readLW link

Ex­plor­ing non-an­thro­pocen­tric as­pects of AI ex­is­ten­tial safety

mishka3 Apr 2023 18:07 UTC
8 points
0 comments3 min readLW link

Steer­ing systems

Max H4 Apr 2023 0:56 UTC
50 points
1 comment15 min readLW link

ICA Simulacra

Ozyrus5 Apr 2023 6:41 UTC
26 points
2 comments7 min readLW link

Against sac­ri­fic­ing AI trans­parency for gen­er­al­ity gains

Ape in the coat7 May 2023 6:52 UTC
3 points
0 comments2 min readLW link

Su­per­in­tel­li­gence will out­smart us or it isn’t superintelligence

Neil 3 Apr 2023 15:01 UTC
−7 points
4 comments1 min readLW link

Do we have a plan for the “first crit­i­cal try” prob­lem?

Christopher King3 Apr 2023 16:27 UTC
−3 points
14 comments1 min readLW link

Towards em­pa­thy in RL agents and be­yond: In­sights from cog­ni­tive sci­ence for AI Align­ment

Marc Carauleanu3 Apr 2023 19:59 UTC
15 points
6 comments1 min readLW link
(clipchamp.com)

[Question] Does it be­come eas­ier, or harder, for the world to co­or­di­nate around not build­ing AGI as time goes on?

Eli Tyre29 Jul 2019 22:59 UTC
86 points
31 comments3 min readLW link2 reviews

Strate­gies to Prevent AI Annihilation

lastchanceformankind4 Apr 2023 8:59 UTC
−2 points
0 comments4 min readLW link

[Question] Isn’t safe AGI im­pos­si­ble?

lefoenix5 Apr 2023 4:01 UTC
1 point
0 comments1 min readLW link

AGI de­ploy­ment as an act of aggression

dr_s5 Apr 2023 6:39 UTC
27 points
29 comments13 min readLW link

[Question] Daisy-chain­ing ep­silon-step verifiers

Decaeneus6 Apr 2023 2:07 UTC
2 points
1 comment1 min readLW link

One Does Not Sim­ply Re­place the Hu­mans

JerkyTreats6 Apr 2023 20:56 UTC
9 points
3 comments4 min readLW link
(www.lesswrong.com)

OpenAI: Our ap­proach to AI safety

g-w15 Apr 2023 20:26 UTC
1 point
1 comment1 min readLW link
(openai.com)

Williams-Beuren Syndrome: Friendly Mutations

Takk5 Apr 2023 20:59 UTC
−1 points
1 comment1 min readLW link

Yoshua Ben­gio: “Slow­ing down de­vel­op­ment of AI sys­tems pass­ing the Tur­ing test”

Roman Leventov6 Apr 2023 3:31 UTC
49 points
2 comments5 min readLW link
(yoshuabengio.org)

Risks from GPT-4 Byproduct of Re­cur­sively Op­ti­miz­ing AIs

ben hayum7 Apr 2023 0:02 UTC
73 points
9 comments10 min readLW link
(forum.effectivealtruism.org)

A decade of lurk­ing, a month of posting

Max H9 Apr 2023 0:21 UTC
70 points
4 comments5 min readLW link

Align­ment of Au­toGPT agents

Ozyrus12 Apr 2023 12:54 UTC
14 points
1 comment4 min readLW link

Why I’m not wor­ried about im­mi­nent doom

Ariel Kwiatkowski10 Apr 2023 15:31 UTC
6 points
2 comments4 min readLW link

Mea­sur­ing ar­tifi­cial in­tel­li­gence on hu­man bench­marks is naive

Anomalous11 Apr 2023 11:34 UTC
11 points
4 comments1 min readLW link
(forum.effectivealtruism.org)

In fa­vor of ac­cel­er­at­ing prob­lems you’re try­ing to solve

Christopher King11 Apr 2023 18:15 UTC
2 points
2 comments4 min readLW link

AI Risk US Pres­i­den­tial Candidate

Simon Berens11 Apr 2023 19:31 UTC
5 points
3 comments1 min readLW link

Open-source LLMs may prove Bostrom’s vuln­er­a­ble world hypothesis

Roope Ahvenharju15 Apr 2023 19:16 UTC
1 point
1 comment1 min readLW link

Ar­tifi­cial In­tel­li­gence as exit strat­egy from the age of acute ex­is­ten­tial risk

Arturo Macias12 Apr 2023 14:48 UTC
−7 points
15 comments7 min readLW link

AGI goal space is big, but nar­row­ing might not be as hard as it seems.

Jacy Reese Anthis12 Apr 2023 19:03 UTC
15 points
0 comments3 min readLW link

Pol­lut­ing the agen­tic commons

hamandcheese13 Apr 2023 17:42 UTC
7 points
4 comments2 min readLW link
(www.secondbest.ca)

The Virus—Short Story

Michael Soareverix13 Apr 2023 18:18 UTC
4 points
0 comments4 min readLW link

On the pos­si­bil­ity of im­pos­si­bil­ity of AGI Long-Term Safety

Roman Yen13 May 2023 18:38 UTC
6 points
3 comments9 min readLW link

Spec­u­la­tion on map­ping the moral land­scape for fu­ture Ai Alignment

Sven Heinz (Welwordion)16 Apr 2023 13:43 UTC
1 point
0 comments1 min readLW link

On ur­gency, pri­or­ity and col­lec­tive re­ac­tion to AI-Risks: Part I

Denreik16 Apr 2023 19:14 UTC
−10 points
15 comments5 min readLW link

AGI Clinics: A Safe Haven for Hu­man­ity’s First En­coun­ters with Superintelligence

portr.17 Apr 2023 1:52 UTC
−5 points
1 comment1 min readLW link

Defin­ing Boundaries on Out­comes

Takk7 Jun 2023 17:41 UTC
1 point
0 comments1 min readLW link

No, re­ally, it pre­dicts next to­kens.

simon18 Apr 2023 3:47 UTC
58 points
37 comments3 min readLW link

Pre­dic­tion: any un­con­trol­lable AI will turn earth into a gi­ant computer

Karl von Wendt17 Apr 2023 12:30 UTC
9 points
8 comments3 min readLW link

What is your timelines for ADI (ar­tifi­cial dis­em­pow­er­ing in­tel­li­gence)?

Christopher King17 Apr 2023 17:01 UTC
3 points
3 comments2 min readLW link

Green goo is plausible

anithite18 Apr 2023 0:04 UTC
57 points
29 comments4 min readLW link

World and Mind in Ar­tifi­cial In­tel­li­gence: ar­gu­ments against the AI pause

Arturo Macias18 Apr 2023 14:40 UTC
1 point
0 comments1 min readLW link
(forum.effectivealtruism.org)

AI Safety Newslet­ter #2: ChaosGPT, Nat­u­ral Selec­tion, and AI Safety in the Media

18 Apr 2023 18:44 UTC
30 points
0 comments4 min readLW link
(newsletter.safe.ai)

I Believe I Know Why AI Models Hallucinate

Richard Aragon19 Apr 2023 21:07 UTC
−10 points
6 comments7 min readLW link
(turingssolutions.com)

[Cross­post] Or­ga­niz­ing a de­bate with ex­perts and MPs to raise AI xrisk aware­ness: a pos­si­ble blueprint

otto.barten19 Apr 2023 11:45 UTC
8 points
0 comments4 min readLW link
(forum.effectivealtruism.org)

How to ex­press this sys­tem for eth­i­cally al­igned AGI as a Math­e­mat­i­cal for­mula?

Oliver Siegel19 Apr 2023 20:13 UTC
−1 points
0 comments1 min readLW link

[Question] Is there any liter­a­ture on us­ing so­cial­iza­tion for AI al­ign­ment?

Nathan112319 Apr 2023 22:16 UTC
10 points
9 comments2 min readLW link

How does AI Risk Affect the Si­mu­la­tion Hy­poth­e­sis?

amelia20 Apr 2023 3:16 UTC
6 points
9 comments2 min readLW link

Sta­bil­ity AI re­leases StableLM, an open-source ChatGPT counterpart

Ozyrus20 Apr 2023 6:04 UTC
11 points
3 comments1 min readLW link
(github.com)

Ideas for stud­ies on AGI risk

dr_s20 Apr 2023 18:17 UTC
5 points
1 comment11 min readLW link

Pro­posal: Us­ing Monte Carlo tree search in­stead of RLHF for al­ign­ment research

Christopher King20 Apr 2023 19:57 UTC
2 points
7 comments3 min readLW link

Notes on “the hot mess the­ory of AI mis­al­ign­ment”

JakubK21 Apr 2023 10:07 UTC
13 points
0 comments5 min readLW link
(sohl-dickstein.github.io)

The Se­cu­rity Mind­set, S-Risk and Pub­lish­ing Pro­saic Align­ment Research

lukemarks22 Apr 2023 14:36 UTC
39 points
7 comments6 min readLW link

A great talk for AI noobs (ac­cord­ing to an AI noob)

dov23 Apr 2023 5:34 UTC
10 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

Paths to failure

25 Apr 2023 8:03 UTC
29 points
1 comment8 min readLW link

A con­cise sum-up of the ba­sic ar­gu­ment for AI doom

Mergimio H. Doefevmil24 Apr 2023 17:37 UTC
11 points
6 comments2 min readLW link

A re­sponse to Con­jec­ture’s CoEm proposal

Kristian Freed24 Apr 2023 17:23 UTC
7 points
0 comments4 min readLW link

A Pro­posal for AI Align­ment: Us­ing Directly Op­pos­ing Models

Arne B27 Apr 2023 18:05 UTC
0 points
5 comments3 min readLW link

Making Nanobots isn’t a one-shot process, even for an artificial superintelligence

dankrad25 Apr 2023 0:39 UTC
20 points
13 comments6 min readLW link

My Assess­ment of the Chi­nese AI Safety Community

Lao Mein25 Apr 2023 4:21 UTC
244 points
93 comments3 min readLW link

Briefly how I’ve up­dated since ChatGPT

rime25 Apr 2023 14:47 UTC
48 points
2 comments2 min readLW link

Free­dom Is All We Need

Leo Glisic27 Apr 2023 0:09 UTC
−1 points
8 comments10 min readLW link

Hal­loween Problem

Saint Blasphemer24 Oct 2023 16:46 UTC
−10 points
1 comment1 min readLW link

An­nounc­ing #AISum­mitTalks fea­tur­ing Pro­fes­sor Stu­art Rus­sell and many others

otto.barten24 Oct 2023 10:11 UTC
17 points
1 comment1 min readLW link

[Question] What if AGI had its own uni­verse to maybe wreck?

mseale26 Oct 2023 17:49 UTC
−1 points
2 comments1 min readLW link

Re­spon­si­ble Scal­ing Poli­cies Are Risk Man­age­ment Done Wrong

simeon_c25 Oct 2023 23:46 UTC
114 points
33 comments22 min readLW link
(www.navigatingrisks.ai)

[Thought Ex­per­i­ment] To­mor­row’s Echo—The fu­ture of syn­thetic com­pan­ion­ship.

Vimal Naran26 Oct 2023 17:54 UTC
−7 points
2 comments2 min readLW link

[un­ti­tled post]

NeuralSystem_e5e127 Apr 2023 17:37 UTC
3 points
0 comments1 min readLW link

Re­sponse to “Co­or­di­nated paus­ing: An eval­u­a­tion-based co­or­di­na­tion scheme for fron­tier AI de­vel­op­ers”

Matthew Wearden30 Oct 2023 17:27 UTC
5 points
2 comments6 min readLW link
(matthewwearden.co.uk)

Char­bel-Raphaël and Lu­cius dis­cuss Interpretability

30 Oct 2023 5:50 UTC
104 points
7 comments21 min readLW link

An In­ter­na­tional Man­hat­tan Pro­ject for Ar­tifi­cial Intelligence

Glenn Clayton27 Apr 2023 17:34 UTC
−11 points
2 comments5 min readLW link

Fo­cus on ex­is­ten­tial risk is a dis­trac­tion from the real is­sues. A false fallacy

Nik Samoylov30 Oct 2023 23:42 UTC
−19 points
11 comments2 min readLW link

Say­ing the quiet part out loud: trad­ing off x-risk for per­sonal immortality

disturbance2 Nov 2023 17:43 UTC
82 points
89 comments5 min readLW link

The 6D effect: When com­pa­nies take risks, one email can be very pow­er­ful.

scasper4 Nov 2023 20:08 UTC
260 points
40 comments3 min readLW link

AI as Su­per-Demagogue

RationalDino5 Nov 2023 21:21 UTC
−2 points
9 comments9 min readLW link

Sym­biotic self-al­ign­ment of AIs.

Spiritus Dei7 Nov 2023 17:18 UTC
1 point
0 comments3 min readLW link

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments2 min readLW link
(arxiv.org)

The So­cial Align­ment Problem

irving28 Apr 2023 14:16 UTC
98 points
13 comments8 min readLW link

Do you want a first-prin­ci­pled pre­pared­ness guide to pre­pare your­self and loved ones for po­ten­tial catas­tro­phes?

Ulrik Horn14 Nov 2023 12:13 UTC
15 points
5 comments15 min readLW link

[Question] Real­is­tic near-fu­ture sce­nar­ios of AI doom un­der­stand­able for non-techy peo­ple?

RomanS28 Apr 2023 14:45 UTC
4 points
4 comments1 min readLW link

[Question] AI Safety orgs- what’s your biggest bot­tle­neck right now?

Kabir Kumar16 Nov 2023 2:02 UTC
1 point
0 comments1 min readLW link

We Should Talk About This More. Epistemic World Col­lapse as Im­mi­nent Safety Risk of Gen­er­a­tive AI.

Joerg Weiss16 Nov 2023 18:46 UTC
11 points
2 comments29 min readLW link

On ex­clud­ing dan­ger­ous in­for­ma­tion from training

ShayBenMoshe17 Nov 2023 11:14 UTC
23 points
5 comments3 min readLW link

Killswitch

Junio18 Nov 2023 22:53 UTC
2 points
0 comments3 min readLW link

Ilya: The AI sci­en­tist shap­ing the world

David Varga20 Nov 2023 13:09 UTC
11 points
0 comments4 min readLW link

A Guide to Fore­cast­ing AI Science Ca­pa­bil­ities

Eleni Angelou29 Apr 2023 23:24 UTC
6 points
1 comment4 min readLW link

The two para­graph ar­gu­ment for AI risk

CronoDAS25 Nov 2023 2:01 UTC
18 points
6 comments1 min readLW link

AISC 2024 - Pro­ject Summaries

NickyP27 Nov 2023 22:32 UTC
47 points
3 comments18 min readLW link

Re­think Pri­ori­ties: Seek­ing Ex­pres­sions of In­ter­est for Spe­cial Pro­jects Next Year

kierangreig29 Nov 2023 13:59 UTC
4 points
0 comments5 min readLW link

Sup­port me in a Week-Long Pick­et­ing Cam­paign Near OpenAI’s HQ: Seek­ing Sup­port and Ideas from the LessWrong Community

Percy30 Apr 2023 17:48 UTC
−26 points
15 comments1 min readLW link

Thoughts on “AI is easy to con­trol” by Pope & Belrose

Steven Byrnes1 Dec 2023 17:30 UTC
188 points
53 comments13 min readLW link

The benefits and risks of op­ti­mism (about AI safety)

Karl von Wendt3 Dec 2023 12:45 UTC
−11 points
6 comments5 min readLW link

A call for a quan­ti­ta­tive re­port card for AI bioter­ror­ism threat models

Juno4 Dec 2023 6:35 UTC
12 points
0 comments10 min readLW link

[Question] Ac­cu­racy of ar­gu­ments that are seen as ridicu­lous and in­tu­itively false but don’t have good counter-arguments

Christopher King29 Apr 2023 23:58 UTC
30 points
39 comments1 min readLW link

Call for sub­mis­sions: Choice of Fu­tures sur­vey questions

c.trout30 Apr 2023 6:59 UTC
4 points
0 comments2 min readLW link
(airtable.com)

Ac­cess to AI: a hu­man right?

dmtea25 Jul 2020 9:38 UTC
5 points
3 comments2 min readLW link

Agen­tic Lan­guage Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

Con­ver­sa­tion with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Tran­scrip­tion of Eliezer’s Jan­uary 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
112 points
9 comments56 min readLW link

Re­sponses to Catas­trophic AGI Risk: A Survey

lukeprog8 Jul 2013 14:33 UTC
17 points
8 comments1 min readLW link

How can I re­duce ex­is­ten­tial risk from AI?

lukeprog13 Nov 2012 21:56 UTC
63 points
92 comments8 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link

Refram­ing mis­al­igned AGI’s: well-in­ten­tioned non-neu­rotyp­i­cal assistants

zhukeepa1 Apr 2018 1:22 UTC
46 points
14 comments2 min readLW link

When is un­al­igned AI morally valuable?

paulfchristiano25 May 2018 1:57 UTC
74 points
53 comments10 min readLW link

In­tro­duc­ing the AI Align­ment Fo­rum (FAQ)

29 Oct 2018 21:07 UTC
86 points
8 comments6 min readLW link

Swim­ming Up­stream: A Case Study in In­stru­men­tal Rationality

TurnTrout3 Jun 2018 3:16 UTC
76 points
7 comments8 min readLW link

Cur­rent AI Safety Roles for Soft­ware Engineers

ozziegooen9 Nov 2018 20:57 UTC
70 points
9 comments4 min readLW link

[Question] Why is so much dis­cus­sion hap­pen­ing in pri­vate Google Docs?

Wei Dai12 Jan 2019 2:19 UTC
100 points
22 comments1 min readLW link

Prob­lems in AI Align­ment that philoso­phers could po­ten­tially con­tribute to

Wei Dai17 Aug 2019 17:38 UTC
77 points
14 comments2 min readLW link

Two Ne­glected Prob­lems in Hu­man-AI Safety

Wei Dai16 Dec 2018 22:13 UTC
98 points
24 comments2 min readLW link

An­nounce­ment: AI al­ign­ment prize round 4 winners

cousin_it20 Jan 2019 14:46 UTC
74 points
41 comments1 min readLW link

Soon: a weekly AI Safety pre­req­ui­sites mod­ule on LessWrong

null30 Apr 2018 13:23 UTC
35 points
10 comments1 min readLW link

And the AI would have got away with it too, if...

Stuart_Armstrong22 May 2019 21:35 UTC
75 points
7 comments1 min readLW link

2017 AI Safety Literature Review and Charity Comparison

Larks24 Dec 2017 18:52 UTC
41 points
5 comments23 min readLW link

Should ethicists be inside or outside a profession?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
91 points
7 comments9 min readLW link

I Vouch For MIRI

Zvi17 Dec 2017 17:50 UTC
38 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Beware of black boxes in AI alignment research

cousin_it18 Jan 2018 15:07 UTC
39 points
10 comments1 min readLW link

AI Alignment Prize: Round 2 due March 31, 2018

Zvi12 Mar 2018 12:10 UTC
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

Three AI Safety Related Ideas

Wei Dai13 Dec 2018 21:32 UTC
68 points
38 comments2 min readLW link

A rant against robots

Lê Nguyên Hoang14 Jan 2020 22:03 UTC
65 points
7 comments5 min readLW link

Opportunities for individual donors in AI safety

Alex Flint31 Mar 2018 18:37 UTC
30 points
3 comments11 min readLW link

Course recommendations for Friendliness researchers

Louie9 Jan 2013 14:33 UTC
96 points
112 comments10 min readLW link

AI Safety Research Camp - Project Proposal

David_Kristoffersson2 Feb 2018 4:25 UTC
29 points
11 comments8 min readLW link

AI Summer Fellows Program

colm21 Mar 2018 15:32 UTC
21 points
0 comments1 min readLW link

The genie knows, but doesn’t care

Rob Bensinger6 Sep 2013 6:42 UTC
120 points
495 comments8 min readLW link

[Question] Does agency necessarily imply self-preservation instinct?

Mislav Jurić1 May 2023 16:06 UTC
5 points
8 comments1 min readLW link

Alignment Newsletter #13: 07/02/18

Rohin Shah2 Jul 2018 16:10 UTC
70 points
12 comments8 min readLW link
(mailchi.mp)

An Increasingly Manipulative Newsfeed

Michaël Trazzi1 Jul 2019 15:26 UTC
62 points
16 comments5 min readLW link

The simple picture on AI safety

Alex Flint27 May 2018 19:43 UTC
31 points
10 comments2 min readLW link

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

1 May 2023 16:47 UTC
93 points
10 comments30 min readLW link

Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial

Paul Crowley15 Jan 2015 16:33 UTC
78 points
52 comments1 min readLW link

Strategic implications of AIs’ ability to coordinate at low cost, for example by merging

Wei Dai25 Apr 2019 5:08 UTC
67 points
46 comments2 min readLW link1 review

Modeling AGI Safety Frameworks with Causal Influence Diagrams

Ramana Kumar21 Jun 2019 12:50 UTC
43 points
6 comments1 min readLW link
(arxiv.org)

Henry Kissinger: AI Could Mean the End of Human History

ESRogs15 May 2018 20:11 UTC
17 points
12 comments1 min readLW link
(www.theatlantic.com)

Toy model of the AI control problem: animated version

Stuart_Armstrong10 Oct 2017 11:06 UTC
23 points
8 comments1 min readLW link

A Visualization of Nick Bostrom’s Superintelligence

[deleted]23 Jul 2014 0:24 UTC
62 points
28 comments3 min readLW link

AI Alignment Research Overview (by Jacob Steinhardt)

Ben Pace6 Nov 2019 19:24 UTC
44 points
0 comments7 min readLW link
(docs.google.com)

A general model of safety-oriented AI development

Wei Dai11 Jun 2018 21:00 UTC
65 points
8 comments1 min readLW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei Dai10 Sep 2019 8:29 UTC
48 points
26 comments3 min readLW link

AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks

2 May 2023 18:41 UTC
32 points
0 comments5 min readLW link
(newsletter.safe.ai)

Averting Catastrophe: Decision Theory for COVID-19, Climate Change, and Potential Disasters of All Kinds

JakubK2 May 2023 22:50 UTC
10 points
0 comments1 min readLW link

Siren worlds and the perils of over-optimised search

Stuart_Armstrong7 Apr 2014 11:00 UTC
77 points
418 comments7 min readLW link

Regulate or Compete? The China Factor in U.S. AI Policy (NAIR #2)

charles_m5 May 2023 17:43 UTC
2 points
1 comment7 min readLW link
(navigatingairisks.substack.com)

But What If We Actually Want To Maximize Paperclips?

snerx25 May 2023 7:13 UTC
−17 points
6 comments7 min readLW link

Top 9+2 myths about AI risk

Stuart_Armstrong29 Jun 2015 20:41 UTC
68 points
45 comments2 min readLW link

Formalizing the “AI x-risk is unlikely because it is ridiculous” argument

Christopher King3 May 2023 18:56 UTC
47 points
17 comments3 min readLW link

Rohin Shah on reasons for AI optimism

abergal31 Oct 2019 12:10 UTC
40 points
58 comments1 min readLW link
(aiimpacts.org)

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong6 Feb 2020 11:50 UTC
38 points
25 comments3 min readLW link

We don’t need AGI for an amazing future

Karl von Wendt4 May 2023 12:10 UTC
18 points
32 comments5 min readLW link

[Question] Why not use active SETI to prevent AI Doom?

RomanS5 May 2023 14:41 UTC
13 points
13 comments1 min readLW link

CHAT Diplomacy: LLMs and National Security

JohnBuridan5 May 2023 19:45 UTC
25 points
5 comments7 min readLW link

The Magnitude of His Own Folly

Eliezer Yudkowsky30 Sep 2008 11:31 UTC
97 points
127 comments6 min readLW link

AI alignment landscape

paulfchristiano13 Oct 2019 2:10 UTC
40 points
3 comments1 min readLW link
(ai-alignment.com)

Launched: Friendship is Optimal

iceman15 Nov 2012 4:57 UTC
77 points
32 comments1 min readLW link

Friendship is Optimal: A My Little Pony fanfic about an optimization process

iceman8 Sep 2012 6:16 UTC
109 points
152 comments1 min readLW link

Is “red” for GPT-4 the same as “red” for you?

Yusuke Hayashi6 May 2023 17:55 UTC
9 points
6 comments2 min readLW link

Oh, Think of the Bananas

Jeffs1 Jun 2023 6:46 UTC
3 points
0 comments2 min readLW link

Do Earths with slower economic growth have a better chance at FAI?

Eliezer Yudkowsky12 Jun 2013 19:54 UTC
59 points
175 comments4 min readLW link

TED talk by Eliezer Yudkowsky: Unleashing the Power of Artificial Intelligence

bayesed7 May 2023 5:45 UTC
49 points
36 comments1 min readLW link
(www.youtube.com)

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?”

Roman Leventov8 May 2023 21:26 UTC
18 points
2 comments7 min readLW link
(yoshuabengio.org)

H-JEPA might be technically alignable in a modified form

Roman Leventov8 May 2023 23:04 UTC
12 points
2 comments7 min readLW link

[Question] How much of a concern are open-source LLMs in the short, medium and long terms?

JavierCC10 May 2023 9:14 UTC
5 points
0 comments1 min readLW link

Idea: Open Access AI Safety Journal

Gordon Seidoh Worley23 Mar 2018 18:27 UTC
28 points
11 comments1 min readLW link

AGI-Automated Interpretability is Suicide

__RicG__10 May 2023 14:20 UTC
23 points
33 comments7 min readLW link

[Question] Is “brittle alignment” good enough?

the8thbit23 May 2023 17:35 UTC
9 points
5 comments3 min readLW link

[Question] AI interpretability could be harmful?

Roman Leventov10 May 2023 20:43 UTC
13 points
2 comments1 min readLW link

[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo Putera11 May 2023 4:16 UTC
11 points
1 comment3 min readLW link

A more grounded idea of AI risk

Iknownothing11 May 2023 9:48 UTC
3 points
4 comments1 min readLW link

Separating the “control problem” from the “alignment problem”

Yi-Yang11 May 2023 9:41 UTC
12 points
1 comment4 min readLW link

Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al.

Violet Hour11 May 2023 18:06 UTC
20 points
2 comments13 min readLW link

[Question] Term/Category for AI with Neutral Impact?

isomic11 May 2023 22:00 UTC
6 points
1 comment1 min readLW link

Un-unpluggability - can’t we just unplug it?

Oliver Sourbut15 May 2023 13:23 UTC
26 points
10 comments12 min readLW link
(www.oliversourbut.net)

Formulating the AI Doom Argument for Analytic Philosophers

JonathanErhardt12 May 2023 7:54 UTC
13 points
0 comments2 min readLW link

The way AGI wins could look very stupid

Christopher King12 May 2023 16:34 UTC
42 points
22 comments1 min readLW link

G.K. Chesterton On AI Risk

Scott Alexander1 Apr 2017 19:00 UTC
17 points
0 comments7 min readLW link

PCAST Working Group on Generative AI Invites Public Input

Christopher King13 May 2023 22:49 UTC
7 points
0 comments1 min readLW link
(terrytao.wordpress.com)

Coordination by common knowledge to prevent uncontrollable AI

Karl von Wendt14 May 2023 13:37 UTC
10 points
2 comments9 min readLW link

[Question] What projects and efforts are there to promote AI safety research?

Christopher King24 May 2023 0:33 UTC
4 points
0 comments1 min readLW link

AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop

_will_16 May 2023 18:06 UTC
11 points
4 comments8 min readLW link

Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk

Joe Brenton16 May 2023 11:57 UTC
6 points
4 comments1 min readLW link

GPT as an “Intelligence Forklift.”

boazbarak19 May 2023 21:15 UTC
46 points
27 comments3 min readLW link

Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*

Christopher King16 May 2023 15:18 UTC
22 points
6 comments2 min readLW link

Confusions and updates on STEM AI

Eleni Angelou19 May 2023 21:34 UTC
21 points
0 comments3 min readLW link

A&I (Rihanna ‘S&M’ parody lyrics)

nahoj21 May 2023 22:34 UTC
−3 points
0 comments2 min readLW link

We Shouldn’t Expect AI to Ever be Fully Rational

OneManyNone18 May 2023 17:09 UTC
19 points
31 comments6 min readLW link

[Crosspost] A recent write-up of the case for AI (existential) risk

Timsey18 May 2023 13:13 UTC
6 points
0 comments19 min readLW link

The Polarity Problem [Draft]

23 May 2023 21:05 UTC
24 points
3 comments44 min readLW link

A flaw in the A.G.I. Ruin Argument

Cole Wyeth19 May 2023 19:40 UTC
1 point
6 comments3 min readLW link
(colewyeth.com)

The Friendly AI Game

bentarm15 Mar 2011 16:45 UTC
50 points
178 comments1 min readLW link

Q&A with Jürgen Schmidhuber on risks from AI

XiXiDu15 Jun 2011 15:51 UTC
61 points
45 comments4 min readLW link

[Question] What should an Einstein-like figure in Machine Learning do?

Razied5 Aug 2020 23:52 UTC
7 points
4 comments1 min readLW link

Takeaways from safety by default interviews

3 Apr 2020 17:20 UTC
28 points
2 comments13 min readLW link
(aiimpacts.org)

Field-Building and Deep Models

Ben Pace13 Jan 2018 21:16 UTC
21 points
12 comments4 min readLW link

Critique my Model: The EV of AGI to Selfish Individuals

ozziegooen8 Apr 2018 20:04 UTC
19 points
9 comments4 min readLW link

Yoshua Bengio: How Rogue AIs may Arise

harfe23 May 2023 18:28 UTC
92 points
12 comments18 min readLW link
(yoshuabengio.org)

A rejection of the Orthogonality Thesis

ArisC24 May 2023 16:37 UTC
−2 points
11 comments2 min readLW link
(medium.com)

‘Dumb’ AI observes and manipulates controllers

Stuart_Armstrong13 Jan 2015 13:35 UTC
52 points
19 comments2 min readLW link

2019 AI Alignment Literature Review and Charity Comparison

Larks19 Dec 2019 3:00 UTC
130 points
18 comments62 min readLW link

Two ideas for alignment, perpetual mutual distrust and induction

APaleBlueDot25 May 2023 0:56 UTC
1 point
2 comments4 min readLW link

Book review: Architects of Intelligence by Martin Ford (2018)

Ofer11 Aug 2020 17:30 UTC
15 points
0 comments2 min readLW link

Qualitative Strategies of Friendliness

Eliezer Yudkowsky30 Aug 2008 2:12 UTC
30 points
56 comments12 min readLW link

Dreams of Friendliness

Eliezer Yudkowsky31 Aug 2008 1:20 UTC
26 points
81 comments9 min readLW link

Conceptual issues in AI safety: the paradigmatic gap

vedevazz24 Jun 2018 15:09 UTC
33 points
0 comments1 min readLW link
(www.foldl.me)

On unfixably unsafe AGI architectures

Steven Byrnes19 Feb 2020 21:16 UTC
33 points
8 comments5 min readLW link

A toy model of the treacherous turn

Stuart_Armstrong8 Jan 2016 12:58 UTC
42 points
13 comments6 min readLW link

Allegory On AI Risk, Game Theory, and Mithril

James_Miller13 Feb 2017 20:41 UTC
45 points
57 comments3 min readLW link

1hr talk: Intro to AGI safety

Steven Byrnes18 Jun 2019 21:41 UTC
36 points
4 comments24 min readLW link

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan25 May 2023 16:30 UTC
2 points
0 comments25 min readLW link

DeepMind: Model evaluation for extreme risks

Zach Stein-Perlman25 May 2023 3:00 UTC
94 points
11 comments1 min readLW link
(arxiv.org)

The Evil AI Overlord List

Stuart_Armstrong20 Nov 2012 17:02 UTC
44 points
80 comments1 min readLW link

Aligning an H-JEPA agent via training on the outputs of an LLM-based “exemplary actor”

Roman Leventov29 May 2023 11:08 UTC
12 points
10 comments30 min readLW link

An LLM-based “exemplary actor”

Roman Leventov29 May 2023 11:12 UTC
16 points
0 comments12 min readLW link

[Question] Why is violence against AI labs a taboo?

ArisC26 May 2023 8:00 UTC
−21 points
63 comments1 min readLW link

[Question] What’s your viewpoint on the likelihood of GPT-5 being able to autonomously create, train, and implement an AI superior to GPT-5?

Super AGI26 May 2023 1:43 UTC
7 points
15 comments1 min readLW link

What I would like the SIAI to publish

XiXiDu1 Nov 2010 14:07 UTC
36 points
225 comments3 min readLW link

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program

Christopher King2 Jun 2023 21:54 UTC
7 points
4 comments16 min readLW link

AI X-risk is a possible solution to the Fermi Paradox

magic9mushroom30 May 2023 17:42 UTC
11 points
20 comments2 min readLW link

Hands of gods

Anders L28 May 2023 15:15 UTC
1 point
0 comments9 min readLW link
(woodfromeden.substack.com)

Proposed Alignment Technique: OSNR (Output Sanitization via Noising and Reconstruction) for Safer Usage of Potentially Misaligned AGI

sudo29 May 2023 1:35 UTC
14 points
9 comments6 min readLW link

Without a trajectory change, the development of AGI is likely to go badly

Max H29 May 2023 23:42 UTC
16 points
2 comments13 min readLW link

On the Impossibility of Intelligent Paperclip Maximizers

Michael Simkin29 May 2023 16:55 UTC
−21 points
5 comments4 min readLW link

Engaging First Introductions to AI Risk

Rob Bensinger19 Aug 2013 6:26 UTC
31 points
21 comments3 min readLW link

Winners-take-how-much?

YonatanK29 May 2023 21:56 UTC
1 point
2 comments3 min readLW link

Evaluating the feasibility of SI’s plan

JoshuaFox10 Jan 2013 8:17 UTC
39 points
187 comments4 min readLW link