
AI Risk

Last edit: 2 Nov 2022 20:27 UTC by brook

AI Risk is the analysis of risks associated with building powerful AI systems.

Related: AI, Orthogonality thesis, Complexity of value, Goodhart’s law, Paperclip maximiser

Superintelligence FAQ

Scott Alexander · 20 Sep 2016 19:00 UTC
118 points
32 comments · 27 min read · LW link

What failure looks like

paulfchristiano · 17 Mar 2019 20:18 UTC
391 points
53 comments · 8 min read · LW link · 2 reviews

Specification gaming examples in AI

Vika · 3 Apr 2018 12:30 UTC
43 points
9 comments · 1 min read · LW link · 2 reviews

An artificially structured argument for expecting AGI ruin

Rob Bensinger · 7 May 2023 21:52 UTC
91 points
26 comments · 19 min read · LW link

MIRI announces new “Death With Dignity” strategy

Eliezer Yudkowsky · 2 Apr 2022 0:43 UTC
333 points
534 comments · 18 min read · LW link

Discussion with Eliezer Yudkowsky on AGI interventions

11 Nov 2021 3:01 UTC
328 points
251 comments · 34 min read · LW link · 1 review

PreDCA: vanessa kosoy’s alignment protocol

Tamsin Leake · 20 Aug 2022 10:03 UTC
50 points
8 comments · 7 min read · LW link
(carado.moe)

“Corrigibility at some small length” by dath ilan

Christopher King · 5 Apr 2023 1:47 UTC
32 points
3 comments · 9 min read · LW link
(www.glowfic.com)

Intuitions about goal-directed behavior

Rohin Shah · 1 Dec 2018 4:25 UTC
54 points
15 comments · 6 min read · LW link

Epistemological Framing for AI Alignment Research

adamShimi · 8 Mar 2021 22:05 UTC
55 points
7 comments · 9 min read · LW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris Sala · 22 May 2023 14:31 UTC
153 points
5 comments · 3 min read · LW link
(www.conjecture.dev)

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky · 5 Jun 2022 22:05 UTC
863 points
683 comments · 30 min read · LW link

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
226 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

Where I agree and disagree with Eliezer

paulfchristiano · 19 Jun 2022 19:15 UTC
862 points
217 comments · 18 min read · LW link

Open Problems in AI X-Risk [PAIS #5]

10 Jun 2022 2:08 UTC
57 points
6 comments · 36 min read · LW link

What can the principal-agent literature tell us about AI risk?

apc · 8 Feb 2020 21:28 UTC
104 points
29 comments · 16 min read · LW link

On how various plans miss the hard bits of the alignment challenge

So8res · 12 Jul 2022 2:49 UTC
292 points
83 comments · 29 min read · LW link

Another (outer) alignment failure story

paulfchristiano · 7 Apr 2021 20:12 UTC
236 points
38 comments · 12 min read · LW link · 1 review

[Question] Will OpenAI’s work unintentionally increase existential risks related to AI?

adamShimi · 11 Aug 2020 18:16 UTC
53 points
56 comments · 1 min read · LW link

Developmental Stages of GPTs

orthonormal · 26 Jul 2020 22:03 UTC
140 points
71 comments · 7 min read · LW link · 1 review

A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi · 28 Jul 2018 21:27 UTC
74 points
9 comments · 3 min read · LW link
(github.com)

Bing chat is the AI fire alarm

Ratios · 17 Feb 2023 6:51 UTC
112 points
62 comments · 3 min read · LW link

Stampy’s AI Safety Info—New Distillations #1 [March 2023] (Expansive interactive FAQ)

markov · 7 Apr 2023 11:06 UTC
42 points
0 comments · 2 min read · LW link
(aisafety.info)

Robin Hanson’s latest AI risk position statement

Liron · 3 Mar 2023 14:25 UTC
55 points
17 comments · 1 min read · LW link
(www.overcomingbias.com)

Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)

Jacy Reese Anthis · 22 Nov 2022 16:50 UTC
93 points
64 comments · 1 min read · LW link
(www.science.org)

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin · 12 Jul 2023 12:12 UTC
102 points
12 comments · 4 min read · LW link

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

22 Nov 2022 18:57 UTC
128 points
95 comments · 24 min read · LW link

Don’t accelerate problems you’re trying to solve

15 Feb 2023 18:11 UTC
100 points
26 comments · 4 min read · LW link

Are minimal circuits deceptive?

evhub · 7 Sep 2019 18:11 UTC
74 points
11 comments · 8 min read · LW link

AI Could Defeat All Of Us Combined

HoldenKarnofsky · 9 Jun 2022 15:50 UTC
170 points
42 comments · 17 min read · LW link
(www.cold-takes.com)

“Endgame safety” for AGI

Steven Byrnes · 24 Jan 2023 14:15 UTC
84 points
10 comments · 6 min read · LW link

Devil’s Advocate: Adverse Selection Against Conscientiousness

lionhearted (Sebastian Marshall) · 28 May 2023 17:53 UTC
10 points
2 comments · 1 min read · LW link

Being at peace with Doom

Johannes C. Mayer · 9 Apr 2023 14:53 UTC
24 points
11 comments · 4 min read · LW link

How good is humanity at coordination?

Buck · 21 Jul 2020 20:01 UTC
81 points
44 comments · 3 min read · LW link

My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”

Quintin Pope · 21 Mar 2023 0:06 UTC
364 points
210 comments · 39 min read · LW link

Request to AGI organizations: Share your views on pausing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments · 1 min read · LW link

Architects of Our Own Demise: We Should Stop Developing AI

Roko · 26 Oct 2023 0:36 UTC
168 points
73 comments · 3 min read · LW link

Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell?

Karl von Wendt · 25 Jun 2023 16:59 UTC
106 points
50 comments · 7 min read · LW link

On Solving Problems Before They Appear: The Weird Epistemologies of Alignment

adamShimi · 11 Oct 2021 8:20 UTC
107 points
10 comments · 15 min read · LW link

Intent alignment should not be the goal for AGI x-risk reduction

John Nay · 26 Oct 2022 1:24 UTC
1 point
10 comments · 3 min read · LW link

The Hidden Complexity of Wishes

Eliezer Yudkowsky · 24 Nov 2007 0:12 UTC
163 points
142 comments · 7 min read · LW link

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton · 17 Jan 2022 16:49 UTC
65 points
14 comments · 13 min read · LW link

Soft takeoff can still lead to decisive strategic advantage

Daniel Kokotajlo · 23 Aug 2019 16:39 UTC
122 points
47 comments · 8 min read · LW link · 4 reviews

Announcing Apollo Research

30 May 2023 16:17 UTC
214 points
10 comments · 8 min read · LW link

Alexander and Yudkowsky on AGI goals

24 Jan 2023 21:09 UTC
174 points
52 comments · 26 min read · LW link

Should we postpone AGI until we reach safety?

otto.barten · 18 Nov 2020 15:43 UTC
27 points
36 comments · 3 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
296 points
33 comments · 2 min read · LW link

DL towards the unaligned Recursive Self-Optimization attractor

jacob_cannell · 18 Dec 2021 2:15 UTC
32 points
22 comments · 4 min read · LW link

An Appeal to AI Superintelligence: Reasons to Preserve Humanity

James_Miller · 18 Mar 2023 16:22 UTC
30 points
72 comments · 12 min read · LW link

[Question] How likely are scenarios where AGI ends up overtly or de facto torturing us? How likely are scenarios where AGI prevents us from committing suicide or dying?

JohnGreer · 28 Mar 2023 18:00 UTC
11 points
4 comments · 1 min read · LW link

[Question] First and Last Questions for GPT-5*

Mitchell_Porter · 24 Nov 2023 5:03 UTC
14 points
5 comments · 1 min read · LW link

Response to Oren Etzioni’s “How to know if artificial intelligence is about to destroy civilization”

Daniel Kokotajlo · 27 Feb 2020 18:10 UTC
27 points
5 comments · 8 min read · LW link

“Why can’t you just turn it off?”

Roko · 19 Nov 2023 14:46 UTC
39 points
25 comments · 1 min read · LW link

Survey: What (de)motivates you about AI risk?

Daniel_Friedrich · 3 Aug 2022 19:17 UTC
1 point
0 comments · 1 min read · LW link
(forms.gle)

Catastrophic Risks from AI #5: Rogue AIs

27 Jun 2023 22:06 UTC
15 points
0 comments · 22 min read · LW link
(arxiv.org)

Catastrophic Risks from AI #1: Introduction

22 Jun 2023 17:09 UTC
40 points
1 comment · 5 min read · LW link
(arxiv.org)

Why don’t singularitarians bet on the creation of AGI by buying stocks?

John_Maxwell · 11 Mar 2020 16:27 UTC
43 points
20 comments · 4 min read · LW link

Google’s Ethical AI team and AI Safety

magfrump · 20 Feb 2021 9:42 UTC
12 points
16 comments · 7 min read · LW link

[Question] Why don’t quantilizers also cut off the upper end of the distribution?

Alex_Altair · 15 May 2023 1:40 UTC
25 points
2 comments · 1 min read · LW link

[Linkpost] Existential Risk Analysis in Empirical Research Papers

Dan H · 2 Jul 2022 0:09 UTC
40 points
0 comments · 1 min read · LW link
(arxiv.org)

Confusion about neuroscience/cognitive science as a danger for AI Alignment

Samuel Nellessen · 22 Jun 2022 17:59 UTC
2 points
1 comment · 3 min read · LW link
(snellessen.com)

Rogue AGI Embodies Valuable Intellectual Property

3 Jun 2021 20:37 UTC
71 points
9 comments · 3 min read · LW link

Shapes of Mind and Pluralism in Alignment

adamShimi · 13 Aug 2022 10:01 UTC
33 points
2 comments · 2 min read · LW link

Behavioral Sufficient Statistics for Goal-Directedness

adamShimi · 11 Mar 2021 15:01 UTC
21 points
12 comments · 9 min read · LW link

[Question] Should people build productizations of open source AI models?

lc · 2 Nov 2023 1:26 UTC
21 points
0 comments · 1 min read · LW link

Alex Turner’s Research, Comprehensive Information Gathering

adamShimi · 23 Jun 2021 9:44 UTC
15 points
3 comments · 3 min read · LW link

[Question] Does the Structure of an algorithm matter for AI Risk and/or consciousness?

Logan Zoellner · 3 Dec 2021 18:31 UTC
7 points
4 comments · 1 min read · LW link

AI Safety is Dropping the Ball on Clown Attacks

trevor · 22 Oct 2023 20:09 UTC
51 points
69 comments · 34 min read · LW link

A conversation about Katja’s counterarguments to AI risk

18 Oct 2022 18:40 UTC
43 points
9 comments · 33 min read · LW link

Bayeswatch 7: Wildfire

lsusr · 8 Sep 2021 5:35 UTC
48 points
6 comments · 3 min read · LW link

Why the technological singularity by AGI may never happen

hippke · 3 Sep 2021 14:19 UTC
5 points
14 comments · 1 min read · LW link

Approaches to gradient hacking

adamShimi · 14 Aug 2021 15:16 UTC
16 points
8 comments · 8 min read · LW link

Announcing AISIC 2022 - the AI Safety Israel Conference, October 19-20

Davidmanheim · 21 Sep 2022 19:32 UTC
13 points
0 comments · 1 min read · LW link

Epistemic Strategies of Selection Theorems

adamShimi · 18 Oct 2021 8:57 UTC
33 points
1 comment · 12 min read · LW link

Refine’s Third Blog Post Day/Week

adamShimi · 17 Sep 2022 17:03 UTC
18 points
0 comments · 1 min read · LW link

Empowerment is (almost) All We Need

jacob_cannell · 23 Oct 2022 21:48 UTC
59 points
44 comments · 17 min read · LW link

AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI

18 Oct 2023 17:06 UTC
14 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas

Akash · 25 Nov 2022 20:47 UTC
37 points
2 comments · 9 min read · LW link

Poster Session on AI Safety

Neil Crawford · 12 Nov 2022 3:50 UTC
7 points
6 comments · 1 min read · LW link

[Question] Why are we sure that AI will “want” something?

shminux · 16 Sep 2022 20:35 UTC
31 points
57 comments · 1 min read · LW link

Non-Adversarial Goodhart and AI Risks

Davidmanheim · 27 Mar 2018 1:39 UTC
22 points
11 comments · 6 min read · LW link

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors · 27 Sep 2023 21:27 UTC
16 points
9 comments · 4 min read · LW link

Microdooms averted by working on AI Safety

nikola · 17 Sep 2023 21:46 UTC
30 points
2 comments · 3 min read · LW link
(forum.effectivealtruism.org)

Ten Levels of AI Alignment Difficulty

Sammy Martin · 3 Jul 2023 20:20 UTC
97 points
10 comments · 12 min read · LW link

Catastrophic Risks from AI #6: Discussion and FAQ

27 Jun 2023 23:23 UTC
24 points
1 comment · 13 min read · LW link
(arxiv.org)

If you wish to make an apple pie, you must first become dictator of the universe

jasoncrawford · 5 Jul 2023 18:14 UTC
27 points
9 comments · 13 min read · LW link
(rootsofprogress.org)

Catastrophic Risks from AI #4: Organizational Risks

26 Jun 2023 19:36 UTC
23 points
0 comments · 21 min read · LW link
(arxiv.org)

The Control Problem: Unsolved or Unsolvable?

Remmelt · 2 Jun 2023 15:42 UTC
43 points
45 comments · 11 min read · LW link

But exactly how complex and fragile?

KatjaGrace · 3 Nov 2019 18:20 UTC
82 points
32 comments · 3 min read · LW link · 1 review
(meteuphoric.com)

Risks from AI Overview: Summary

18 Aug 2023 1:21 UTC
25 points
0 comments · 13 min read · LW link
(www.safe.ai)

Apply to lead a project during the next virtual AI Safety Camp

13 Sep 2023 13:29 UTC
19 points
0 comments · 5 min read · LW link
(aisafety.camp)

Beyond Hyperanthropomorphism

PointlessOne · 21 Aug 2022 17:55 UTC
3 points
17 comments · 1 min read · LW link
(studio.ribbonfarm.com)

Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

Rob Bensinger · 12 Dec 2021 2:08 UTC
70 points
35 comments · 7 min read · LW link

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

Robert Miles · 1 Nov 2022 23:23 UTC
68 points
105 comments · 2 min read · LW link

Announcement: AI alignment prize winners and next round

cousin_it · 15 Jan 2018 14:33 UTC
80 points
68 comments · 2 min read · LW link

Using Brain-Computer Interfaces to get more data for AI alignment

Robbo · 7 Nov 2021 0:00 UTC
42 points
10 comments · 7 min read · LW link

Epistemic Strategies of Safety-Capabilities Tradeoffs

adamShimi · 22 Oct 2021 8:22 UTC
5 points
0 comments · 6 min read · LW link

Is progress in ML-assisted theorem-proving beneficial?

mako yass · 28 Sep 2021 1:54 UTC
11 points
3 comments · 1 min read · LW link

[Book Review] “The Alignment Problem” by Brian Christian

lsusr · 20 Sep 2021 6:36 UTC
70 points
16 comments · 6 min read · LW link

Some disjunctive reasons for urgency on AI risk

Wei Dai · 15 Feb 2019 20:43 UTC
36 points
24 comments · 1 min read · LW link

Interview with Skynet

lsusr · 30 Sep 2021 2:20 UTC
49 points
1 comment · 2 min read · LW link

Drexler on AI Risk

PeterMcCluskey · 1 Feb 2019 5:11 UTC
35 points
10 comments · 9 min read · LW link
(www.bayesianinvestor.com)

All AGI safety questions welcome (especially basic ones) [Sept 2022]

plex · 8 Sep 2022 11:56 UTC
22 points
48 comments · 2 min read · LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson · 25 Oct 2022 3:54 UTC
15 points
3 comments · 1 min read · LW link

The Fusion Power Generator Scenario

johnswentworth · 8 Aug 2020 18:31 UTC
139 points
29 comments · 3 min read · LW link

AI Alignment 2018-19 Review

Rohin Shah · 28 Jan 2020 2:19 UTC
126 points
6 comments · 35 min read · LW link

Brainstorming additional AI risk reduction ideas

John_Maxwell · 14 Jun 2012 7:55 UTC
19 points
37 comments · 1 min read · LW link

Information warfare historically revolved around human conduits

trevor · 28 Aug 2023 18:54 UTC
37 points
7 comments · 3 min read · LW link

What if we approach AI safety like a technical engineering safety problem

zeshen · 20 Aug 2022 10:29 UTC
32 points
4 comments · 7 min read · LW link

25 Min Talk on MetaEthical.AI with Questions from Stuart Armstrong

June Ku · 29 Apr 2021 15:38 UTC
21 points
7 comments · 1 min read · LW link

Learning societal values from law as part of an AGI alignment strategy

John Nay · 21 Oct 2022 2:03 UTC
5 points
18 comments · 54 min read · LW link

DeepMind alignment team opinions on AGI ruin arguments

Vika · 12 Aug 2022 21:06 UTC
375 points
36 comments · 14 min read · LW link

The alignment problem from a deep learning perspective

Richard_Ngo · 10 Aug 2022 22:46 UTC
97 points
13 comments · 27 min read · LW link

April drafts

AI Impacts · 1 Apr 2021 18:10 UTC
49 points
2 comments · 1 min read · LW link
(aiimpacts.org)

Catastrophic Risks from AI #3: AI Race

23 Jun 2023 19:21 UTC
18 points
9 comments · 29 min read · LW link
(arxiv.org)

Robustness to Scaling Down: More Important Than I Thought

adamShimi · 23 Jul 2022 11:40 UTC
37 points
5 comments · 3 min read · LW link

Request: stop advancing AI capabilities

So8res · 26 May 2023 17:42 UTC
154 points
23 comments · 1 min read · LW link

The Alignment Problem

lsusr · 11 Jul 2022 3:03 UTC
46 points
18 comments · 3 min read · LW link

Difficulties in making powerful aligned AI

DanielFilan · 14 May 2023 20:50 UTC
41 points
1 comment · 10 min read · LW link
(danielfilan.com)

My Overview of the AI Alignment Landscape: A Bird’s Eye View

Neel Nanda · 15 Dec 2021 23:44 UTC
126 points
9 comments · 15 min read · LW link

AI Fire Alarm Scenarios

PeterMcCluskey · 28 Dec 2021 2:20 UTC
10 points
0 comments · 6 min read · LW link
(www.bayesianinvestor.com)

An unaligned benchmark

paulfchristiano · 17 Nov 2018 15:51 UTC
31 points
0 comments · 9 min read · LW link

The strategy-stealing assumption

paulfchristiano · 16 Sep 2019 15:23 UTC
84 points
46 comments · 12 min read · LW link · 3 reviews

A plea for solutionism on AI safety

jasoncrawford · 9 Jun 2023 16:29 UTC
72 points
6 comments · 6 min read · LW link
(rootsofprogress.org)

Why and When Interpretability Work is Dangerous

NicholasKross · 28 May 2023 0:27 UTC
20 points
7 comments · 8 min read · LW link
(www.thinkingmuchbetter.com)

AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control

16 May 2023 15:14 UTC
31 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Will Artificial Superintelligence Kill Us?

James_Miller · 23 May 2023 16:27 UTC
33 points
2 comments · 22 min read · LW link

Activation additions in a simple MNIST network

Garrett Baker · 18 May 2023 2:49 UTC
26 points
0 comments · 2 min read · LW link

The case for removing alignment and ML research from the training dataset

beren · 30 May 2023 20:54 UTC
48 points
8 comments · 5 min read · LW link

Systems that cannot be unsafe cannot be safe

Davidmanheim · 2 May 2023 8:53 UTC
62 points
27 comments · 2 min read · LW link

Stuxnet, not Skynet: Humanity’s disempowerment by AI

Roko · 4 Nov 2023 22:23 UTC
103 points
22 comments · 6 min read · LW link

A Case for the Least Forgiving Take On Alignment

Thane Ruthenis · 2 May 2023 21:34 UTC
89 points
71 comments · 22 min read · LW link

[Linkpost] Biden-Harris Executive Order on AI

beren · 30 Oct 2023 15:20 UTC
6 points
0 comments · 1 min read · LW link

RA Bounty: Looking for feedback on screenplay about AI Risk

Writer · 26 Oct 2023 13:23 UTC
30 points
6 comments · 1 min read · LW link

Quote quiz: “drifting into dependence”

jasoncrawford · 27 Apr 2023 15:13 UTC
7 points
6 comments · 1 min read · LW link
(rootsofprogress.org)

[Question] How much do personal biases in risk assessment affect assessment of AI risks?

Gordon Seidoh Worley · 3 May 2023 6:12 UTC
10 points
8 comments · 1 min read · LW link

Minimum Viable Exterminator

Richard Horvath · 29 May 2023 16:32 UTC
14 points
5 comments · 5 min read · LW link

AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering

4 Oct 2023 17:37 UTC
15 points
2 comments · 5 min read · LW link
(newsletter.safe.ai)

AI Takeover Scenario with Scaled LLMs

simeon_c · 16 Apr 2023 23:28 UTC
42 points
15 comments · 8 min read · LW link

Financial Times: We must slow down the race to God-like AI

trevor · 13 Apr 2023 19:55 UTC
103 points
17 comments · 16 min read · LW link
(www.ft.com)

All images from the WaitButWhy sequence on AI

trevor · 8 Apr 2023 7:36 UTC
72 points
5 comments · 2 min read · LW link

Reliability, Security, and AI risk: Notes from infosec textbook chapter 1

Akash · 7 Apr 2023 15:47 UTC
34 points
1 comment · 4 min read · LW link

“warning about ai doom” is also “announcing capabilities progress to noobs”

the gears to ascension · 8 Apr 2023 23:42 UTC
16 points
5 comments · 3 min read · LW link

All AGI Safety questions welcome (especially basic ones) [April 2023]

steven0461 · 8 Apr 2023 4:21 UTC
57 points
88 comments · 2 min read · LW link

Distilled—AGI Safety from First Principles

Harrison G · 29 May 2022 0:57 UTC
10 points
1 comment · 14 min read · LW link

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

Rohin Shah · 2 Jan 2020 18:20 UTC
36 points
95 comments · 10 min read · LW link
(mailchi.mp)

Why I don’t believe in doom

mukashi · 7 Jun 2022 23:49 UTC
6 points
30 comments · 4 min read · LW link

[Link] Sarah Constantin: “Why I am Not An AI Doomer”

lbThingrb · 12 Apr 2023 1:52 UTC
61 points
13 comments · 1 min read · LW link
(sarahconstantin.substack.com)

Top lesson from GPT: we will probably destroy humanity “for the lulz” as soon as we are able.

shminux · 16 Apr 2023 20:27 UTC
65 points
28 comments · 1 min read · LW link

Another plausible scenario of AI risk: AI builds military infrastructure while collaborating with humans, defects later.

avturchin · 10 Jun 2022 17:24 UTC
10 points
2 comments · 1 min read · LW link

Chad Jones paper modeling AI and x-risk vs. growth

jasoncrawford · 26 Apr 2023 20:07 UTC
39 points
7 comments · 2 min read · LW link
(web.stanford.edu)

On A List of Lethalities

Zvi · 13 Jun 2022 12:30 UTC
160 points
48 comments · 54 min read · LW link
(thezvi.wordpress.com)

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky · 27 Oct 2023 15:19 UTC
188 points
29 comments · 8 min read · LW link

Thoughts on Robin Hanson’s AI Impacts interview

Steven Byrnes · 24 Nov 2019 1:40 UTC
25 points
3 comments · 7 min read · LW link

AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks

31 Oct 2023 19:34 UTC
35 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

The other side of the tidal wave

KatjaGrace · 3 Nov 2023 5:40 UTC
164 points
77 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Will GPT-5 be able to self-improve?

Nathan Helm-Burger · 29 Apr 2023 17:34 UTC
17 points
22 comments · 3 min read · LW link

Are we there yet?

theflowerpot · 20 Jun 2022 11:19 UTC
2 points
2 comments · 1 min read · LW link

AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models

9 May 2023 15:26 UTC
28 points
1 comment · 4 min read · LW link
(newsletter.safe.ai)

Let’s build a fire alarm for AGI

chaosmage · 15 May 2023 9:16 UTC
−2 points
0 comments · 2 min read · LW link

Activation additions in a small residual network

Garrett Baker · 22 May 2023 20:28 UTC
22 points
4 comments · 3 min read · LW link

AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI

23 May 2023 21:47 UTC
25 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

[Question] Suggestions of posts on the AF to review

adamShimi · 16 Feb 2021 12:40 UTC
56 points
20 comments · 1 min read · LW link

Hands-On Experience Is Not Magic

Thane Ruthenis · 27 May 2023 16:57 UTC
20 points
14 comments · 5 min read · LW link

AI Safety Newsletter #1 [CAIS Linkpost]

10 Apr 2023 20:18 UTC
45 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

A moral backlash against AI will probably slow down AGI development

geoffreymiller · 7 Jun 2023 20:39 UTC
48 points
10 comments · 14 min read · LW link

Catastrophic Risks from AI #2: Malicious Use

22 Jun 2023 17:10 UTC
37 points
1 comment · 17 min read · LW link
(arxiv.org)

Comparing Four Approaches to Inner Alignment

Lucas Teixeira · 29 Jul 2022 21:06 UTC
35 points
1 comment · 9 min read · LW link

Clarifying some key hypotheses in AI alignment

15 Aug 2019 21:29 UTC
79 points
12 comments · 9 min read · LW link

My research agenda in agent foundations

Alex_Altair · 28 Jun 2023 18:00 UTC
70 points
6 comments · 11 min read · LW link

Levels of safety for AI and other technologies

jasoncrawford · 28 Jun 2023 18:35 UTC
16 points
0 comments · 2 min read · LW link
(rootsofprogress.org)

Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

gwern · 3 Jul 2023 0:48 UTC
401 points
52 comments · 7 min read · LW link
(www.youtube.com)

Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth · 12 Aug 2022 16:30 UTC
90 points
50 comments · 1 min read · LW link

Arguments against existential risk from AI, part 2

Nina Rimsky · 10 Jul 2023 8:25 UTC
6 points
0 comments · 5 min read · LW link
(ninarimsky.substack.com)

Winners of AI Alignment Awards Research Contest

13 Jul 2023 16:14 UTC
114 points
2 comments · 12 min read · LW link
(alignmentawards.com)

Thoughts on ‘List of Lethalities’

Alex Lawsen · 17 Aug 2022 18:33 UTC
27 points
0 comments · 10 min read · LW link

Thoughts on sharing information about language model capabilities

paulfchristiano · 31 Jul 2023 16:04 UTC
188 points
34 comments · 11 min read · LW link

The problem/solution matrix: Calculating the probability of AI safety “on the back of an envelope”

John_Maxwell · 20 Oct 2019 8:03 UTC
22 points
4 comments · 2 min read · LW link

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
101 points
15 comments · 5 min read · LW link
(arxiv.org)

[Question] Measure of complexity allowed by the laws of the universe and relative theory?

dr_s · 7 Sep 2023 12:21 UTC
8 points
22 comments · 1 min read · LW link

Environmental Structure Can Cause Instrumental Convergence

TurnTrout · 22 Jun 2021 22:26 UTC
71 points
43 comments · 16 min read · LW link
(arxiv.org)

What I Learned Running Refine

adamShimi · 24 Nov 2022 14:49 UTC
107 points
5 comments · 4 min read · LW link

Sam Altman and Ezra Klein on the AI Revolution

Zack_M_Davis · 27 Jun 2021 4:53 UTC
38 points
17 comments · 1 min read · LW link
(www.nytimes.com)

Framing approaches to alignment and the hard problem of AI cognition

ryan_greenblatt · 15 Dec 2021 19:06 UTC
16 points
15 comments · 27 min read · LW link

A shift in arguments for AI risk

Richard_Ngo · 28 May 2019 13:47 UTC
32 points
7 comments · 1 min read · LW link
(fragile-credences.github.io)

Ngo and Yudkowsky on alignment difficulty

15 Nov 2021 20:31 UTC
248 points
148 comments · 99 min read · LW link · 1 review

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

26 Jan 2023 21:01 UTC
39 points
80 comments · 2 min read · LW link

Using blinders to help you see things for what they are

Adam Zerner · 11 Nov 2021 7:07 UTC
13 points
2 comments · 2 min read · LW link

“Taking AI Risk Seriously” (thoughts by Critch)

Raemon · 29 Jan 2018 9:27 UTC
110 points
68 comments · 13 min read · LW link

Drug addicts and deceptively aligned agents—a comparative analysis

Jan · 5 Nov 2021 21:42 UTC
42 points
2 comments · 12 min read · LW link
(universalprior.substack.com)

Results from the language model hackathon

Esben Kran · 10 Oct 2022 8:29 UTC
22 points
1 comment · 4 min read · LW link

Work on Security Instead of Friendliness?

Wei Dai · 21 Jul 2012 18:28 UTC
65 points
107 comments · 2 min read · LW link

[Question] Did AI pioneers not worry much about AI risks?

lisperati · 9 Feb 2020 19:58 UTC
42 points
9 comments · 1 min read · LW link

Critiquing “What failure looks like”

Grue_Slinky · 27 Dec 2019 23:59 UTC
35 points
6 comments · 3 min read · LW link

A guide to Iterated Amplification & Debate

Rafael Harth · 15 Nov 2020 17:14 UTC
72 points
9 comments · 15 min read · LW link

Wizards and prophets of AI [draft for comment]

jasoncrawford · 31 Mar 2023 20:22 UTC
16 points
11 comments · 6 min read · LW link

[Question] Conditional on the first AGI being aligned correctly, is a good outcome even still likely?

iamthouthouarti · 6 Sep 2021 17:30 UTC
2 points
1 comment · 1 min read · LW link

AI Doom Is Not (Only) Disjunctive

NickGabs · 30 Mar 2023 1:42 UTC
12 points
0 comments · 5 min read · LW link

Orthogonality is expensive

beren · 3 Apr 2023 10:20 UTC
33 points
8 comments · 3 min read · LW link

What would a compute monitoring plan look like? [Linkpost]

Akash · 26 Mar 2023 19:33 UTC
157 points
9 comments · 4 min read · LW link
(arxiv.org)

[Question] What are good alignment conference papers?

adamShimi · 28 Aug 2021 13:35 UTC
12 points
2 comments · 1 min read · LW link

Counterarguments to the basic AI x-risk case

KatjaGrace · 14 Oct 2022 13:00 UTC
360 points
123 comments · 34 min read · LW link
(aiimpacts.org)

How Josiah became an AI safety researcher

Neil Crawford · 6 Sep 2022 17:17 UTC
4 points
0 comments · 1 min read · LW link

Deceptive Alignment

5 Jun 2019 20:16 UTC
117 points
20 comments · 17 min read · LW link

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

Kaj_Sotala · 4 May 2018 8:56 UTC
14 points
1 comment · 1 min read · LW link
(arxiv.org)

Applications for AI Safety Camp 2022 Now Open!

adamShimi · 17 Nov 2021 21:42 UTC
47 points
3 comments · 1 min read · LW link

Thinking soberly about the context and consequences of Friendly AI

Mitchell_Porter · 16 Oct 2012 4:33 UTC
21 points
39 comments · 1 min read · LW link

AI Safety Microgrant Round

Chris_Leong · 14 Nov 2022 4:25 UTC
22 points
1 comment · 1 min read · LW link

Summary of the Acausal Attack Issue for AIXI

Diffractor · 13 Dec 2021 8:16 UTC
14 points
6 comments · 4 min read · LW link

Complex Systems are Hard to Control

jsteinhardt · 4 Apr 2023 0:00 UTC
42 points
5 comments · 10 min read · LW link
(bounded-regret.ghost.io)

[Question] Why not constrain wetlabs instead of AI?

Lone Pine · 21 Mar 2023 18:02 UTC
13 points
10 comments · 1 min read · LW link

The Wizard of Oz Problem: How incentives and narratives can skew our perception of AI developments

Akash · 20 Mar 2023 20:44 UTC
16 points
3 comments · 6 min read · LW link

Three Stories for How AGI Comes Before FAI

John_Maxwell · 17 Sep 2019 23:26 UTC
27 points
5 comments · 6 min read · LW link

No One-Size-Fit-All Epistemic Strategy

adamShimi · 20 Aug 2022 12:56 UTC
23 points
1 comment · 2 min read · LW link

Less Realistic Tales of Doom

Mark Xu · 6 May 2021 23:01 UTC
113 points
13 comments · 4 min read · LW link

Why AI Safety is Hard

Simon Möller · 22 Mar 2023 10:44 UTC
3 points
0 comments · 6 min read · LW link

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs · 15 Mar 2023 17:55 UTC
178 points
42 comments · 1 min read · LW link

Relaxed adversarial training for inner alignment

evhub · 10 Sep 2019 23:03 UTC
65 points
27 comments · 27 min read · LW link

Six AI Risk/Strategy Ideas

Wei Dai · 27 Aug 2019 0:40 UTC
64 points
17 comments · 4 min read · LW link · 1 review

My current uncertainties regarding AI, alignment, and the end of the world

dominicq · 14 Nov 2021 14:08 UTC
2 points
3 comments · 2 min read · LW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala · 2 May 2020 7:35 UTC
43 points
19 comments · 7 min read · LW link
(plato.stanford.edu)

I Think Eliezer Should Go on Glenn Beck

Lao Mein · 30 Jun 2023 3:12 UTC
25 points
21 comments · 1 min read · LW link

The Overton Window widens: Examples of AI risk in the media

Akash · 23 Mar 2023 17:10 UTC
107 points
24 comments · 6 min read · LW link

Review of “Fun with +12 OOMs of Compute”

28 Mar 2021 14:55 UTC
63 points
21 comments · 8 min read · LW link · 1 review

New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development

Akash · 5 Apr 2023 1:26 UTC
46 points
9 comments · 1 min read · LW link
(today.yougov.com)

DeepMind and Google Brain are merging [Linkpost]

Akash · 20 Apr 2023 18:47 UTC
55 points
5 comments · 1 min read · LW link
(www.deepmind.com)

Questions about Conjecture’s CoEm proposal

9 Mar 2023 19:32 UTC
51 points
4 comments · 2 min read · LW link

Contra “Strong Coherence”

DragonGod · 4 Mar 2023 20:05 UTC
39 points
24 comments · 1 min read · LW link

[Linkpost] Scott Alexander reacts to OpenAI’s latest post

Akash · 11 Mar 2023 22:24 UTC
27 points
0 comments · 5 min read · LW link
(astralcodexten.substack.com)

The Preference Fulfillment Hypothesis

Kaj_Sotala · 26 Feb 2023 10:55 UTC
66 points
62 comments · 11 min read · LW link

Full Transcript: Eliezer Yudkowsky on the Bankless podcast

23 Feb 2023 12:34 UTC
138 points
89 comments · 75 min read · LW link

Confusions in My Model of AI Risk

peterbarnett · 7 Jul 2022 1:05 UTC
22 points
9 comments · 5 min read · LW link

Evil autocomplete: Existential Risk and Next-Token Predictors

Yitz · 28 Feb 2023 8:47 UTC
9 points
3 comments · 5 min read · LW link

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw · 12 Mar 2023 23:09 UTC
199 points
13 comments · 1 min read · LW link

Clarifying “What failure looks like”

Sam Clarke · 20 Sep 2020 20:40 UTC
95 points
14 comments · 17 min read · LW link

[RETRACTED] It’s time for EA leadership to pull the short-timelines fire alarm.

Not Relevant · 8 Apr 2022 16:07 UTC
111 points
163 comments · 4 min read · LW link

All AGI Safety questions welcome (especially basic ones) [May 2023]

steven0461 · 8 May 2023 22:30 UTC
33 points
44 comments · 2 min read · LW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda25 Dec 2021 23:07 UTC
52 points
3 comments28 min readLW link

Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem

Zack_M_Davis17 Sep 2020 2:23 UTC
72 points
12 comments5 min readLW link
(aima.cs.berkeley.edu)

More Is Different for AI

jsteinhardt4 Jan 2022 19:30 UTC
137 points
22 comments3 min readLW link
(bounded-regret.ghost.io)

4 ways to think about democratizing AI [GovAI Linkpost]

Akash13 Feb 2023 18:06 UTC
24 points
4 comments1 min readLW link
(www.governance.ai)

Challenges with Breaking into MIRI-Style Research

Chris_Leong17 Jan 2022 9:23 UTC
75 points
15 comments3 min readLW link

AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo

DanielFilan31 Mar 2022 5:20 UTC
24 points
1 comment48 min readLW link

Many important technologies start out as science fiction before becoming real

trevor10 Feb 2023 9:36 UTC
26 points
2 comments2 min readLW link

Bankless Podcast: 159 - We're All Gonna Die with Eliezer Yudkowsky

bayesed20 Feb 2023 16:42 UTC
83 points
54 comments1 min readLW link
(www.youtube.com)

Plan for mediocre alignment of brain-like [model-based RL] AGI

Steven Byrnes13 Mar 2023 14:11 UTC
63 points
24 comments12 min readLW link

Many AI governance proposals have a tradeoff between usefulness and feasibility

3 Feb 2023 18:49 UTC
22 points
2 comments2 min readLW link

[Linkpost] TIME article: DeepMind's CEO Helped Take AI Mainstream. Now He's Urging Caution

Akash21 Jan 2023 16:51 UTC
56 points
2 comments3 min readLW link
(time.com)

Against a General Factor of Doom

Jeffrey Heninger23 Nov 2022 16:50 UTC
61 points
18 comments5 min readLW link
(aiimpacts.org)

ea.domains—Domains Free to a Good Home

plex12 Jan 2023 13:32 UTC
24 points
0 comments1 min readLW link

[Question] Would (myopic) general public good producers significantly accelerate the development of AGI?

mako yass2 Mar 2022 23:47 UTC
25 points
10 comments1 min readLW link

How we could stumble into AI catastrophe

HoldenKarnofsky13 Jan 2023 16:20 UTC
64 points
18 comments18 min readLW link
(www.cold-takes.com)

How I Formed My Own Views About AI Safety

Neel Nanda27 Feb 2022 18:50 UTC
64 points
6 comments13 min readLW link
(www.neelnanda.io)

What Failure Looks Like: Distilling the Discussion

Ben Pace29 Jul 2020 21:49 UTC
81 points
14 comments7 min readLW link

World-Model Interpretability Is All We Need

Thane Ruthenis14 Jan 2023 19:37 UTC
29 points
21 comments21 min readLW link

Thoughts on refusing harmful requests to large language models

William_S19 Jan 2023 19:49 UTC
30 points
4 comments2 min readLW link

Talk to me about your summer/career plans

Akash31 Jan 2023 18:29 UTC
31 points
3 comments2 min readLW link

Taboo P(doom)

NathanBarnard3 Feb 2023 10:37 UTC
11 points
10 comments1 min readLW link

How evals might (or might not) prevent catastrophic risks from AI

Akash7 Feb 2023 20:16 UTC
40 points
0 comments9 min readLW link

Uber Self-Driving Crash

jefftk7 Nov 2019 15:00 UTC
110 points
1 comment2 min readLW link
(www.jefftk.com)

It Looks Like You're Trying To Take Over The World

gwern9 Mar 2022 16:35 UTC
396 points
119 comments1 min readLW link
(www.gwern.net)

AI Safety Info Distillation Fellowship

17 Feb 2023 16:16 UTC
47 points
3 comments3 min readLW link

Paradigm-building: Introduction

Cameron Berg8 Feb 2022 0:06 UTC
28 points
0 comments2 min readLW link

AMA Conjecture, A New Alignment Startup

adamShimi9 Apr 2022 9:43 UTC
47 points
42 comments1 min readLW link

Paradigm-building from first principles: Effective altruism, AGI, and alignment

Cameron Berg8 Feb 2022 16:12 UTC
26 points
5 comments14 min readLW link

AI Safety "Success Stories"

Wei Dai7 Sep 2019 2:54 UTC
116 points
27 comments4 min readLW link1 review

Reply to Holden on 'Tool AI'

Eliezer Yudkowsky12 Jun 2012 18:00 UTC
152 points
356 comments17 min readLW link

AI #1: Sydney and Bing

Zvi21 Feb 2023 14:00 UTC
170 points
44 comments61 min readLW link
(thezvi.wordpress.com)

Incentives and Selection: A Missing Frame From AI Threat Discussions?

DragonGod26 Feb 2023 1:18 UTC
11 points
16 comments2 min readLW link

Racing through a minefield: the AI deployment problem

HoldenKarnofsky22 Dec 2022 16:10 UTC
36 points
2 comments13 min readLW link
(www.cold-takes.com)

AI #2

Zvi2 Mar 2023 14:50 UTC
66 points
18 comments55 min readLW link
(thezvi.wordpress.com)

Contra Hanson on AI Risk

Liron4 Mar 2023 8:02 UTC
36 points
23 comments8 min readLW link

Against ubiquitous alignment taxes

beren6 Mar 2023 19:50 UTC
56 points
10 comments2 min readLW link

Disentangling arguments for the importance of AI safety

Richard_Ngo21 Jan 2019 12:41 UTC
133 points
23 comments8 min readLW link

An overview of 11 proposals for building safe advanced AI

evhub29 May 2020 20:38 UTC
209 points
36 comments38 min readLW link2 reviews

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King15 Mar 2023 0:29 UTC
116 points
22 comments2 min readLW link

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Akash20 Dec 2022 21:39 UTC
18 points
2 comments11 min readLW link

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
183 points
42 comments12 min readLW link3 reviews

Risks from Learned Optimization: Conclusion and Related Work

7 Jun 2019 19:53 UTC
82 points
5 comments6 min readLW link

My thoughts on OpenAI's alignment plan

Akash30 Dec 2022 19:33 UTC
55 points
3 comments20 min readLW link

Thoughts on AGI safety from the top

jylin042 Feb 2022 20:06 UTC
36 points
3 comments32 min readLW link

Why I'm Worried About AI

peterbarnett23 May 2022 21:13 UTC
22 points
2 comments12 min readLW link

Complex Systems for AI Safety [Pragmatic AI Safety #3]

24 May 2022 0:00 UTC
57 points
2 comments21 min readLW link

The Inner Alignment Problem

4 Jun 2019 1:20 UTC
102 points
17 comments13 min readLW link

Conditions for Mesa-Optimization

1 Jun 2019 20:52 UTC
80 points
48 comments12 min readLW link

Four lenses on AI risks

jasoncrawford28 Mar 2023 21:52 UTC
23 points
5 comments3 min readLW link
(rootsofprogress.org)

AI risk hub in Singapore?

Daniel Kokotajlo29 Oct 2020 11:45 UTC
57 points
18 comments4 min readLW link

Will working here advance AGI? Help us not destroy the world!

Yonatan Cale29 May 2022 11:42 UTC
30 points
46 comments1 min readLW link

Some conceptual highlights from "Disjunctive Scenarios of Catastrophic AI Risk"

Kaj_Sotala12 Feb 2018 12:30 UTC
33 points
4 comments6 min readLW link
(kajsotala.fi)

Perform Tractable Research While Avoiding Capabilities Externalities [Pragmatic AI Safety #4]

30 May 2022 20:25 UTC
51 points
3 comments25 min readLW link

Confused why a "capabilities research is good for alignment progress" position isn't discussed more

Kaj_Sotala2 Jun 2022 21:41 UTC
129 points
27 comments4 min readLW link

I'm trying out "asteroid mindset"

Alex_Altair3 Jun 2022 13:35 UTC
90 points
5 comments4 min readLW link

Epistemological Vigilance for Alignment

adamShimi6 Jun 2022 0:27 UTC
60 points
11 comments10 min readLW link

A Quick Guide to Confronting Doom

Ruby13 Apr 2022 19:30 UTC
240 points
33 comments2 min readLW link

n=3 AI Risk Quick Math and Reasoning

lionhearted (Sebastian Marshall)7 Apr 2023 20:27 UTC
6 points
3 comments4 min readLW link

AI Safety Seems Hard to Measure

HoldenKarnofsky8 Dec 2022 19:50 UTC
69 points
6 comments14 min readLW link
(www.cold-takes.com)

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace4 Oct 2019 4:08 UTC
221 points
61 comments15 min readLW link2 reviews

Where’s the foom?

Fergus Fettes11 Apr 2023 15:50 UTC
34 points
27 comments2 min readLW link

AI Neorealism: a threat model & success criterion for existential safety

davidad15 Dec 2022 13:42 UTC
56 points
1 comment3 min readLW link

Wentworth and Larsen on buying time

9 Jan 2023 21:31 UTC
73 points
6 comments12 min readLW link

A (EtA: quick) note on terminology: AI Alignment != AI x-safety

David Scott Krueger (formerly: capybaralet)8 Feb 2023 22:33 UTC
46 points
20 comments1 min readLW link

How dangerous is human-level AI?

Alex_Altair10 Jun 2022 17:38 UTC
21 points
4 comments8 min readLW link

“Carefully Bootstrapped Alignment” is organizationally hard

Raemon17 Mar 2023 18:00 UTC
246 points
20 comments11 min readLW link

Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous)

Akash25 Apr 2023 18:49 UTC
27 points
11 comments3 min readLW link
(childrenoficarus.substack.com)

Talking publicly about AI risk

Jan_Kulveit21 Apr 2023 11:28 UTC
173 points
8 comments6 min readLW link

Sensor Exposure can Compromise the Human Brain in the 2020s

trevor26 Oct 2023 3:31 UTC
17 points
2 comments10 min readLW link

The Main Sources of AI Risk?

21 Mar 2019 18:28 UTC
115 points
26 comments2 min readLW link

Continuity Assumptions

Jan_Kulveit13 Jun 2022 21:31 UTC
33 points
13 comments4 min readLW link

Slow motion videos as AI risk intuition pumps

Andrew_Critch14 Jun 2022 19:31 UTC
232 points
40 comments2 min readLW link

Alignment Risk Doesn't Require Superintelligence

JustisMills15 Jun 2022 3:12 UTC
35 points
4 comments2 min readLW link

[Question] Has there been any work on attempting to use Pascal's Mugging to make an AGI behave?

Chris_Leong15 Jun 2022 8:33 UTC
7 points
17 comments1 min readLW link

The AI Safety Game (UPDATED)

Daniel Kokotajlo5 Dec 2020 10:27 UTC
44 points
10 comments3 min readLW link

Memes and Rational Decisions

inferential9 Jan 2015 6:42 UTC
35 points
18 comments10 min readLW link

An Increasingly Manipulative Newsfeed

Michaël Trazzi1 Jul 2019 15:26 UTC
62 points
16 comments5 min readLW link

Reviews of "Is power-seeking AI an existential risk?"

Joe Carlsmith16 Dec 2021 20:48 UTC
78 points
20 comments1 min readLW link

Thoughts on the Feasibility of Prosaic AGI Alignment?

iamthouthouarti21 Aug 2020 23:25 UTC
8 points
10 comments1 min readLW link

Universality and the "Filter"

maggiehayes16 Dec 2021 0:47 UTC
10 points
2 comments11 min readLW link

New AI risks research institute at Oxford University

lukeprog16 Nov 2011 18:52 UTC
36 points
10 comments1 min readLW link

Alignment Newsletter #13: 07/02/18

Rohin Shah2 Jul 2018 16:10 UTC
70 points
12 comments8 min readLW link
(mailchi.mp)

How can I reduce existential risk from AI?

lukeprog13 Nov 2012 21:56 UTC
63 points
92 comments8 min readLW link

Reframing the Problem of AI Progress

Wei Dai12 Apr 2012 19:31 UTC
32 points
47 comments1 min readLW link

HIRING: Inform and shape a new project on AI safety at Partnership on AI

madhu_lika7 Dec 2021 19:37 UTC
1 point
0 comments1 min readLW link

Modeling Failure Modes of High-Level Machine Intelligence

6 Dec 2021 13:54 UTC
54 points
1 comment12 min readLW link

[LINK] NYT Article about Existential Risk from AI

[deleted]28 Jan 2013 10:37 UTC
38 points
23 comments1 min readLW link

The genie knows, but doesn't care

Rob Bensinger6 Sep 2013 6:42 UTC
120 points
495 comments8 min readLW link

AI Safety Unconference NeurIPS 2022

Orpheus7 Nov 2022 15:39 UTC
25 points
0 comments1 min readLW link
(aisafetyevents.org)

4 Key Assumptions in AI Safety

Prometheus7 Nov 2022 10:50 UTC
20 points
5 comments7 min readLW link

Muehlhauser-Goertzel Dialogue, Part 1

lukeprog16 Mar 2012 17:12 UTC
42 points
161 comments33 min readLW link

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn29 Nov 2021 15:18 UTC
16 points
5 comments7 min readLW link

HIRING: Inform and shape a new project on AI safety at Partnership on AI

Madhulika Srikumar24 Nov 2021 8:27 UTC
6 points
0 comments1 min readLW link

Q&A with Stan Franklin on risks from AI

XiXiDu11 Jun 2011 15:22 UTC
36 points
10 comments2 min readLW link

AI Summer Fellows Program

colm21 Mar 2018 15:32 UTC
21 points
0 comments1 min readLW link

Responses to Catastrophic AGI Risk: A Survey

lukeprog8 Jul 2013 14:33 UTC
17 points
8 comments1 min readLW link

AI Tracker: monitoring current and near-future risks from superscale models

23 Nov 2021 19:16 UTC
64 points
13 comments3 min readLW link
(aitracker.org)

Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong25 Nov 2019 10:40 UTC
25 points
15 comments1 min readLW link

AI as a Civilizational Risk Part 6/6: What can be done

PashaKamyshev3 Nov 2022 19:48 UTC
2 points
4 comments4 min readLW link

Super intelligent AIs that don't require alignment

Yair Halberstadt16 Nov 2021 19:55 UTC
10 points
2 comments6 min readLW link

Let's talk about "Convergent Rationality"

David Scott Krueger (formerly: capybaralet)12 Jun 2019 21:53 UTC
41 points
33 comments6 min readLW link

AI Safety Research Camp—Project Proposal

David_Kristoffersson2 Feb 2018 4:25 UTC
29 points
11 comments8 min readLW link

Two Stupid AI Alignment Ideas

aphyer16 Nov 2021 16:13 UTC
24 points
3 comments4 min readLW link

Worldview iPeople—Future Fund's AI Worldview Prize

Toni MUENDEL28 Oct 2022 1:53 UTC
−22 points
4 comments9 min readLW link

Will the world's elites navigate the creation of AI just fine?

lukeprog31 May 2013 18:49 UTC
36 points
266 comments2 min readLW link

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris24 Oct 2022 20:03 UTC
26 points
0 comments1 min readLW link
(github.com)

What would we do if alignment were futile?

Grant Demaree14 Nov 2021 8:09 UTC
75 points
39 comments3 min readLW link

Algo trading is a central example of AI risk

Vanessa Kosoy28 Jul 2018 20:31 UTC
27 points
5 comments1 min readLW link

Course recommendations for Friendliness researchers

Louie9 Jan 2013 14:33 UTC
96 points
112 comments10 min readLW link

Response to Katja Grace's AI x-risk counterarguments

19 Oct 2022 1:17 UTC
76 points
18 comments15 min readLW link

Q&A with experts on risks from AI #1

XiXiDu8 Jan 2012 11:46 UTC
45 points
67 comments9 min readLW link

Provably Honest—A First Step

Srijanak De5 Nov 2022 19:18 UTC
10 points
2 comments8 min readLW link

Hardcode the AGI to need our approval indefinitely?

MichaelStJules11 Nov 2021 7:04 UTC
2 points
2 comments1 min readLW link

Evaluating the feasibility of SI's plan

JoshuaFox10 Jan 2013 8:17 UTC
39 points
187 comments4 min readLW link

What are red flags for Neural Network suffering?

Marius Hobbhahn8 Nov 2021 12:51 UTC
29 points
15 comments12 min readLW link

What I would like the SIAI to publish

XiXiDu1 Nov 2010 14:07 UTC
36 points
225 comments3 min readLW link

What is the most evil AI that we could build, today?

ThomasJ1 Nov 2021 19:58 UTC
−2 points
14 comments1 min readLW link

Misalignment-by-default in multi-agent systems

13 Oct 2022 15:38 UTC
19 points
8 comments20 min readLW link
(www.gladstone.ai)

Truthful and honest AI

29 Oct 2021 7:28 UTC
42 points
1 comment13 min readLW link

The Evil AI Overlord List

Stuart_Armstrong20 Nov 2012 17:02 UTC
44 points
80 comments1 min readLW link

Opportunities for individual donors in AI safety

Alex Flint31 Mar 2018 18:37 UTC
30 points
3 comments11 min readLW link

A rant against robots

Lê Nguyên Hoang14 Jan 2020 22:03 UTC
65 points
7 comments5 min readLW link

Transcription of Eliezer's January 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
112 points
9 comments56 min readLW link

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

Owain_Evans22 Oct 2021 16:23 UTC
31 points
15 comments1 min readLW link

Possible miracles

9 Oct 2022 18:17 UTC
62 points
33 comments8 min readLW link

1hr talk: Intro to AGI safety

Steven Byrnes18 Jun 2019 21:41 UTC
36 points
4 comments24 min readLW link

Truthful AI: Developing and governing AI that does not lie

18 Oct 2021 18:37 UTC
81 points
9 comments10 min readLW link

Allegory On AI Risk, Game Theory, and Mithril

James_Miller13 Feb 2017 20:41 UTC
45 points
57 comments3 min readLW link

Three AI Safety Related Ideas

Wei Dai13 Dec 2018 21:32 UTC
68 points
38 comments2 min readLW link

[Linkpost] “Blueprint for an AI Bill of Rights”—Office of Science and Technology Policy, USA (2022)

Fer32dwt34r3dfsz5 Oct 2022 16:42 UTC
9 points
4 comments2 min readLW link
(www.whitehouse.gov)

Boolean Primitives for Coupled Optimizers

Paul Bricman7 Oct 2022 18:02 UTC
9 points
0 comments8 min readLW link

The Dark Side of Cognition Hypothesis

Cameron Berg3 Oct 2021 20:10 UTC
19 points
1 comment16 min readLW link

[Question] Any further work on AI Safety Success Stories?

Krieger2 Oct 2022 9:53 UTC
8 points
6 comments1 min readLW link

Distribution Shifts and The Importance of AI Safety

Leon Lang29 Sep 2022 22:38 UTC
17 points
2 comments12 min readLW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Ben Smith29 Sep 2021 17:09 UTC
31 points
7 comments10 min readLW link

A toy model of the treacherous turn

Stuart_Armstrong8 Jan 2016 12:58 UTC
42 points
13 comments6 min readLW link

AI Alignment Prize: Round 2 due March 31, 2018

Zvi12 Mar 2018 12:10 UTC
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

Oren’s Field Guide of Bad AGI Outcomes

Oren Montano26 Sep 2022 4:06 UTC
0 points
0 comments1 min readLW link

[Question] Why Do AI researchers Rate the Probability of Doom So Low?

Aorou24 Sep 2022 2:33 UTC
7 points
6 comments3 min readLW link

AI takeoff story: a continuation of progress by other means

Edouard Harris27 Sep 2021 15:55 UTC
76 points
13 comments10 min readLW link

AI Risk Intro 2: Solving The Problem

22 Sep 2022 13:55 UTC
22 points
0 comments27 min readLW link

Here Be AGI Dragons

Oren Montano21 Sep 2022 22:28 UTC
−1 points
3 comments5 min readLW link

On unfixably unsafe AGI architectures

Steven Byrnes19 Feb 2020 21:16 UTC
33 points
8 comments5 min readLW link

Investigating AI Takeover Scenarios

Sammy Martin17 Sep 2021 18:47 UTC
27 points
1 comment27 min readLW link

How truthful is GPT-3? A benchmark for language models

Owain_Evans16 Sep 2021 10:09 UTC
56 points
24 comments6 min readLW link

Conceptual issues in AI safety: the paradigmatic gap

vedevazz24 Jun 2018 15:09 UTC
33 points
0 comments1 min readLW link
(www.foldl.me)

The alignment problem in different capability regimes

Buck9 Sep 2021 19:46 UTC
88 points
12 comments5 min readLW link

Responding to 'Beyond Hyperanthropomorphism'

ukc1001414 Sep 2022 20:37 UTC
8 points
0 comments16 min readLW link

Distinguishing AI takeover scenarios

8 Sep 2021 16:19 UTC
72 points
11 comments14 min readLW link

Dreams of Friendliness

Eliezer Yudkowsky31 Aug 2008 1:20 UTC
26 points
81 comments9 min readLW link

Beware of black boxes in AI alignment research

cousin_it18 Jan 2018 15:07 UTC
39 points
10 comments1 min readLW link

I Vouch For MIRI

Zvi17 Dec 2017 17:50 UTC
37 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Precise P(doom) isn't very important for prioritization or strategy

harsimony14 Sep 2022 17:19 UTC
14 points
6 comments1 min readLW link

Emily Brontë on: Psychology Required for Serious™ AGI Safety Research

robertzk14 Sep 2022 14:47 UTC
2 points
0 comments1 min readLW link

Capability and Agency as Cornerstones of AI risk — My current model

wilm15 Sep 2022 8:25 UTC
10 points
4 comments12 min readLW link

Understanding Conjecture: Notes from Connor Leahy interview

Akash15 Sep 2022 18:37 UTC
105 points
23 comments15 min readLW link

[Question] Would a Misaligned SSI Really Kill Us All?

DragonGod14 Sep 2022 12:15 UTC
6 points
7 comments6 min readLW link

Risk aversion and GPT-3

hatta_afiq13 Sep 2022 20:50 UTC
1 point
0 comments1 min readLW link

[Question] Updates on FLI's Value Alignment Map?

Fer32dwt34r3dfsz17 Sep 2022 22:27 UTC
17 points
4 comments1 min readLW link

Summaries: Alignment Fundamentals Curriculum

Leon Lang18 Sep 2022 13:08 UTC
44 points
3 comments1 min readLW link
(docs.google.com)

Leveraging Legal Informatics to Align AI

John Nay18 Sep 2022 20:39 UTC
11 points
0 comments3 min readLW link
(forum.effectivealtruism.org)

Representational Tethers: Tying AI Latents To Human Ones

Paul Bricman16 Sep 2022 14:45 UTC
30 points
0 comments16 min readLW link

[Linkpost] A survey on over 300 works about interpretability in deep networks

scasper12 Sep 2022 19:07 UTC
97 points
7 comments2 min readLW link
(arxiv.org)

AI Safety field-building projects I'd like to see

Akash11 Sep 2022 23:43 UTC
44 points
7 comments6 min readLW link

Interlude: But Who Optimizes The Optimizer?

Paul Bricman23 Sep 2022 15:30 UTC
15 points
0 comments10 min readLW link

The Governance Problem and the "Pretty Good" X-Risk

Zach Stein-Perlman29 Aug 2021 18:00 UTC
5 points
2 comments11 min readLW link

On Generality

Oren Montano26 Sep 2022 4:06 UTC
2 points
0 comments5 min readLW link

Qualitative Strategies of Friendliness

Eliezer Yudkowsky30 Aug 2008 2:12 UTC
30 points
56 comments12 min readLW link

(Structural) Stability of Coupled Optimizers

Paul Bricman30 Sep 2022 11:28 UTC
25 points
0 comments10 min readLW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

11 Sep 2022 10:57 UTC
46 points
13 comments30 min readLW link

Eli's review of "Is power-seeking AI an existential risk?"

elifland30 Sep 2022 12:21 UTC
67 points
0 comments3 min readLW link
(docs.google.com)

Ideological Inference Engines: Making Deontology Differentiable*

Paul Bricman12 Sep 2022 12:00 UTC
6 points
0 comments14 min readLW link

Announcing the AI Safety Nudge Competition to Help Beat Procrastination

Marc Carauleanu1 Oct 2022 1:49 UTC
10 points
0 comments1 min readLW link

Oversight Leagues: The Training Game as a Feature

Paul Bricman9 Sep 2022 10:08 UTC
20 points
6 comments10 min readLW link

Generative, Episodic Objectives for Safe AI

Michael Glass5 Oct 2022 23:18 UTC
11 points
3 comments8 min readLW link

Could you have stopped Chernobyl?

Carlos Ramirez27 Aug 2021 1:48 UTC
29 points
17 comments8 min readLW link

Book review: Architects of Intelligence by Martin Ford (2018)

Ofer11 Aug 2020 17:30 UTC
15 points
0 comments2 min readLW link

What does it mean for an AGI to be ‘safe’?

So8res7 Oct 2022 4:13 UTC
72 points
29 comments3 min readLW link

It’s (not) how you use it

Eleni Angelou7 Sep 2022 17:15 UTC
8 points
1 comment2 min readLW link

Community Building for Graduate Students: A Targeted Approach

Neil Crawford6 Sep 2022 17:17 UTC
6 points
0 comments4 min readLW link

Instrumental convergence in single-agent systems

12 Oct 2022 12:24 UTC
31 points
4 comments8 min readLW link
(www.gladstone.ai)

Cataloguing Priors in Theory and Practice

Paul Bricman13 Oct 2022 12:36 UTC
13 points
8 comments7 min readLW link

A Game About AI Alignment (& Meta-Ethics): What Are the Must Haves?

JonathanErhardt5 Sep 2022 7:55 UTC
18 points
13 comments2 min readLW link

Instrumental convergence: scale and physical interactions

14 Oct 2022 15:50 UTC
15 points
0 comments17 min readLW link
(www.gladstone.ai)

Power-Seeking AI and Existential Risk

Antonio Franca11 Oct 2022 22:50 UTC
6 points
0 comments9 min readLW link

A gentle apocalypse

pchvykov16 Aug 2021 5:03 UTC
3 points
5 comments3 min readLW link

Niceness is unnatural

So8res13 Oct 2022 1:30 UTC
121 points
19 comments8 min readLW link

Greed Is the Root of This Evil

Thane Ruthenis13 Oct 2022 20:40 UTC
18 points
7 comments8 min readLW link

2019 AI Alignment Literature Review and Charity Comparison

Larks19 Dec 2019 3:00 UTC
130 points
18 comments62 min readLW link

Should ethicists be inside or outside a profession?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
88 points
7 comments9 min readLW link

Sticky goals: a concrete experiment for understanding deceptive alignment

evhub2 Sep 2022 21:57 UTC
39 points
13 comments3 min readLW link

[Question] How easy is it to supervise processes vs outcomes?

Noosphere8918 Oct 2022 17:48 UTC
3 points
0 comments1 min readLW link

Agency engineering: is AI-alignment "to human intent" enough?

catubc2 Sep 2022 18:14 UTC
9 points
10 comments6 min readLW link

Alignment is hard. Communicating that, might be harder

Eleni Angelou1 Sep 2022 16:57 UTC
7 points
8 comments3 min readLW link

Worlds Where Iterative Design Fails

johnswentworth30 Aug 2022 20:48 UTC
181 points
27 comments10 min readLW link

Three scenarios of pseudo-alignment

Eleni Angelou3 Sep 2022 12:47 UTC
9 points
0 comments3 min readLW link

Are Generative World Models a Mesa-Optimization Risk?

Thane Ruthenis29 Aug 2022 18:37 UTC
13 points
2 comments3 min readLW link

[Question] What would you expect a massive multimodal online federated learner to be capable of?

Aryeh Englander27 Aug 2022 17:31 UTC
13 points
4 comments1 min readLW link

AI Researchers On AI Risk

Scott Alexander22 May 2015 11:16 UTC
19 points
0 comments16 min readLW link

AI as a Civilizational Risk Part 2/6: Behavioral Modification

PashaKamyshev30 Oct 2022 16:57 UTC
9 points
0 comments10 min readLW link

AI as a Civilizational Risk Part 3/6: Anti-economy and Signal Pollution

PashaKamyshev31 Oct 2022 17:03 UTC
7 points
4 comments14 min readLW link

AI as a Civilizational Risk Part 4/6: Bioweapons and Philosophy of Modification

PashaKamyshev1 Nov 2022 20:50 UTC
7 points
1 comment8 min readLW link

AI as a Civilizational Risk Part 5/6: Relationship between C-risk and X-risk

PashaKamyshev3 Nov 2022 2:19 UTC
2 points
0 comments7 min readLW link

Mauhn Releases AI Safety Documentation

Berg Severens3 Jul 2021 21:23 UTC
4 points
0 comments1 min readLW link

Am I secretly excited for AI getting weird?

porby29 Oct 2022 22:16 UTC
112 points
4 comments4 min readLW link

My (naive) take on Risks from Learned Optimization

artkpv31 Oct 2022 10:59 UTC
7 points
0 comments5 min readLW link

Clarifying AI X-risk

1 Nov 2022 11:03 UTC
125 points
23 comments4 min readLW link

Threat Model Literature Review

1 Nov 2022 11:03 UTC
73 points
4 comments25 min readLW link

a casual intro to AI doom and alignment

Tamsin Leake1 Nov 2022 16:38 UTC
18 points
0 comments4 min readLW link
(carado.moe)

[Question] What are some claims or opinions about multi-multi delegation you've seen in the memeplex that you think deserve scrutiny?

Quinn27 Jun 2021 17:44 UTC
17 points
6 comments2 min readLW link

Help Understanding Preferences And Evil

Netcentrica27 Aug 2022 3:42 UTC
6 points
7 comments2 min readLW link

Why do we post our AI safety plans on the Internet?

Peter S. Park3 Nov 2022 16:02 UTC
4 points
4 comments11 min readLW link

My summary of "Pragmatic AI Safety"

Eleni Angelou5 Nov 2022 12:54 UTC
3 points
0 comments5 min readLW link

Annual AGI Benchmarking Event

Lawrence Phillips27 Aug 2022 0:06 UTC
24 points
3 comments2 min readLW link
(www.metaculus.com)

Loss of control of AI is not a likely source of AI x-risk

squek7 Nov 2022 18:44 UTC
−6 points
0 comments5 min readLW link

Taking the parameters which seem to matter and rotating them until they don't

Garrett Baker26 Aug 2022 18:26 UTC
120 points
48 comments1 min readLW link

Value Formation: An Overarching Model

Thane Ruthenis15 Nov 2022 17:16 UTC
26 points
20 comments34 min readLW link

Is AI Gain-of-Function research a thing?

MadHatter12 Nov 2022 2:33 UTC
9 points
2 comments2 min readLW link

AI Risk in Terms of Unstable Nuclear Software

Thane Ruthenis26 Aug 2022 18:49 UTC
30 points
1 comment6 min readLW link

The limited upside of interpretability

Peter S. Park15 Nov 2022 18:46 UTC
13 points
11 comments1 min readLW link

It Looks Like You're Trying To Take Over The Narrative

George3d624 Aug 2022 13:36 UTC
3 points
20 comments9 min readLW link
(www.epistem.ink)

The Alignment Problem Needs More Positive Fiction

Netcentrica21 Aug 2022 22:01 UTC
5 points
2 comments5 min readLW link

Survey on AI existential risk scenarios

8 Jun 2021 17:12 UTC
63 points
11 comments7 min readLW link

Conjecture: a retrospective after 8 months of work

23 Nov 2022 17:10 UTC
185 points
9 comments8 min readLW link

Conjecture Second Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments1 min readLW link

Corrigibility Via Thought-Process Deference

Thane Ruthenis24 Nov 2022 17:06 UTC
17 points
5 comments9 min readLW link

'Dumb' AI observes and manipulates controllers

Stuart_Armstrong13 Jan 2015 13:35 UTC
52 points
19 comments2 min readLW link

2017 AI Safety Literature Review and Charity Comparison

Larks24 Dec 2017 18:52 UTC
41 points
5 comments23 min readLW link

And the AI would have got away with it too, if...

Stuart_Armstrong22 May 2019 21:35 UTC
75 points
7 comments1 min readLW link

Conversation with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Steering systems

Max H4 Apr 2023 0:56 UTC
43 points
1 comment15 min readLW link

ICA Simulacra

Ozyrus5 Apr 2023 6:41 UTC
26 points
2 comments7 min readLW link

Against sacrificing AI transparency for generality gains

Ape in the coat7 May 2023 6:52 UTC
3 points
0 comments2 min readLW link

Superintelligence will outsmart us or it isn't superintelligence

Neil 3 Apr 2023 15:01 UTC
−7 points
4 comments1 min readLW link

Do we have a plan for the "first critical try" problem?

Christopher King3 Apr 2023 16:27 UTC
−3 points
14 comments1 min readLW link

Towards empathy in RL agents and beyond: Insights from cognitive science for AI Alignment

Marc Carauleanu3 Apr 2023 19:59 UTC
14 points
6 comments1 min readLW link
(clipchamp.com)

Strategies to Prevent AI Annihilation

lastchanceformankind4 Apr 2023 8:59 UTC
−2 points
0 comments4 min readLW link

[Question] Isn't safe AGI impossible?

lefoenix5 Apr 2023 4:01 UTC
1 point
0 comments1 min readLW link

AGI deployment as an act of aggression

dr_s5 Apr 2023 6:39 UTC
27 points
29 comments13 min readLW link

[Question] Daisy-chaining epsilon-step verifiers

Decaeneus6 Apr 2023 2:07 UTC
2 points
0 comments1 min readLW link

One Does Not Simply Replace the Humans

JerkyTreats6 Apr 2023 20:56 UTC
9 points
3 comments4 min readLW link
(www.lesswrong.com)

OpenAI: Our approach to AI safety

g-w15 Apr 2023 20:26 UTC
1 point
1 comment1 min readLW link
(openai.com)

Williams-Beuren Syndrome: Frendly Mutations

Takk5 Apr 2023 20:59 UTC
−1 points
1 comment1 min readLW link

Yoshua Bengio: "Slowing down development of AI systems passing the Turing test"

Roman Leventov6 Apr 2023 3:31 UTC
49 points
2 comments5 min readLW link
(yoshuabengio.org)

Risks from GPT-4 Byproduct of Recursively Optimizing AIs

ben hayum7 Apr 2023 0:02 UTC
72 points
9 comments10 min readLW link
(forum.effectivealtruism.org)

A decade of lurking, a month of posting

Max H9 Apr 2023 0:21 UTC
70 points
4 comments5 min readLW link

Alignment of AutoGPT agents

Ozyrus12 Apr 2023 12:54 UTC
14 points
1 comment4 min readLW link

Why I'm not worried about imminent doom

Ariel Kwiatkowski10 Apr 2023 15:31 UTC
7 points
1 comment4 min readLW link

Measuring artificial intelligence on human benchmarks is naive

Anomalous11 Apr 2023 11:34 UTC
11 points
4 comments1 min readLW link
(forum.effectivealtruism.org)

In favor of accelerating problems you're trying to solve

Christopher King11 Apr 2023 18:15 UTC
2 points
2 comments4 min readLW link

AI Risk US Presidential Candidate

Simon Berens11 Apr 2023 19:31 UTC
5 points
3 comments1 min readLW link

Open-source LLMs may prove Bostrom's vulnerable world hypothesis

Roope Ahvenharju15 Apr 2023 19:16 UTC
1 point
1 comment1 min readLW link

Artificial Intelligence as exit strategy from the age of acute existential risk

Arturo Macias12 Apr 2023 14:48 UTC
−7 points
15 comments7 min readLW link

AGI goal space is big, but narrowing might not be as hard as it seems.

Jacy Reese Anthis12 Apr 2023 19:03 UTC
15 points
0 comments3 min readLW link

Polluting the agentic commons

hamandcheese13 Apr 2023 17:42 UTC
7 points
4 comments2 min readLW link
(www.secondbest.ca)

The Virus—Short Story

Michael Soareverix13 Apr 2023 18:18 UTC
4 points
0 comments4 min readLW link

On the possibility of impossibility of AGI Long-Term Safety

Roman Yen13 May 2023 18:38 UTC
4 points
1 comment9 min readLW link

Speculation on mapping the moral landscape for future Ai Alignment

Sven Heinz (Welwordion)16 Apr 2023 13:43 UTC
1 point
0 comments1 min readLW link

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik16 Apr 2023 19:14 UTC
−10 points
15 comments5 min readLW link

AGI Clinics: A Safe Haven for Humanity’s First Encounters with Superintelligence

portr.17 Apr 2023 1:52 UTC
−5 points
1 comment1 min readLW link

Defining Boundaries on Outcomes

Takk7 Jun 2023 17:41 UTC
1 point
0 comments1 min readLW link

No, really, it predicts next tokens.

simon18 Apr 2023 3:47 UTC
57 points
37 comments3 min readLW link

Prediction: any uncontrollable AI will turn earth into a giant computer

Karl von Wendt17 Apr 2023 12:30 UTC
9 points
8 comments3 min readLW link

What is your timelines for ADI (artificial disempowering intelligence)?

Christopher King17 Apr 2023 17:01 UTC
3 points
3 comments2 min readLW link

Green goo is plausible

anithite18 Apr 2023 0:04 UTC
56 points
29 comments4 min readLW link

World and Mind in Artificial Intelligence: arguments against the AI pause

Arturo Macias18 Apr 2023 14:40 UTC
1 point
0 comments1 min readLW link
(forum.effectivealtruism.org)

AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media

18 Apr 2023 18:44 UTC
30 points
0 comments4 min readLW link
(newsletter.safe.ai)

I Believe I Know Why AI Models Hallucinate

Richard Aragon19 Apr 2023 21:07 UTC
−10 points
6 comments7 min readLW link
(turingssolutions.com)

[Crosspost] Organizing a debate with experts and MPs to raise AI xrisk awareness: a possible blueprint

otto.barten19 Apr 2023 11:45 UTC
8 points
0 comments4 min readLW link
(forum.effectivealtruism.org)

How to express this system for ethically aligned AGI as a Mathematical formula?

Oliver Siegel19 Apr 2023 20:13 UTC
−1 points
0 comments1 min readLW link

[Question] Is there any literature on using socialization for AI alignment?

Nathan112319 Apr 2023 22:16 UTC
10 points
9 comments2 min readLW link

How does AI Risk Affect the Simulation Hypothesis?

amelia20 Apr 2023 3:16 UTC
6 points
9 comments2 min readLW link

Stability AI releases StableLM, an open-source ChatGPT counterpart

Ozyrus20 Apr 2023 6:04 UTC
11 points
3 comments1 min readLW link
(github.com)

Ideas for studies on AGI risk

dr_s20 Apr 2023 18:17 UTC
5 points
1 comment11 min readLW link

Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

Christopher King20 Apr 2023 19:57 UTC
2 points
7 comments3 min readLW link

Notes on “the hot mess theory of AI misalignment”

JakubK21 Apr 2023 10:07 UTC
13 points
0 comments5 min readLW link
(sohl-dickstein.github.io)

The Security Mindset, S-Risk and Publishing Prosaic Alignment Research

lukemarks22 Apr 2023 14:36 UTC
39 points
7 comments6 min readLW link

A great talk for AI noobs (according to an AI noob)

dov23 Apr 2023 5:34 UTC
10 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

Paths to failure

25 Apr 2023 8:03 UTC
29 points
1 comment8 min readLW link

A concise sum-up of the basic argument for AI doom

Mergimio H. Doefevmil24 Apr 2023 17:37 UTC
11 points
6 comments2 min readLW link

A response to Conjecture’s CoEm proposal

Kristian Freed24 Apr 2023 17:23 UTC
7 points
0 comments4 min readLW link

A Proposal for AI Alignment: Using Directly Opposing Models

Arne B27 Apr 2023 18:05 UTC
0 points
5 comments3 min readLW link

Making Nanobots isn’t a one-shot process, even for an artificial superintelligance

dankrad25 Apr 2023 0:39 UTC
20 points
13 comments6 min readLW link

My Assessment of the Chinese AI Safety Community

Lao Mein25 Apr 2023 4:21 UTC
239 points
92 comments3 min readLW link

Briefly how I’ve updated since ChatGPT

rime25 Apr 2023 14:47 UTC
48 points
2 comments2 min readLW link

Freedom Is All We Need

Leo Glisic27 Apr 2023 0:09 UTC
−1 points
8 comments10 min readLW link

What’s in a Name? Are you really an “AI Pessimist”?

amelia23 Oct 2023 22:41 UTC
−2 points
4 comments3 min readLW link

Halloween Problem

Saint Blasphemer24 Oct 2023 16:46 UTC
−10 points
1 comment1 min readLW link

Announcing #AISummitTalks featuring Professor Stuart Russell and many others

otto.barten24 Oct 2023 10:11 UTC
17 points
1 comment1 min readLW link

[Question] What if AGI had its own universe to maybe wreck?

mseale26 Oct 2023 17:49 UTC
−1 points
2 comments1 min readLW link

Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c25 Oct 2023 23:46 UTC
112 points
33 comments22 min readLW link
(www.navigatingrisks.ai)

[Thought Experiment] Tomorrow’s Echo—The future of synthetic companionship.

Vimal Naran26 Oct 2023 17:54 UTC
−7 points
2 comments2 min readLW link

[untitled post]

NeuralSystem_e5e127 Apr 2023 17:37 UTC
3 points
0 comments1 min readLW link

Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”

Matthew Wearden30 Oct 2023 17:27 UTC
5 points
2 comments6 min readLW link
(matthewwearden.co.uk)

Charbel-Raphaël and Lucius discuss Interpretability

30 Oct 2023 5:50 UTC
102 points
7 comments21 min readLW link

An International Manhattan Project for Artificial Intelligence

Glenn Clayton27 Apr 2023 17:34 UTC
−9 points
2 comments5 min readLW link

Focus on existential risk is a distraction from the real issues. A false fallacy

Nik Samoylov30 Oct 2023 23:42 UTC
−19 points
11 comments2 min readLW link

Saying the quiet part out loud: trading off x-risk for personal immortality

disturbance2 Nov 2023 17:43 UTC
78 points
90 comments5 min readLW link

The 6D effect: When companies take risks, one email can be very powerful.

scasper4 Nov 2023 20:08 UTC
243 points
40 comments3 min readLW link

AI as Super-Demagogue

RationalDino5 Nov 2023 21:21 UTC
−2 points
9 comments9 min readLW link

Symbiotic self-alignment of AIs.

Spiritus Dei7 Nov 2023 17:18 UTC
1 point
0 comments3 min readLW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

7 Nov 2023 17:59 UTC
35 points
2 comments2 min readLW link
(arxiv.org)

The Social Alignment Problem

irving28 Apr 2023 14:16 UTC
96 points
13 comments8 min readLW link

Do you want a first-principled preparedness guide to prepare yourself and loved ones for potential catastrophes?

Ulrik Horn14 Nov 2023 12:13 UTC
9 points
5 comments15 min readLW link

[Question] Realistic near-future scenarios of AI doom understandable for non-techy people?

RomanS28 Apr 2023 14:45 UTC
4 points
4 comments1 min readLW link

[Question] AI Safety orgs- what’s your biggest bottleneck right now?

Kabir Kumar16 Nov 2023 2:02 UTC
1 point
0 comments1 min readLW link

We Should Talk About This More. Epistemic World Collapse as Imminent Safety Risk of Generative AI.

Joerg Weiss16 Nov 2023 18:46 UTC
11 points
2 comments29 min readLW link

On excluding dangerous information from training

ShayBenMoshe17 Nov 2023 11:14 UTC
23 points
4 comments3 min readLW link

Killswitch

Junio18 Nov 2023 22:53 UTC
1 point
0 comments3 min readLW link

Ilya: The AI scientist shaping the world

David Varga20 Nov 2023 13:09 UTC
11 points
0 comments4 min readLW link

A Guide to Forecasting AI Science Capabilities

Eleni Angelou29 Apr 2023 23:24 UTC
6 points
1 comment4 min readLW link

The two paragraph argument for AI risk

CronoDAS25 Nov 2023 2:01 UTC
17 points
6 comments1 min readLW link

AISC 2024 - Project Summaries

NickyP27 Nov 2023 22:32 UTC
39 points
3 comments18 min readLW link

Rethink Priorities: Seeking Expressions of Interest for Special Projects Next Year

kierangreig29 Nov 2023 13:59 UTC
4 points
0 comments5 min readLW link

Support me in a Week-Long Picketing Campaign Near OpenAI’s HQ: Seeking Support and Ideas from the LessWrong Community

Percy30 Apr 2023 17:48 UTC
−26 points
15 comments1 min readLW link

Thoughts on “AI is easy to control” by Pope & Belrose

Steven Byrnes1 Dec 2023 17:30 UTC
163 points
42 comments13 min readLW link

The benefits and risks of optimism (about AI safety)

Karl von Wendt3 Dec 2023 12:45 UTC
−5 points
4 comments5 min readLW link

A call for a quantitative report card for AI bioterrorism threat models

Juno4 Dec 2023 6:35 UTC
3 points
0 comments10 min readLW link

[Question] Accuracy of arguments that are seen as ridiculous and intuitively false but don’t have good counter-arguments

Christopher King29 Apr 2023 23:58 UTC
30 points
39 comments1 min readLW link

Call for submissions: Choice of Futures survey questions

c.trout30 Apr 2023 6:59 UTC
4 points
0 comments2 min readLW link
(airtable.com)

[Question] Does agency necessarily imply self-preservation instinct?

Mislav Jurić1 May 2023 16:06 UTC
5 points
8 comments1 min readLW link

Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes

1 May 2023 16:47 UTC
93 points
10 comments30 min readLW link

AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks

2 May 2023 18:41 UTC
32 points
0 comments5 min readLW link
(newsletter.safe.ai)

Averting Catastrophe: Decision Theory for COVID-19, Climate Change, and Potential Disasters of All Kinds

JakubK2 May 2023 22:50 UTC
10 points
0 comments1 min readLW link

Regulate or Compete? The China Factor in U.S. AI Policy (NAIR #2)

charles_m5 May 2023 17:43 UTC
2 points
1 comment7 min readLW link
(navigatingairisks.substack.com)

But What If We Actually Want To Maximize Paperclips?

snerx25 May 2023 7:13 UTC
−17 points
6 comments7 min readLW link

Formalizing the “AI x-risk is unlikely because it is ridiculous” argument

Christopher King3 May 2023 18:56 UTC
40 points
17 comments3 min readLW link

We don’t need AGI for an amazing future

Karl von Wendt4 May 2023 12:10 UTC
18 points
32 comments5 min readLW link

White House Announces “New Actions to Promote Responsible AI Innovation”

lberglund4 May 2023 12:15 UTC
54 points
18 comments3 min readLW link
(www.whitehouse.gov)

[Question] Why not use active SETI to prevent AI Doom?

RomanS5 May 2023 14:41 UTC
13 points
13 comments1 min readLW link

CHAT Diplomacy: LLMs and National Security

JohnBuridan5 May 2023 19:45 UTC
25 points
5 comments7 min readLW link

Is “red” for GPT-4 the same as “red” for you?

Yusuke Hayashi6 May 2023 17:55 UTC
9 points
6 comments2 min readLW link

Oh, Think of the Bananas

Jeffs1 Jun 2023 6:46 UTC
3 points
0 comments2 min readLW link

TED talk by Eliezer Yudkowsky: Unleashing the Power of Artificial Intelligence

bayesed7 May 2023 5:45 UTC
49 points
36 comments1 min readLW link
(www.youtube.com)

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?”

Roman Leventov8 May 2023 21:26 UTC
18 points
2 comments7 min readLW link
(yoshuabengio.org)

H-JEPA might be technically alignable in a modified form

Roman Leventov8 May 2023 23:04 UTC
12 points
2 comments7 min readLW link

[Question] How much of a concern are open-source LLMs in the short, medium and long terms?

JavierCC10 May 2023 9:14 UTC
5 points
0 comments1 min readLW link

AGI-Automated Interpretability is Suicide

__RicG__10 May 2023 14:20 UTC
22 points
33 comments7 min readLW link

[Question] Is “brittle alignment” good enough?

the8thbit23 May 2023 17:35 UTC
9 points
5 comments3 min readLW link

[Question] AI interpretability could be harmful?

Roman Leventov10 May 2023 20:43 UTC
13 points
2 comments1 min readLW link

[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo Putera11 May 2023 4:16 UTC
11 points
1 comment3 min readLW link

A more grounded idea of AI risk

Iknownothing11 May 2023 9:48 UTC
3 points
4 comments1 min readLW link

Separating the “control problem” from the “alignment problem”

Yi-Yang11 May 2023 9:41 UTC
10 points
0 comments4 min readLW link

Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al.

Violet Hour11 May 2023 18:06 UTC
18 points
2 comments13 min readLW link

[Question] Term/Category for AI with Neutral Impact?

isomic11 May 2023 22:00 UTC
6 points
1 comment1 min readLW link

Un-unpluggability—can’t we just unplug it?

Oliver Sourbut15 May 2023 13:23 UTC
26 points
10 comments12 min readLW link
(www.oliversourbut.net)

Formulating the AI Doom Argument for Analytic Philosophers

JonathanErhardt12 May 2023 7:54 UTC
13 points
0 comments2 min readLW link

The way AGI wins could look very stupid

Christopher King12 May 2023 16:34 UTC
42 points
22 comments1 min readLW link

PCAST Working Group on Generative AI Invites Public Input

Christopher King13 May 2023 22:49 UTC
7 points
0 comments1 min readLW link
(terrytao.wordpress.com)

Coordination by common knowledge to prevent uncontrollable AI

Karl von Wendt14 May 2023 13:37 UTC
10 points
2 comments9 min readLW link

[Question] What projects and efforts are there to promote AI safety research?

Christopher King24 May 2023 0:33 UTC
4 points
0 comments1 min readLW link

AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop

_will_16 May 2023 18:06 UTC
11 points
4 comments8 min readLW link

Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk

Joe Brenton16 May 2023 11:57 UTC
5 points
4 comments1 min readLW link

GPT as an “Intelligence Forklift.”

boazbarak19 May 2023 21:15 UTC
46 points
27 comments3 min readLW link

Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*

Christopher King16 May 2023 15:18 UTC
22 points
6 comments2 min readLW link

[untitled post]

20 May 2023 3:08 UTC
1 point
0 comments1 min readLW link

Confusions and updates on STEM AI

Eleni Angelou19 May 2023 21:34 UTC
21 points
0 comments3 min readLW link

A&I (Rihanna ‘S&M’ parody lyrics)

nahoj21 May 2023 22:34 UTC
−3 points
0 comments2 min readLW link

We Shouldn’t Expect AI to Ever be Fully Rational

OneManyNone18 May 2023 17:09 UTC
19 points
31 comments6 min readLW link

[Crosspost] A recent write-up of the case for AI (existential) risk

Timsey18 May 2023 13:13 UTC
6 points
0 comments19 min readLW link

The Polarity Problem [Draft]

23 May 2023 21:05 UTC
24 points
3 comments44 min readLW link

A flaw in the A.G.I. Ruin Argument

Cole Wyeth19 May 2023 19:40 UTC
0 points
6 comments3 min readLW link
(colewyeth.com)

Yoshua Bengio: How Rogue AIs may Arise

harfe23 May 2023 18:28 UTC
87 points
12 comments18 min readLW link
(yoshuabengio.org)

A rejection of the Orthogonality Thesis

ArisC24 May 2023 16:37 UTC
−2 points
11 comments2 min readLW link
(medium.com)

Two ideas for alignment, perpetual mutual distrust and induction

APaleBlueDot25 May 2023 0:56 UTC
1 point
2 comments4 min readLW link

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan25 May 2023 16:30 UTC
2 points
0 comments25 min readLW link

DeepMind: Model evaluation for extreme risks

Zach Stein-Perlman25 May 2023 3:00 UTC
94 points
11 comments1 min readLW link
(arxiv.org)

Aligning an H-JEPA agent via training on the outputs of an LLM-based “exemplary actor”

Roman Leventov29 May 2023 11:08 UTC
12 points
10 comments30 min readLW link

An LLM-based “exemplary actor”

Roman Leventov29 May 2023 11:12 UTC
16 points
0 comments12 min readLW link

[Question] Why is violence against AI labs a taboo?

ArisC26 May 2023 8:00 UTC
−21 points
63 comments1 min readLW link

[Question] What’s your viewpoint on the likelihood of GPT-5 being able to autonomously create, train, and implement an AI superior to GPT-5?

Super AGI26 May 2023 1:43 UTC
6 points
15 comments1 min readLW link

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program

Christopher King2 Jun 2023 21:54 UTC
7 points
4 comments16 min readLW link

AI X-risk is a possible solution to the Fermi Paradox

magic9mushroom30 May 2023 17:42 UTC
10 points
10 comments2 min readLW link

Hands of gods

Anders L28 May 2023 15:15 UTC
1 point
0 comments9 min readLW link
(woodfromeden.substack.com)

Proposed Alignment Technique: OSNR (Output Sanitization via Noising and Reconstruction) for Safer Usage of Potentially Misaligned AGI

sudo29 May 2023 1:35 UTC
14 points
9 comments6 min readLW link

Without a trajectory change, the development of AGI is likely to go badly

Max H29 May 2023 23:42 UTC
16 points
2 comments13 min readLW link

On the Impossibility of Intelligent Paperclip Maximizers

Michael Simkin29 May 2023 16:55 UTC
−17 points
5 comments4 min readLW link

Winners-take-how-much?

YonatanK29 May 2023 21:56 UTC
1 point
2 comments3 min readLW link

An Analysis of the ‘Digital Gaia’ Proposal from a Safety Perspective

lukemarks31 May 2023 12:21 UTC
8 points
1 comment4 min readLW link

Limiting factors to predict AI take-off speed

Alfonso Pérez Escudero31 May 2023 23:19 UTC
1 point
0 comments6 min readLW link

A Double-Feature on The Extropians

Maxwell Tabarrok3 Jun 2023 18:27 UTC
58 points
4 comments1 min readLW link

Intrinsic vs. Extrinsic Alignment

Alfonso Pérez Escudero1 Jun 2023 1:06 UTC
1 point
1 comment3 min readLW link

How will they feed us

meijer19731 Jun 2023 8:49 UTC
4 points
3 comments5 min readLW link

Unpredictability and the Increasing Difficulty of AI Alignment for Increasingly Intelligent AI

Max_He-Ho31 May 2023 22:25 UTC
5 points
2 comments20 min readLW link

The unspoken but ridiculous assumption of AI doom: the hidden doom assumption

Christopher King1 Jun 2023 17:01 UTC
−9 points
1 comment3 min readLW link

Open Source LLMs Can Now Actively Lie

Josh Levy1 Jun 2023 22:03 UTC
4 points
0 comments3 min readLW link

Proposal: labs should precommit to pausing if an AI argues for itself to be improved

NickGabs2 Jun 2023 22:31 UTC
3 points
3 comments4 min readLW link

[FICTION] Prometheus Rising: The Emergence of an AI Consciousness

Super AGI10 Jun 2023 4:41 UTC
−13 points
0 comments9 min readLW link

Andrew Ng wants to have a conversation about extinction risk from AI

Leon Lang5 Jun 2023 22:29 UTC
32 points
2 comments1 min readLW link
(twitter.com)

The (local) unit of intelligence is FLOPs

boazbarak5 Jun 2023 18:23 UTC
40 points
7 comments5 min readLW link

Non-loss of control AGI-related catastrophes are out of control too

12 Jun 2023 12:01 UTC
0 points
3 comments24 min readLW link

A Playbook for AI Risk Reduction (focused on misaligned AI)

HoldenKarnofsky6 Jun 2023 18:05 UTC
89 points
41 comments14 min readLW link

Using Consensus Mechanisms as an approach to Alignment

Prometheus10 Jun 2023 23:38 UTC
9 points
2 comments6 min readLW link

Agentic Mess (A Failure Story)

6 Jun 2023 13:09 UTC
44 points
5 comments13 min readLW link

Manifold Predicted the AI Extinction Statement and CAIS Wanted it Deleted

David Chee12 Jun 2023 15:54 UTC
70 points
14 comments12 min readLW link

Current AI harms are also sci-fi

Christopher King8 Jun 2023 17:49 UTC
26 points
3 comments1 min readLW link

[FICTION] Unboxing Elysium: An AI’S Escape

Super AGI10 Jun 2023 4:41 UTC
−14 points
4 comments14 min readLW link

Why AI may not save the World

Alberto Zannoni9 Jun 2023 17:42 UTC
0 points
0 comments4 min readLW link
(a16z.com)

[Question] AI Rights: In your view, what would be required for an AGI to gain rights and protections from the various Governments of the World?

Super AGI9 Jun 2023 1:24 UTC
13 points
22 comments1 min readLW link

Introduction to Towards Causal Foundations of Safe AGI

12 Jun 2023 17:55 UTC
67 points
6 comments4 min readLW link

Aligned Objectives Prize Competition

Prometheus15 Jun 2023 12:42 UTC
8 points
0 comments2 min readLW link
(app.impactmarkets.io)

Lightning Post: Things people in AI Safety should stop talking about

Prometheus20 Jun 2023 15:00 UTC
23 points
6 comments2 min readLW link

A Friendly Face (Another Failure Story)

20 Jun 2023 10:31 UTC
65 points
21 comments16 min readLW link

Exploring Last-Resort Measures for AI Alignment: Humanity’s Extinction Switch

0xPetra23 Jun 2023 17:01 UTC
7 points
0 comments2 min readLW link

I just watched don’t look up.

ATheCoder23 Jun 2023 21:22 UTC
0 points
5 comments2 min readLW link

AI-Plans.com—a contributable compendium

Iknownothing25 Jun 2023 14:40 UTC
39 points
7 comments4 min readLW link
(ai-plans.com)

Cheat sheet of AI X-risk

amaury lorin29 Jun 2023 4:28 UTC
18 points
1 comment7 min readLW link

Challenge proposal: smallest possible self-hardening backdoor for RLHF

Christopher King29 Jun 2023 16:56 UTC
7 points
0 comments2 min readLW link

Biosafety Regulations (BMBL) and their relevance for AI

Štěpán Los29 Jun 2023 19:22 UTC
4 points
0 comments4 min readLW link

Metaphors for AI, and why I don’t like them

boazbarak28 Jun 2023 22:47 UTC
32 points
18 comments12 min readLW link

AGI & War

Calecute29 Jun 2023 22:20 UTC
9 points
1 comment1 min readLW link

George Hotz on AI safety: ~”centralized power is bad”

Chipmonk30 Jun 2023 5:00 UTC
14 points
5 comments1 min readLW link
(www.youtube.com)

AI Incident Sharing—Best practices from other fields and a comprehensive list of existing platforms

Štěpán Los28 Jun 2023 17:21 UTC
20 points
0 comments4 min readLW link

AI Safety without Alignment: How humans can WIN against AI

vicchain29 Jun 2023 17:53 UTC
1 point
1 comment2 min readLW link

Quantitative cruxes in Alignment

Martín Soto2 Jul 2023 20:38 UTC
20 points
0 comments23 min readLW link

Sources of evidence in Alignment

Martín Soto2 Jul 2023 20:38 UTC
20 points
0 comments11 min readLW link

An AGI kill switch with defined security properties

Peterpiper5 Jul 2023 17:40 UTC
−5 points
6 comments1 min readLW link

How I Learned To Stop Worrying And Love The Shoggoth

Peter Merel12 Jul 2023 17:47 UTC
10 points
9 comments5 min readLW link

OpenAI Launches Superalignment Taskforce

Zvi11 Jul 2023 13:00 UTC
140 points
38 comments49 min readLW link
(thezvi.wordpress.com)

Do you feel that AGI Alignment could be achieved in a Type 0 civilization?

Super AGI6 Jul 2023 4:52 UTC
−2 points
1 comment1 min readLW link

An Overview of AI risks—the Flyer

17 Jul 2023 12:03 UTC
20 points
0 comments1 min readLW link
(docs.google.com)

ACI#4: Seed AI is the new Perpetual Motion Machine

Akira Pyinya8 Jul 2023 1:17 UTC
−7 points
0 comments6 min readLW link

Announcing AI Alignment workshop at the ALIFE 2023 conference

rorygreig8 Jul 2023 13:52 UTC
11 points
0 comments1 min readLW link
(humanvaluesandartificialagency.com)

“Reframing Superintelligence” + LLMs + 4 years

Eric Drexler10 Jul 2023 13:42 UTC
116 points
8 comments12 min readLW link

Gearing Up for Long Timelines in a Hard World

Dalcy14 Jul 2023 6:11 UTC
11 points
0 comments4 min readLW link

[Question] What criterion would you use to select companies likely to cause AI doom?

amaury lorin13 Jul 2023 20:31 UTC
8 points
4 comments1 min readLW link

Why was the AI Alignment community so unprepared for this moment?

Ras151315 Jul 2023 0:26 UTC
118 points
64 comments2 min readLW link

Simple alignment plan that maybe works

Iknownothing18 Jul 2023 22:48 UTC
4 points
8 comments1 min readLW link

Runaway Optimizers in Mind Space

silentbob16 Jul 2023 14:26 UTC
16 points
0 comments12 min readLW link

Quick Thoughts on Language Models

RohanS18 Jul 2023 20:38 UTC
6 points
0 comments4 min readLW link

[Question] Why don’t we consider large forms of social organization (economic and political forms, in particular) to qualify as AGI and Transformative AI?

T L16 Jul 2023 18:54 UTC
1 point
0 comments2 min readLW link

Proof of posteriority: a defense against AI-generated misinformation

jchan17 Jul 2023 12:04 UTC
31 points
3 comments5 min readLW link

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten24 Jul 2023 10:07 UTC
12 points
0 comments7 min readLW link
(time.com)

All AGI Safety questions welcome (especially basic ones) [July 2023]

smallsilo20 Jul 2023 20:20 UTC
38 points
42 comments2 min readLW link
(forum.effectivealtruism.org)

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername23 Jul 2023 16:08 UTC
4 points
1 comment3 min readLW link

Autonomous Alignment Oversight Framework (AAOF)

Justausername25 Jul 2023 10:25 UTC
−9 points
0 comments4 min readLW link

A response to the Richards et al.’s “The Illusion of AI’s Existential Risk”

Harrison Fell26 Jul 2023 17:34 UTC
1 point
0 comments10 min readLW link

[Question] Have you ever considered taking the ‘Turing Test’ yourself?

Super AGI27 Jul 2023 3:48 UTC
2 points
6 comments1 min readLW link

Evaluating Superhuman Models with Consistency Checks

1 Aug 2023 7:51 UTC
14 points
2 comments9 min readLW link
(arxiv.org)

3 levels of threat obfuscation

HoldenKarnofsky2 Aug 2023 14:58 UTC
69 points
14 comments7 min readLW link

4 types of AGI selection, and how to constrain them

Remmelt8 Aug 2023 10:02 UTC
−4 points
3 comments3 min readLW link

Embedding Ethical Priors into AI Systems: A Bayesian Approach

Justausername3 Aug 2023 15:31 UTC
−5 points
3 comments21 min readLW link

Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments

mishka10 Aug 2023 19:07 UTC
21 points
3 comments5 min readLW link

Some alignment ideas

SelonNerias10 Aug 2023 17:51 UTC
1 point
0 comments11 min readLW link

Seeking Input to AI Safety Book for non-technical audience

Darren McKee10 Aug 2023 17:58 UTC
10 points
4 comments1 min readLW link

Existentially relevant thought experiment: To kill or not to kill, a sniper, a man and a button.

AlexFromSafeTransition14 Aug 2023 10:53 UTC
−18 points
6 comments4 min readLW link

We Should Prepare for a Larger Representation of Academia in AI Safety

Leon Lang13 Aug 2023 18:03 UTC
89 points
13 comments5 min readLW link

Decomposing independent generalizations in neural networks via Hessian analysis

14 Aug 2023 17:04 UTC
80 points
3 comments1 min readLW link

AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge

15 Aug 2023 16:10 UTC
21 points
0 comments5 min readLW link
(newsletter.safe.ai)

Ideas for improving epistemics in AI safety outreach

mic21 Aug 2023 19:55 UTC
64 points
6 comments3 min readLW link

Mesa-Optimization: Explain it like I’m 10 Edition

brook26 Aug 2023 23:04 UTC
20 points
1 comment6 min readLW link

Enhancing Corrigibility in AI Systems through Robust Feedback Loops

Justausername24 Aug 2023 3:53 UTC
1 point
0 comments6 min readLW link

Reflections on “Making the Atomic Bomb”

boazbarak17 Aug 2023 2:48 UTC
50 points
7 comments8 min readLW link

AI Regulation May Be More Important Than AI Alignment For Existential Safety

otto.barten24 Aug 2023 11:41 UTC
59 points
38 comments5 min readLW link

When discussing AI doom barriers propose specific plausible scenarios

anithite18 Aug 2023 4:06 UTC
5 points
0 comments3 min readLW link

2 unusual reasons for why we can avoid being turned into paperclips

Artem Panush19 Aug 2023 10:28 UTC
1 point
0 comments4 min readLW link

[Question] Clarifying how misalignment can arise from scaling LLMs

Util19 Aug 2023 14:16 UTC
3 points
1 comment1 min readLW link

Will AI kill everyone? Here’s what the godfathers of AI have to say [RA video]

Writer19 Aug 2023 17:29 UTC
56 points
8 comments1 min readLW link
(youtu.be)

Report on Frontier Model Training

YafahEdelman30 Aug 2023 20:02 UTC
119 points
18 comments21 min readLW link
(docs.google.com)

Ramble on STUFF: intelligence, simulation, AI, doom, default mode, the usual

Bill Benzon26 Aug 2023 15:49 UTC
5 points
0 comments4 min readLW link

The Game of Dominance

Karl von Wendt27 Aug 2023 11:04 UTC
24 points
15 comments6 min readLW link

A Letter to the Editor of MIT Technology Review

Jeffs30 Aug 2023 16:59 UTC
0 points
0 comments1 min readLW link

The Epistemic Authority of Deep Learning Pioneers

Dylan Bowman29 Aug 2023 18:14 UTC
8 points
2 comments3 min readLW link

AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities

29 Aug 2023 15:07 UTC
12 points
0 comments8 min readLW link
(newsletter.safe.ai)

Video essay: How Will We Know When AI is Conscious?

JanPro6 Sep 2023 18:10 UTC
11 points
7 comments1 min readLW link
(www.youtube.com)

AISN #21: Google DeepMind’s GPT-4 Competitor, Military Investments in Autonomous Drones, The UK AI Safety Summit, and Case Studies in AI Policy

5 Sep 2023 15:03 UTC
15 points
0 comments5 min readLW link
(newsletter.safe.ai)

[Question] What is to be done? (About the profit mo­tive)

Connor Barber8 Sep 2023 19:27 UTC
1 point
21 comments1 min readLW link

How teams went about their research at AI Safety Camp edition 8

9 Sep 2023 16:34 UTC
28 points
0 comments13 min readLW link

A conversation with Pi, a conversational AI.

Spiritus Dei15 Sep 2023 23:13 UTC
1 point
0 comments1 min readLW link

Jimmy Apples, source of the rumor that OpenAI has achieved AGI internally, is a credible insider.

Jorterder28 Sep 2023 1:20 UTC
−6 points
2 comments1 min readLW link
(twitter.com)

MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures

corey morris27 Sep 2023 17:54 UTC
14 points
2 comments4 min readLW link
(medium.com)

Technical AI Safety Research Landscape [Slides]

Magdalena Wache18 Sep 2023 13:56 UTC
38 points
0 comments4 min readLW link

Immortality or death by AGI

ImmortalityOrDeathByAGI21 Sep 2023 23:59 UTC
46 points
30 comments4 min readLW link
(forum.effectivealtruism.org)

[Linkpost] Mark Zuckerberg confronted about Meta’s Llama 2 AI’s ability to give users detailed guidance on making anthrax—Business Insider

mic26 Sep 2023 12:05 UTC
18 points
11 comments2 min readLW link
(www.businessinsider.com)

“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation

titotal29 Sep 2023 14:01 UTC
144 points
50 comments1 min readLW link
(titotal.substack.com)

I designed an AI safety course (for a philosophy department)

Eleni Angelou23 Sep 2023 22:03 UTC
37 points
15 comments2 min readLW link

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné23 Sep 2023 16:21 UTC
29 points
8 comments5 min readLW link

Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Erich_Grunewald8 Oct 2023 12:14 UTC
11 points
1 comment2 min readLW link
(arxiv.org)

Paper: Identifying the Risks of LM Agents with an LM-Emulated Sandbox—University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases!

Singularian25019 Oct 2023 0:00 UTC
5 points
0 comments1 min readLW link

A thought experiment to help persuade skeptics that power-seeking AI is plausible

jacobcd5225 Nov 2023 23:26 UTC
1 point
3 comments5 min readLW link

Ideation and Trajectory Modelling in Language Models

NickyP5 Oct 2023 19:21 UTC
14 points
2 comments10 min readLW link

The Gradient – The Artificiality of Alignment

mic8 Oct 2023 4:06 UTC
12 points
1 comment5 min readLW link
(thegradient.pub)

Become a PIBBSS Research Affiliate

10 Oct 2023 7:41 UTC
28 points
6 comments6 min readLW link

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
138 points
28 comments14 min readLW link

Back to the Past to the Future

Prometheus18 Oct 2023 16:51 UTC
5 points
0 comments1 min readLW link

Taxonomy of AI-risk counterarguments

Odd anon16 Oct 2023 0:12 UTC
61 points
13 comments8 min readLW link

AISC team report: Soft-optimization, Bayes and Goodhart

27 Jun 2023 6:05 UTC
36 points
0 comments15 min readLW link

Access to AI: a human right?

dmtea25 Jul 2020 9:38 UTC
5 points
3 comments2 min readLW link

Agentic Language Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

Conversation with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Transcription of Eliezer’s January 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
112 points
9 comments56 min readLW link

Responses to Catastrophic AGI Risk: A Survey

lukeprog8 Jul 2013 14:33 UTC
17 points
8 comments1 min readLW link

How can I reduce existential risk from AI?

lukeprog13 Nov 2012 21:56 UTC
63 points
92 comments8 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link

Reframing misaligned AGI’s: well-intentioned non-neurotypical assistants

zhukeepa1 Apr 2018 1:22 UTC
46 points
14 comments2 min readLW link

When is unaligned AI morally valuable?

paulfchristiano25 May 2018 1:57 UTC
73 points
53 comments10 min readLW link

Introducing the AI Alignment Forum (FAQ)

29 Oct 2018 21:07 UTC
86 points
8 comments6 min readLW link

Swimming Upstream: A Case Study in Instrumental Rationality

TurnTrout3 Jun 2018 3:16 UTC
76 points
7 comments8 min readLW link

Current AI Safety Roles for Software Engineers

ozziegooen9 Nov 2018 20:57 UTC
70 points
9 comments4 min readLW link

[Question] Why is so much discussion happening in private Google Docs?

Wei Dai12 Jan 2019 2:19 UTC
100 points
22 comments1 min readLW link

Problems in AI Alignment that philosophers could potentially contribute to

Wei Dai17 Aug 2019 17:38 UTC
76 points
14 comments2 min readLW link

Two Neglected Problems in Human-AI Safety

Wei Dai16 Dec 2018 22:13 UTC
89 points
24 comments2 min readLW link

Announcement: AI alignment prize round 4 winners

cousin_it20 Jan 2019 14:46 UTC
74 points
41 comments1 min readLW link

Soon: a weekly AI Safety prerequisites module on LessWrong

null30 Apr 2018 13:23 UTC
35 points
10 comments1 min readLW link

And the AI would have got away with it too, if...

Stuart_Armstrong22 May 2019 21:35 UTC
75 points
7 comments1 min readLW link

2017 AI Safety Literature Review and Charity Comparison

Larks24 Dec 2017 18:52 UTC
41 points
5 comments23 min readLW link

Should ethicists be inside or outside a profession?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
88 points
7 comments9 min readLW link

I Vouch For MIRI

Zvi17 Dec 2017 17:50 UTC
37 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Beware of black boxes in AI alignment research

cousin_it18 Jan 2018 15:07 UTC
39 points
10 comments1 min readLW link

AI Alignment Prize: Round 2 due March 31, 2018

Zvi12 Mar 2018 12:10 UTC
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

Three AI Safety Related Ideas

Wei Dai13 Dec 2018 21:32 UTC
68 points
38 comments2 min readLW link

A rant against robots

Lê Nguyên Hoang14 Jan 2020 22:03 UTC
65 points
7 comments5 min readLW link

Opportunities for individual donors in AI safety

Alex Flint31 Mar 2018 18:37 UTC
30 points
3 comments11 min readLW link

Course recommendations for Friendliness researchers

Louie9 Jan 2013 14:33 UTC
96 points
112 comments10 min readLW link

AI Safety Research Camp—Project Proposal

David_Kristoffersson2 Feb 2018 4:25 UTC
29 points
11 comments8 min readLW link

AI Summer Fellows Program

colm21 Mar 2018 15:32 UTC
21 points
0 comments1 min readLW link

The genie knows, but doesn’t care

Rob Bensinger6 Sep 2013 6:42 UTC
120 points
495 comments8 min readLW link

Alignment Newsletter #13: 07/02/18

Rohin Shah2 Jul 2018 16:10 UTC
70 points
12 comments8 min readLW link
(mailchi.mp)

An Increasingly Manipulative Newsfeed

Michaël Trazzi1 Jul 2019 15:26 UTC
62 points
16 comments5 min readLW link

The simple picture on AI safety

Alex Flint27 May 2018 19:43 UTC
31 points
10 comments2 min readLW link

Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial

Paul Crowley15 Jan 2015 16:33 UTC
78 points
52 comments1 min readLW link

Strategic implications of AIs’ ability to coordinate at low cost, for example by merging

Wei Dai25 Apr 2019 5:08 UTC
67 points
46 comments2 min readLW link1 review

Modeling AGI Safety Frameworks with Causal Influence Diagrams

Ramana Kumar21 Jun 2019 12:50 UTC
43 points
6 comments1 min readLW link
(arxiv.org)

Henry Kissinger: AI Could Mean the End of Human History

ESRogs15 May 2018 20:11 UTC
17 points
12 comments1 min readLW link
(www.theatlantic.com)

Toy model of the AI control problem: animated version

Stuart_Armstrong10 Oct 2017 11:06 UTC
23 points
8 comments1 min readLW link

A Visualization of Nick Bostrom’s Superintelligence

[deleted]23 Jul 2014 0:24 UTC
62 points
28 comments3 min readLW link

AI Alignment Research Overview (by Jacob Steinhardt)

Ben Pace6 Nov 2019 19:24 UTC
44 points
0 comments7 min readLW link
(docs.google.com)

A general model of safety-oriented AI development

Wei Dai11 Jun 2018 21:00 UTC
65 points
8 comments1 min readLW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei Dai10 Sep 2019 8:29 UTC
48 points
26 comments3 min readLW link

Siren worlds and the perils of over-optimised search

Stuart_Armstrong7 Apr 2014 11:00 UTC
76 points
418 comments7 min readLW link

Top 9+2 myths about AI risk

Stuart_Armstrong29 Jun 2015 20:41 UTC
68 points
45 comments2 min readLW link

Rohin Shah on reasons for AI optimism

abergal31 Oct 2019 12:10 UTC
40 points
58 comments1 min readLW link
(aiimpacts.org)

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong6 Feb 2020 11:50 UTC
38 points
25 comments3 min readLW link

The Magnitude of His Own Folly

Eliezer Yudkowsky30 Sep 2008 11:31 UTC
95 points
127 comments6 min readLW link

AI alignment landscape

paulfchristiano13 Oct 2019 2:10 UTC
40 points
3 comments1 min readLW link
(ai-alignment.com)

Launched: Friendship is Optimal

iceman15 Nov 2012 4:57 UTC
74 points
32 comments1 min readLW link

Friendship is Optimal: A My Little Pony fanfic about an optimization process

iceman8 Sep 2012 6:16 UTC
104 points
151 comments1 min readLW link

Do Earths with slower economic growth have a better chance at FAI?

Eliezer Yudkowsky12 Jun 2013 19:54 UTC
58 points
175 comments4 min readLW link

Idea: Open Access AI Safety Journal

Gordon Seidoh Worley23 Mar 2018 18:27 UTC
28 points
11 comments1 min readLW link

G.K. Chesterton On AI Risk

Scott Alexander1 Apr 2017 19:00 UTC
17 points
0 comments7 min readLW link

The Friendly AI Game

bentarm15 Mar 2011 16:45 UTC
50 points
178 comments1 min readLW link

Q&A with Jürgen Schmidhuber on risks from AI

XiXiDu15 Jun 2011 15:51 UTC
61 points
45 comments4 min readLW link

[Question] What should an Einstein-like figure in Machine Learning do?

Razied5 Aug 2020 23:52 UTC
7 points
4 comments1 min readLW link

Takeaways from safety by default interviews

3 Apr 2020 17:20 UTC
28 points
2 comments13 min readLW link
(aiimpacts.org)

Field-Building and Deep Models

Ben Pace13 Jan 2018 21:16 UTC
21 points
12 comments4 min readLW link

Critique my Model: The EV of AGI to Selfish Individuals

ozziegooen8 Apr 2018 20:04 UTC
19 points
9 comments4 min readLW link

‘Dumb’ AI observes and manipulates controllers

Stuart_Armstrong13 Jan 2015 13:35 UTC
52 points
19 comments2 min readLW link

2019 AI Alignment Literature Review and Charity Comparison

Larks19 Dec 2019 3:00 UTC
130 points
18 comments62 min readLW link

Book review: Architects of Intelligence by Martin Ford (2018)

Ofer11 Aug 2020 17:30 UTC
15 points
0 comments2 min readLW link

Qualitative Strategies of Friendliness

Eliezer Yudkowsky30 Aug 2008 2:12 UTC
30 points
56 comments12 min readLW link

Dreams of Friendliness

Eliezer Yudkowsky31 Aug 2008 1:20 UTC
26 points
81 comments9 min readLW link

Conceptual issues in AI safety: the paradigmatic gap

vedevazz24 Jun 2018 15:09 UTC
33 points
0 comments1 min readLW link
(www.foldl.me)

On unfixably unsafe AGI architectures

Steven Byrnes19 Feb 2020 21:16 UTC
33 points
8 comments5 min readLW link

A toy model of the treacherous turn

Stuart_Armstrong8 Jan 2016 12:58 UTC
42 points
13 comments6 min readLW link

Allegory On AI Risk, Game Theory, and Mithril

James_Miller13 Feb 2017 20:41 UTC
45 points
57 comments3 min readLW link

1hr talk: Intro to AGI safety

Steven Byrnes18 Jun 2019 21:41 UTC
36 points
4 comments24 min readLW link

The Evil AI Overlord List

Stuart_Armstrong20 Nov 2012 17:02 UTC
44 points
80 comments1 min readLW link

What I would like the SIAI to publish

XiXiDu1 Nov 2010 14:07 UTC
36 points
225 comments3 min readLW link

Evaluating the feasibility of SI’s plan

JoshuaFox10 Jan 2013 8:17 UTC
39 points
187 comments4 min readLW link

Q&A with experts on risks from AI #1

XiXiDu8 Jan 2012 11:46 UTC
45 points
67 comments9 min readLW link

Algo trading is a central example of AI risk

Vanessa Kosoy28 Jul 2018 20:31 UTC
27 points
5 comments1 min readLW link

Will the world’s elites navigate the creation of AI just fine?

lukeprog31 May 2013 18:49 UTC
36 points
266 comments2 min readLW link

Let’s talk about “Convergent Rationality”

David Scott Krueger (formerly: capybaralet)12 Jun 2019 21:53 UTC
41 points
33 comments6 min readLW link

Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong25 Nov 2019 10:40 UTC
25 points
15 comments1 min readLW link

Q&A with Stan Franklin on risks from AI

XiXiDu11 Jun 2011 15:22 UTC
36 points
10 comments2 min readLW link

Muehlhauser-Goertzel Dialogue, Part 1

lukeprog16 Mar 2012 17:12 UTC
42 points
161 comments33 min readLW link

[LINK] NYT Article about Existential Risk from AI

[deleted]28 Jan 2013 10:37 UTC
38 points
23 comments1 min readLW link

Reframing the Problem of AI Progress

Wei Dai12 Apr 2012 19:31 UTC
32 points
47 comments1 min readLW link

New AI risks research institute at Oxford University

lukeprog16 Nov 2011 18:52 UTC
36 points
10 comments1 min readLW link

Thoughts on the Feasibility of Prosaic AGI Alignment?

iamthouthouarti21 Aug 2020 23:25 UTC
8 points
10 comments1 min readLW link

Memes and Rational Decisions

inferential9 Jan 2015 6:42 UTC
35 points
18 comments10 min readLW link

Levels of AI Self-Improvement

avturchin29 Apr 2018 11:45 UTC
11 points
1 comment39 min readLW link

Optimising Society to Constrain Risk of War from an Artificial Superintelligence

JohnCDraper30 Apr 2020 10:47 UTC
3 points
1 comment51 min readLW link

Some Thoughts on Singularity Strategies

Wei Dai13 Jul 2011 2:41 UTC
41 points
30 comments3 min readLW link

A trick for Safer GPT-N

Razied23 Aug 2020 0:39 UTC
7 points
1 comment2 min readLW link

against “AI risk”

Wei Dai11 Apr 2012 22:46 UTC
35 points
91 comments1 min readLW link

“Smarter than us” is out!

Stuart_Armstrong25 Feb 2014 15:50 UTC
41 points
57 comments1 min readLW link

Analysing: Dangerous messages from future UFAI via Oracles

Stuart_Armstrong22 Nov 2019 14:17 UTC
22 points
16 comments4 min readLW link

Q&A with Abram Demski on risks from AI

XiXiDu17 Jan 2012 9:43 UTC
33 points
71 comments9 min readLW link

Q&A with experts on risks from AI #2

XiXiDu9 Jan 2012 19:40 UTC
22 points
29 comments7 min readLW link

AI Safety Discussion Day

Linda Linsefors15 Sep 2020 14:40 UTC
20 points
0 comments1 min readLW link

A long reply to Ben Garfinkel on Scrutinizing Classic AI Risk Arguments

Søren Elverlin27 Sep 2020 17:51 UTC
17 points
6 comments1 min readLW link

Open-ended ethics of phenomena (a desiderata with universal morality)

Ryo 8 Nov 2023 20:10 UTC
1 point
0 comments8 min readLW link

Online AI Safety Discussion Day

Linda Linsefors8 Oct 2020 12:11 UTC
5 points
0 comments1 min readLW link

Military AI as a Convergent Goal of Self-Improving AI

avturchin13 Nov 2017 12:17 UTC
5 points
3 comments1 min readLW link

Neural program synthesis is a dangerous technology

syllogism12 Jan 2018 16:19 UTC
10 points
6 comments2 min readLW link

New, Brief Popular-Level Introduction to AI Risks and Superintelligence

LyleN23 Jan 2015 15:43 UTC
33 points
3 comments1 min readLW link

FAI Research Constraints and AGI Side Effects

JustinShovelain3 Jun 2015 19:25 UTC
27 points
59 comments7 min readLW link

European Master’s Programs in Machine Learning, Artificial Intelligence, and related fields

Master Programs ML/AI14 Nov 2020 15:51 UTC
33 points
6 comments1 min readLW link

The mind-killer

Paul Crowley2 May 2009 16:49 UTC
29 points
160 comments2 min readLW link

[Question] Should I do it?

MrLight19 Nov 2020 1:08 UTC
−3 points
16 comments2 min readLW link

Rationalising humans: another mugging, but not Pascal’s

Stuart_Armstrong14 Nov 2017 15:46 UTC
7 points
1 comment3 min readLW link

Machine learning could be fundamentally unexplainable

George3d616 Dec 2020 13:32 UTC
26 points
15 comments15 min readLW link
(cerebralab.com)

[Question] What do you make of AGI:unaligned::spaceships:not enough food?

Ronny Fernandez22 Feb 2020 14:14 UTC
4 points
3 comments1 min readLW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
28 points
3 comments8 min readLW link

[Question] Does it become easier, or harder, for the world to coordinate around not building AGI as time goes on?

Eli Tyre29 Jul 2019 22:59 UTC
86 points
31 comments3 min readLW link2 reviews

Grey Goo Re­quires AI

harsimony15 Jan 2021 4:45 UTC
8 points
11 comments4 min readLW link
(harsimony.wordpress.com)

AISU 2021

Linda Linsefors30 Jan 2021 17:40 UTC
28 points
2 comments1 min readLW link

Nonperson Predicates

Eliezer Yudkowsky27 Dec 2008 1:47 UTC
53 points
177 comments6 min readLW link

Engaging First Introductions to AI Risk

Rob Bensinger19 Aug 2013 6:26 UTC
31 points
21 comments3 min readLW link

Formal Solution to the Inner Alignment Problem

michaelcohen18 Feb 2021 14:51 UTC
49 points
123 comments2 min readLW link

[Question] What are the biggest current impacts of AI?

Sam Clarke7 Mar 2021 21:44 UTC
15 points
5 comments1 min readLW link

[Question] Is a Self-Iterating AGI Vulnerable to Thompson-style Trojans?

sxae25 Mar 2021 14:46 UTC
15 points
6 comments3 min readLW link

AI oracles on blockchain

Caravaggio6 Apr 2021 20:13 UTC
5 points
0 comments3 min readLW link

What if AGI is near?

Wulky Wilkinsen14 Apr 2021 0:05 UTC
11 points
5 comments1 min readLW link

[Question] Is there anything that can stop AGI development in the near term?

Wulky Wilkinsen22 Apr 2021 20:37 UTC
5 points
5 comments1 min readLW link

[Question] [timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes

Quinn4 May 2021 19:10 UTC
6 points
2 comments1 min readLW link

AI Safety Research Project Ideas

Owain_Evans21 May 2021 13:39 UTC
58 points
2 comments3 min readLW link

Survey on AI existential risk scenarios

8 Jun 2021 17:12 UTC
63 points
11 comments7 min readLW link

[Question] What are some claims or opinions about multi-multi delegation you’ve seen in the memeplex that you think deserve scrutiny?

Quinn27 Jun 2021 17:44 UTC
17 points
6 comments2 min readLW link

Mauhn Releases AI Safety Documentation

Berg Severens3 Jul 2021 21:23 UTC
4 points
0 comments1 min readLW link

A gentle apocalypse

pchvykov16 Aug 2021 5:03 UTC
3 points
5 comments3 min readLW link

Could you have stopped Chernobyl?

Carlos Ramirez27 Aug 2021 1:48 UTC
29 points
17 comments8 min readLW link

The Governance Problem and the “Pretty Good” X-Risk

Zach Stein-Perlman29 Aug 2021 18:00 UTC
5 points
2 comments11 min readLW link

Distinguishing AI takeover scenarios

8 Sep 2021 16:19 UTC
72 points
11 comments14 min readLW link

The alignment problem in different capability regimes

Buck9 Sep 2021 19:46 UTC
88 points
12 comments5 min readLW link

How truthful is GPT-3? A benchmark for language models

Owain_Evans16 Sep 2021 10:09 UTC
56 points
24 comments6 min readLW link

Investigating AI Takeover Scenarios

Sammy Martin17 Sep 2021 18:47 UTC
27 points
1 comment27 min readLW link

AI takeoff story: a continuation of progress by other means

Edouard Harris27 Sep 2021 15:55 UTC
76 points
13 comments10 min readLW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Ben Smith29 Sep 2021 17:09 UTC
31 points
7 comments10 min readLW link

The Dark Side of Cognition Hypothesis

Cameron Berg3 Oct 2021 20:10 UTC
19 points
1 comment16 min readLW link

Truthful AI: Developing and governing AI that does not lie

18 Oct 2021 18:37 UTC
81 points
9 comments10 min readLW link

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

Owain_Evans22 Oct 2021 16:23 UTC
31 points
15 comments1 min readLW link

Truthful and honest AI

29 Oct 2021 7:28 UTC
42 points
1 comment13 min readLW link

What is the most evil AI that we could build, today?

ThomasJ1 Nov 2021 19:58 UTC
−2 points
14 comments1 min readLW link

What are red flags for Neural Network suffering?

Marius Hobbhahn8 Nov 2021 12:51 UTC
29 points
15 comments12 min readLW link

Hardcode the AGI to need our approval indefinitely?

MichaelStJules11 Nov 2021 7:04 UTC
2 points
2 comments1 min readLW link

What would we do if alignment were futile?

Grant Demaree14 Nov 2021 8:09 UTC
75 points
39 comments3 min readLW link

Two Stupid AI Alignment Ideas

aphyer16 Nov 2021 16:13 UTC
24 points
3 comments4 min readLW link

Super intelligent AIs that don’t require alignment

Yair Halberstadt16 Nov 2021 19:55 UTC
10 points
2 comments6 min readLW link

AI Tracker: monitoring current and near-future risks from superscale models

23 Nov 2021 19:16 UTC
64 points
13 comments3 min readLW link
(aitracker.org)

HIRING: Inform and shape a new project on AI safety at Partnership on AI

Madhulika Srikumar24 Nov 2021 8:27 UTC
6 points
0 comments1 min readLW link

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn29 Nov 2021 15:18 UTC
16 points
5 comments7 min readLW link

Modeling Failure Modes of High-Level Machine Intelligence

6 Dec 2021 13:54 UTC
54 points
1 comment12 min readLW link

HIRING: Inform and shape a new project on AI safety at Partnership on AI

madhu_lika7 Dec 2021 19:37 UTC
1 point
0 comments1 min readLW link

Universality and the “Filter”

maggiehayes16 Dec 2021 0:47 UTC
10 points
2 comments11 min readLW link

Reviews of “Is power-seeking AI an existential risk?”

Joe Carlsmith16 Dec 2021 20:48 UTC
78 points
20 comments1 min readLW link

2+2: Ontological Framework

Lyrialtus1 Feb 2022 1:07 UTC
−15 points
2 comments12 min readLW link

Can the laws of physics/nature prevent hell?

superads916 Feb 2022 20:39 UTC
−5 points
8 comments2 min readLW link

How harmful are improvements in AI? + Poll

15 Feb 2022 18:16 UTC
15 points
4 comments8 min readLW link

Preserving and continuing alignment research through a severe global catastrophe

A_donor6 Mar 2022 18:43 UTC
39 points
11 comments5 min readLW link

Ask AI companies about what they are doing for AI safety?

mic9 Mar 2022 15:14 UTC
51 points
0 comments2 min readLW link

Is There a Valley of Bad Civilizational Adequacy?

lbThingrb11 Mar 2022 19:49 UTC
13 points
1 comment2 min readLW link

[Question] Danger(s) of theorem-proving AI?

Yitz16 Mar 2022 2:47 UTC
8 points
8 comments1 min readLW link

We Are Conjecture, A New Alignment Research Startup

Connor Leahy8 Apr 2022 11:40 UTC
197 points
25 comments4 min readLW link

Is technical AI alignment research a net positive?

cranberry_bear12 Apr 2022 13:07 UTC
6 points
2 comments2 min readLW link

The Peerless

Tamsin Leake13 Apr 2022 1:07 UTC
18 points
2 comments1 min readLW link
(carado.moe)

[Question] Can someone explain to me why MIRI is so pessimistic of our chances of survival?

iamthouthouarti14 Apr 2022 20:28 UTC
10 points
7 comments1 min readLW link

[Question] Convince me that humanity *isn’t* doomed by AGI

Yitz15 Apr 2022 17:26 UTC
61 points
49 comments1 min readLW link

Reflections on My Own Missing Mood

Lone Pine21 Apr 2022 16:19 UTC
52 points
25 comments5 min readLW link

Code Generation as an AI risk setting

Not Relevant17 Apr 2022 22:27 UTC
91 points
16 comments2 min readLW link

[Question] What is being improved in recursive self improvement?

Lone Pine25 Apr 2022 18:30 UTC
7 points
6 comments1 min readLW link

AI Alternative Futures: Scenario Mapping Artificial Intelligence Risk—Request for Participation (*Closed*)

Kakili27 Apr 2022 22:07 UTC
10 points
2 comments8 min readLW link

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Joe Carlsmith8 May 2022 3:50 UTC
20 points
1 comment29 min readLW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy12 May 2022 20:01 UTC
53 points
0 comments59 min readLW link

Agency As a Natural Abstraction

Thane Ruthenis13 May 2022 18:02 UTC
55 points
9 comments13 min readLW link

[Link post] Promising Paths to Alignment—Connor Leahy | Talk

frances_lorenz14 May 2022 16:01 UTC
34 points
0 comments1 min readLW link

DeepMind’s generalist AI, Gato: A non-technical explainer

16 May 2022 21:21 UTC
57 points
6 comments6 min readLW link

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework

17 May 2022 15:26 UTC
26 points
0 comments3 min readLW link

Why I’m Optimistic About Near-Term AI Risk

harsimony15 May 2022 23:05 UTC
57 points
27 comments1 min readLW link

Pivotal acts using an unaligned AGI?

Simon Fischer21 Aug 2022 17:13 UTC
26 points
3 comments7 min readLW link

Reshaping the AI Industry

Thane Ruthenis29 May 2022 22:54 UTC
147 points
35 comments21 min readLW link

Explaining inner alignment to myself

Jeremy Gillen24 May 2022 23:10 UTC
9 points
2 comments10 min readLW link

A Story of AI Risk: InstructGPT-N

peterbarnett26 May 2022 23:22 UTC
24 points
0 comments8 min readLW link

We will be around in 30 years

mukashi7 Jun 2022 3:47 UTC
12 points
205 comments2 min readLW link

Research Questions from Stained Glass Windows

StefanHex8 Jun 2022 12:38 UTC
4 points
0 comments2 min readLW link

Towards Gears-Level Understanding of Agency

Thane Ruthenis16 Jun 2022 22:00 UTC
23 points
4 comments18 min readLW link

A plausible story about AI risk.

DeLesley Hutchins10 Jun 2022 2:08 UTC
14 points
2 comments4 min readLW link

Summary of “AGI Ruin: A List of Lethalities”

Stephen McAleese10 Jun 2022 22:35 UTC
40 points
2 comments8 min readLW link

Poorly-Aimed Death Rays

Thane Ruthenis11 Jun 2022 18:29 UTC
48 points
5 comments4 min readLW link

Contra EY: Can AGI destroy us without trial & error?