
AI Risk

Last edit: 27 Aug 2022 21:39 UTC by Multicore

AI Risk is the analysis of risks associated with building powerful AI systems.

Superintelligence FAQ

Scott Alexander20 Sep 2016 19:00 UTC
88 points
13 comments27 min readLW link

What failure looks like

paulfchristiano17 Mar 2019 20:18 UTC
302 points
49 comments8 min readLW link2 reviews

Specification gaming examples in AI

Vika3 Apr 2018 12:30 UTC
43 points
9 comments1 min readLW link2 reviews

Discussion with Eliezer Yudkowsky on AGI interventions

11 Nov 2021 3:01 UTC
327 points
256 comments34 min readLW link

PreDCA: vanessa kosoy’s alignment protocol

carado20 Aug 2022 10:03 UTC
45 points
6 comments7 min readLW link
(carado.moe)

Intuitions about goal-directed behavior

Rohin Shah1 Dec 2018 4:25 UTC
50 points
15 comments6 min readLW link

Epistemological Framing for AI Alignment Research

adamShimi8 Mar 2021 22:05 UTC
53 points
7 comments9 min readLW link

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky5 Jun 2022 22:05 UTC
704 points
642 comments30 min readLW link

What can the principal-agent literature tell us about AI risk?

Alexis Carlier8 Feb 2020 21:28 UTC
102 points
31 comments16 min readLW link

Developmental Stages of GPTs

orthonormal26 Jul 2020 22:03 UTC
140 points
74 comments7 min readLW link1 review

[Question] Will OpenAI’s work unintentionally increase existential risks related to AI?

adamShimi11 Aug 2020 18:16 UTC
50 points
56 comments1 min readLW link

Another (outer) alignment failure story

paulfchristiano7 Apr 2021 20:12 UTC
209 points
38 comments12 min readLW link

AI Safety bounty for practical homomorphic encryption

acylhalide19 Aug 2022 12:27 UTC
27 points
9 comments4 min readLW link

How good is humanity at coordination?

Buck21 Jul 2020 20:01 UTC
78 points
43 comments3 min readLW link

A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi28 Jul 2018 21:27 UTC
71 points
9 comments3 min readLW link
(github.com)

Are minimal circuits deceptive?

evhub7 Sep 2019 18:11 UTC
65 points
11 comments8 min readLW link

Soft takeoff can still lead to decisive strategic advantage

Daniel Kokotajlo23 Aug 2019 16:39 UTC
117 points
46 comments8 min readLW link4 reviews

Should we postpone AGI until we reach safety?

otto.barten18 Nov 2020 15:43 UTC
27 points
36 comments3 min readLW link

On Solving Problems Before They Appear: The Weird Epistemologies of Alignment

adamShimi11 Oct 2021 8:20 UTC
95 points
11 comments15 min readLW link

DL towards the unaligned Recursive Self-Optimization attractor

jacob_cannell18 Dec 2021 2:15 UTC
32 points
22 comments4 min readLW link

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton17 Jan 2022 16:49 UTC
65 points
14 comments13 min readLW link

MIRI announces new “Death With Dignity” strategy

Eliezer Yudkowsky2 Apr 2022 0:43 UTC
325 points
521 comments18 min readLW link

AI Could Defeat All Of Us Combined

HoldenKarnofsky9 Jun 2022 15:50 UTC
168 points
29 comments17 min readLW link
(www.cold-takes.com)

On how various plans miss the hard bits of the alignment challenge

So8res12 Jul 2022 2:49 UTC
252 points
81 comments29 min readLW link

Critiquing “What failure looks like”

Grue_Slinky27 Dec 2019 23:59 UTC
35 points
6 comments3 min readLW link

The Main Sources of AI Risk?

21 Mar 2019 18:28 UTC
78 points
25 comments2 min readLW link

Clarifying some key hypotheses in AI alignment

15 Aug 2019 21:29 UTC
78 points
12 comments9 min readLW link

“Taking AI Risk Seriously” (thoughts by Critch)

Raemon29 Jan 2018 9:27 UTC
110 points
68 comments13 min readLW link

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

Kaj_Sotala12 Feb 2018 12:30 UTC
33 points
4 comments6 min readLW link
(kajsotala.fi)

Non-Adversarial Goodhart and AI Risks

Davidmanheim27 Mar 2018 1:39 UTC
22 points
11 comments6 min readLW link

Six AI Risk/Strategy Ideas

Wei_Dai27 Aug 2019 0:40 UTC
64 points
18 comments4 min readLW link1 review

[Question] Did AI pioneers not worry much about AI risks?

lisperati9 Feb 2020 19:58 UTC
42 points
9 comments1 min readLW link

Some disjunctive reasons for urgency on AI risk

Wei_Dai15 Feb 2019 20:43 UTC
36 points
24 comments1 min readLW link

Drexler on AI Risk

PeterMcCluskey1 Feb 2019 5:11 UTC
34 points
10 comments9 min readLW link
(www.bayesianinvestor.com)

A shift in arguments for AI risk

Richard_Ngo28 May 2019 13:47 UTC
32 points
7 comments1 min readLW link
(fragile-credences.github.io)

Disentangling arguments for the importance of AI safety

Richard_Ngo21 Jan 2019 12:41 UTC
129 points
23 comments8 min readLW link

AI Safety “Success Stories”

Wei_Dai7 Sep 2019 2:54 UTC
110 points
27 comments4 min readLW link1 review

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace4 Oct 2019 4:08 UTC
203 points
60 comments15 min readLW link2 reviews

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

Rohin Shah2 Jan 2020 18:20 UTC
35 points
94 comments10 min readLW link
(mailchi.mp)

The strategy-stealing assumption

paulfchristiano16 Sep 2019 15:23 UTC
70 points
46 comments12 min readLW link3 reviews

Thinking soberly about the context and consequences of Friendly AI

Mitchell_Porter16 Oct 2012 4:33 UTC
20 points
39 comments1 min readLW link

Announcement: AI alignment prize winners and next round

cousin_it15 Jan 2018 14:33 UTC
80 points
68 comments2 min readLW link

What Failure Looks Like: Distilling the Discussion

Ben Pace29 Jul 2020 21:49 UTC
78 points
14 comments7 min readLW link

Uber Self-Driving Crash

jefftk7 Nov 2019 15:00 UTC
110 points
1 comment2 min readLW link
(www.jefftk.com)

Reply to Holden on ‘Tool AI’

Eliezer Yudkowsky12 Jun 2012 18:00 UTC
151 points
357 comments17 min readLW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala2 May 2020 7:35 UTC
43 points
19 comments7 min readLW link
(plato.stanford.edu)

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

Kaj_Sotala4 May 2018 8:56 UTC
13 points
1 comment1 min readLW link
(arxiv.org)

Response to Oren Etzioni’s “How to know if artificial intelligence is about to destroy civilization”

Daniel Kokotajlo27 Feb 2020 18:10 UTC
27 points
5 comments8 min readLW link

Why don’t singularitarians bet on the creation of AGI by buying stocks?

John_Maxwell11 Mar 2020 16:27 UTC
43 points
20 comments4 min readLW link

The problem/solution matrix: Calculating the probability of AI safety “on the back of an envelope”

John_Maxwell20 Oct 2019 8:03 UTC
22 points
4 comments2 min readLW link

Three Stories for How AGI Comes Before FAI

John_Maxwell17 Sep 2019 23:26 UTC
27 points
8 comments6 min readLW link

Brainstorming additional AI risk reduction ideas

John_Maxwell14 Jun 2012 7:55 UTC
19 points
37 comments1 min readLW link

AI Alignment 2018-19 Review

Rohin Shah28 Jan 2020 2:19 UTC
125 points
6 comments35 min readLW link

The Fusion Power Generator Scenario

johnswentworth8 Aug 2020 18:31 UTC
136 points
29 comments3 min readLW link

A guide to Iterated Amplification & Debate

Rafael Harth15 Nov 2020 17:14 UTC
67 points
10 comments15 min readLW link

Work on Security Instead of Friendliness?

Wei_Dai21 Jul 2012 18:28 UTC
51 points
107 comments2 min readLW link

An unaligned benchmark

paulfchristiano17 Nov 2018 15:51 UTC
31 points
0 comments9 min readLW link

Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem

Zack_M_Davis17 Sep 2020 2:23 UTC
72 points
12 comments5 min readLW link
(aima.cs.berkeley.edu)

Clarifying “What failure looks like”

Sam Clarke20 Sep 2020 20:40 UTC
91 points
14 comments17 min readLW link

Relaxed adversarial training for inner alignment

evhub10 Sep 2019 23:03 UTC
60 points
22 comments27 min readLW link

An overview of 11 proposals for building safe advanced AI

evhub29 May 2020 20:38 UTC
189 points
34 comments38 min readLW link2 reviews

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
162 points
42 comments12 min readLW link3 reviews

Risks from Learned Optimization: Conclusion and Related Work

7 Jun 2019 19:53 UTC
77 points
4 comments6 min readLW link

Deceptive Alignment

5 Jun 2019 20:16 UTC
92 points
11 comments17 min readLW link

The Inner Alignment Problem

4 Jun 2019 1:20 UTC
95 points
17 comments13 min readLW link

Conditions for Mesa-Optimization

1 Jun 2019 20:52 UTC
73 points
47 comments12 min readLW link

AI risk hub in Singapore?

Daniel Kokotajlo29 Oct 2020 11:45 UTC
51 points
18 comments4 min readLW link

Thoughts on Robin Hanson’s AI Impacts interview

Steven Byrnes24 Nov 2019 1:40 UTC
25 points
3 comments7 min readLW link

The AI Safety Game (UPDATED)

Daniel Kokotajlo5 Dec 2020 10:27 UTC
41 points
9 comments3 min readLW link

[Question] Suggestions of posts on the AF to review

adamShimi16 Feb 2021 12:40 UTC
56 points
20 comments1 min readLW link

Google’s Ethical AI team and AI Safety

magfrump20 Feb 2021 9:42 UTC
12 points
16 comments7 min readLW link

Behavioral Sufficient Statistics for Goal-Directedness

adamShimi11 Mar 2021 15:01 UTC
21 points
12 comments9 min readLW link

Review of “Fun with +12 OOMs of Compute”

28 Mar 2021 14:55 UTC
59 points
20 comments8 min readLW link

April drafts

AI Impacts1 Apr 2021 18:10 UTC
49 points
2 comments1 min readLW link
(aiimpacts.org)

25 Min Talk on MetaEthical.AI with Questions from Stuart Armstrong

June Ku29 Apr 2021 15:38 UTC
21 points
7 comments1 min readLW link

Less Realistic Tales of Doom

Mark Xu6 May 2021 23:01 UTC
101 points
13 comments4 min readLW link

Rogue AGI Embodies Valuable Intellectual Property

3 Jun 2021 20:37 UTC
70 points
9 comments3 min readLW link

Environmental Structure Can Cause Instrumental Convergence

TurnTrout22 Jun 2021 22:26 UTC
71 points
44 comments16 min readLW link
(arxiv.org)

Alex Turner’s Research, Comprehensive Information Gathering

adamShimi23 Jun 2021 9:44 UTC
15 points
3 comments3 min readLW link

Sam Altman and Ezra Klein on the AI Revolution

Zack_M_Davis27 Jun 2021 4:53 UTC
38 points
17 comments1 min readLW link
(www.nytimes.com)

Approaches to gradient hacking

adamShimi14 Aug 2021 15:16 UTC
16 points
8 comments8 min readLW link

[Question] What are good alignment conference papers?

adamShimi28 Aug 2021 13:35 UTC
12 points
2 comments1 min readLW link

Why the technological singularity by AGI may never happen

hippke3 Sep 2021 14:19 UTC
5 points
14 comments1 min readLW link

[Question] Conditional on the first AGI being aligned correctly, is a good outcome even still likely?

iamthouthouarti6 Sep 2021 17:30 UTC
2 points
1 comment1 min readLW link

Bayeswatch 7: Wildfire

lsusr8 Sep 2021 5:35 UTC
47 points
6 comments3 min readLW link

[Book Review] “The Alignment Problem” by Brian Christian

lsusr20 Sep 2021 6:36 UTC
70 points
16 comments6 min readLW link

Is progress in ML-assisted theorem-proving beneficial?

MakoYass28 Sep 2021 1:54 UTC
10 points
3 comments1 min readLW link

Interview with Skynet

lsusr30 Sep 2021 2:20 UTC
49 points
1 comment2 min readLW link

Epistemic Strategies of Selection Theorems

adamShimi18 Oct 2021 8:57 UTC
32 points
1 comment12 min readLW link

Epistemic Strategies of Safety-Capabilities Tradeoffs

adamShimi22 Oct 2021 8:22 UTC
5 points
0 comments6 min readLW link

Drug addicts and deceptively aligned agents—a comparative analysis

Jan5 Nov 2021 21:42 UTC
41 points
2 comments12 min readLW link
(universalprior.substack.com)

Using Brain-Computer Interfaces to get more data for AI alignment

Robbo7 Nov 2021 0:00 UTC
35 points
10 comments7 min readLW link

Using blinders to help you see things for what they are

Adam Zerner11 Nov 2021 7:07 UTC
13 points
2 comments2 min readLW link

My current uncertainties regarding AI, alignment, and the end of the world

dominicq14 Nov 2021 14:08 UTC
2 points
3 comments2 min readLW link

Ngo and Yudkowsky on alignment difficulty

15 Nov 2021 20:31 UTC
231 points
143 comments99 min readLW link

Applications for AI Safety Camp 2022 Now Open!

adamShimi17 Nov 2021 21:42 UTC
47 points
3 comments1 min readLW link

[Question] Does the Structure of an algorithm matter for AI Risk and/or consciousness?

Logan Zoellner3 Dec 2021 18:31 UTC
7 points
5 comments1 min readLW link

Framing approaches to alignment and the hard problem of AI cognition

ryan_greenblatt15 Dec 2021 19:06 UTC
6 points
15 comments27 min readLW link

Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

Rob Bensinger12 Dec 2021 2:08 UTC
66 points
37 comments7 min readLW link

Summary of the Acausal Attack Issue for AIXI

Diffractor13 Dec 2021 8:16 UTC
16 points
6 comments4 min readLW link

My Overview of the AI Alignment Landscape: A Bird’s Eye View

Neel Nanda15 Dec 2021 23:44 UTC
107 points
9 comments15 min readLW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda25 Dec 2021 23:07 UTC
49 points
4 comments28 min readLW link

AI Fire Alarm Scenarios

PeterMcCluskey28 Dec 2021 2:20 UTC
10 points
0 comments6 min readLW link
(www.bayesianinvestor.com)

More Is Different for AI

jsteinhardt4 Jan 2022 19:30 UTC
137 points
22 comments3 min readLW link
(bounded-regret.ghost.io)

Challenges with Breaking into MIRI-Style Research

Chris_Leong17 Jan 2022 9:23 UTC
71 points
15 comments3 min readLW link

Thoughts on AGI safety from the top

jylin042 Feb 2022 20:06 UTC
30 points
2 comments32 min readLW link

Paradigm-building from first principles: Effective altruism, AGI, and alignment

Cameron Berg8 Feb 2022 16:12 UTC
24 points
5 comments14 min readLW link

Paradigm-building: Introduction

Cameron Berg8 Feb 2022 0:06 UTC
25 points
0 comments2 min readLW link

How I Formed My Own Views About AI Safety

Neel Nanda27 Feb 2022 18:50 UTC
63 points
5 comments13 min readLW link
(www.neelnanda.io)

[Question] Would (myopic) general public good producers significantly accelerate the development of AGI?

MakoYass2 Mar 2022 23:47 UTC
24 points
10 comments1 min readLW link

AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo

DanielFilan31 Mar 2022 5:20 UTC
24 points
1 comment48 min readLW link

[RETRACTED] It’s time for EA leadership to pull the short-timelines fire alarm.

Not Relevant8 Apr 2022 16:07 UTC
112 points
165 comments4 min readLW link

AMA Conjecture, A New Alignment Startup

adamShimi9 Apr 2022 9:43 UTC
46 points
41 comments1 min readLW link

Why I’m Worried About AI

peterbarnett23 May 2022 21:13 UTC
21 points
2 comments12 min readLW link

Complex Systems for AI Safety [Pragmatic AI Safety #3]

24 May 2022 0:00 UTC
46 points
2 comments21 min readLW link

Will working here advance AGI? Help us not destroy the world!

Yonatan Cale29 May 2022 11:42 UTC
29 points
46 comments1 min readLW link

Distilled—AGI Safety from First Principles

Harrison G29 May 2022 0:57 UTC
8 points
1 comment14 min readLW link

Perform Tractable Research While Avoiding Capabilities Externalities [Pragmatic AI Safety #4]

30 May 2022 20:25 UTC
36 points
3 comments25 min readLW link

Artificial Intelligence Safety for the Averagely Intelligent

Salsabila Mahdi1 Jun 2022 6:24 UTC
5 points
2 comments1 min readLW link

Confused why a “capabilities research is good for alignment progress” position isn’t discussed more

Kaj_Sotala2 Jun 2022 21:41 UTC
131 points
26 comments4 min readLW link

I’m trying out “asteroid mindset”

Alex_Altair3 Jun 2022 13:35 UTC
84 points
5 comments4 min readLW link

Epistemological Vigilance for Alignment

adamShimi6 Jun 2022 0:27 UTC
50 points
11 comments11 min readLW link
(epistemologicalvigilance.substack.com)

A Quick Guide to Confronting Doom

Ruby13 Apr 2022 19:30 UTC
224 points
36 comments2 min readLW link

Why I don’t believe in doom

mukashi7 Jun 2022 23:49 UTC
6 points
30 comments4 min readLW link

Open Problems in AI X-Risk [PAIS #5]

10 Jun 2022 2:08 UTC
48 points
3 comments35 min readLW link

Another plausible scenario of AI risk: AI builds military infrastructure while collaborating with humans, defects later.

avturchin10 Jun 2022 17:24 UTC
10 points
2 comments1 min readLW link

How dangerous is human-level AI?

Alex_Altair10 Jun 2022 17:38 UTC
20 points
4 comments8 min readLW link

On A List of Lethalities

Zvi13 Jun 2022 12:30 UTC
152 points
48 comments54 min readLW link
(thezvi.wordpress.com)

Continuity Assumptions

Jan_Kulveit13 Jun 2022 21:31 UTC
26 points
13 comments4 min readLW link

Slow motion videos as AI risk intuition pumps

Andrew_Critch14 Jun 2022 19:31 UTC
208 points
36 comments2 min readLW link

Alignment Risk Doesn’t Require Superintelligence

JustisMills15 Jun 2022 3:12 UTC
35 points
4 comments2 min readLW link

[Question] Has there been any work on attempting to use Pascal’s Mugging to make an AGI behave?

Chris_Leong15 Jun 2022 8:33 UTC
7 points
17 comments1 min readLW link

Where I agree and disagree with Eliezer

paulfchristiano19 Jun 2022 19:15 UTC
747 points
203 comments20 min readLW link

Are we there yet?

theflowerpot20 Jun 2022 11:19 UTC
2 points
2 comments1 min readLW link

Confusion about neuroscience/cognitive science as a danger for AI Alignment

Samuel Nellessen22 Jun 2022 17:59 UTC
2 points
1 comment3 min readLW link
(snellessen.com)

The Basics of AGI Policy (Flowchart)

Trevor126 Jun 2022 2:01 UTC
18 points
8 comments2 min readLW link

[Linkpost] Existential Risk Analysis in Empirical Research Papers

Dan Hendrycks2 Jul 2022 0:09 UTC
40 points
0 comments1 min readLW link
(arxiv.org)

Confusions in My Model of AI Risk

peterbarnett7 Jul 2022 1:05 UTC
21 points
9 comments5 min readLW link

The Alignment Problem

lsusr11 Jul 2022 3:03 UTC
45 points
20 comments3 min readLW link

Robustness to Scaling Down: More Important Than I Thought

adamShimi23 Jul 2022 11:40 UTC
37 points
5 comments3 min readLW link
(epistemologicalvigilance.substack.com)

Comparing Four Approaches to Inner Alignment

Lucas Teixeira29 Jul 2022 21:06 UTC
33 points
1 comment9 min readLW link

Survey: What (de)motivates you about AI risk?

Daniel_Friedrich3 Aug 2022 19:17 UTC
1 point
0 comments1 min readLW link
(forms.gle)

The alignment problem from a deep learning perspective

Richard_Ngo10 Aug 2022 22:46 UTC
77 points
10 comments27 min readLW link

Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth12 Aug 2022 16:30 UTC
83 points
47 comments1 min readLW link

DeepMind alignment team opinions on AGI ruin arguments

Vika12 Aug 2022 21:06 UTC
357 points
31 comments14 min readLW link

Shapes of Mind and Pluralism in Alignment

adamShimi13 Aug 2022 10:01 UTC
30 points
1 comment2 min readLW link

Thoughts on ‘List of Lethalities’

Alex Lawsen 17 Aug 2022 18:33 UTC
25 points
0 comments10 min readLW link

What if we approach AI safety like a technical engineering safety problem

zeshen20 Aug 2022 10:29 UTC
29 points
5 comments7 min readLW link

No One-Size-Fit-All Epistemic Strategy

adamShimi20 Aug 2022 12:56 UTC
23 points
1 comment2 min readLW link

Beyond Hyperanthropomorphism

PointlessOne21 Aug 2022 17:55 UTC
3 points
17 comments1 min readLW link
(studio.ribbonfarm.com)

All AGI safety questions welcome (especially basic ones) [~monthly thread]

plex8 Sep 2022 11:56 UTC
19 points
27 comments3 min readLW link

[Question] Why are we sure that AI will “want” something?

shminux16 Sep 2022 20:35 UTC
31 points
58 comments1 min readLW link

Refine’s Third Blog Post Day/Week

adamShimi17 Sep 2022 17:03 UTC
18 points
0 comments1 min readLW link

Announcing AISIC 2022 - the AI Safety Israel Conference, October 19-20

Davidmanheim21 Sep 2022 19:32 UTC
12 points
0 comments1 min readLW link

Access to AI: a human right?

dmtea25 Jul 2020 9:38 UTC
5 points
3 comments2 min readLW link

Agentic Language Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

Conversation with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Transcription of Eliezer’s January 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
110 points
9 comments56 min readLW link

Responses to Catastrophic AGI Risk: A Survey

lukeprog8 Jul 2013 14:33 UTC
17 points
8 comments1 min readLW link

How can I reduce existential risk from AI?

lukeprog13 Nov 2012 21:56 UTC
63 points
92 comments8 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link

Reframing misaligned AGI’s: well-intentioned non-neurotypical assistants

zhukeepa1 Apr 2018 1:22 UTC
46 points
14 comments2 min readLW link

When is unaligned AI morally valuable?

paulfchristiano25 May 2018 1:57 UTC
61 points
52 comments10 min readLW link

Introducing the AI Alignment Forum (FAQ)

29 Oct 2018 21:07 UTC
86 points
8 comments6 min readLW link

Swimming Upstream: A Case Study in Instrumental Rationality

TurnTrout3 Jun 2018 3:16 UTC
74 points
7 comments8 min readLW link

Current AI Safety Roles for Software Engineers

ozziegooen9 Nov 2018 20:57 UTC
70 points
9 comments4 min readLW link

[Question] Why is so much discussion happening in private Google Docs?

Wei_Dai12 Jan 2019 2:19 UTC
100 points
22 comments1 min readLW link

Problems in AI Alignment that philosophers could potentially contribute to

Wei_Dai17 Aug 2019 17:38 UTC
74 points
14 comments2 min readLW link

Two Neglected Problems in Human-AI Safety

Wei_Dai16 Dec 2018 22:13 UTC
83 points
24 comments2 min readLW link

Announcement: AI alignment prize round 4 winners

cousin_it20 Jan 2019 14:46 UTC
74 points
41 comments1 min readLW link

Soon: a weekly AI Safety prerequisites module on LessWrong

null30 Apr 2018 13:23 UTC
35 points
10 comments1 min readLW link

And the AI would have got away with it too, if...

Stuart_Armstrong22 May 2019 21:35 UTC
75 points
7 comments1 min readLW link

2017 AI Safety Literature Review and Charity Comparison

Larks24 Dec 2017 18:52 UTC
41 points
5 comments23 min readLW link

Should ethicists be inside or outside a profession?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
87 points
6 comments9 min readLW link

I Vouch For MIRI

Zvi17 Dec 2017 17:50 UTC
36 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Beware of black boxes in AI alignment research

cousin_it18 Jan 2018 15:07 UTC
39 points
10 comments1 min readLW link

AI Alignment Prize: Round 2 due March 31, 2018

Zvi12 Mar 2018 12:10 UTC
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

Three AI Safety Related Ideas

Wei_Dai13 Dec 2018 21:32 UTC
68 points
38 comments2 min readLW link

A rant against robots

Lê Nguyên Hoang14 Jan 2020 22:03 UTC
64 points
7 comments5 min readLW link

Opportunities for individual donors in AI safety

Alex Flint31 Mar 2018 18:37 UTC
30 points
3 comments11 min readLW link

But exactly how complex and fragile?

KatjaGrace3 Nov 2019 18:20 UTC
73 points
32 comments3 min readLW link1 review
(meteuphoric.com)

Course recommendations for Friendliness researchers

Louie9 Jan 2013 14:33 UTC
95 points
112 comments10 min readLW link

AI Safety Research Camp—Project Proposal

David_Kristoffersson2 Feb 2018 4:25 UTC
29 points
11 comments8 min readLW link

AI Summer Fellows Program

colm21 Mar 2018 15:32 UTC
21 points
0 comments1 min readLW link

The genie knows, but doesn’t care

Rob Bensinger6 Sep 2013 6:42 UTC
89 points
519 comments8 min readLW link

Alignment Newsletter #13: 07/02/18

Rohin Shah2 Jul 2018 16:10 UTC
70 points
12 comments8 min readLW link
(mailchi.mp)

An Increasingly Manipulative Newsfeed

Michaël Trazzi1 Jul 2019 15:26 UTC
61 points
16 comments5 min readLW link

The simple picture on AI safety

Alex Flint27 May 2018 19:43 UTC
31 points
10 comments2 min readLW link

Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial

Paul Crowley15 Jan 2015 16:33 UTC
78 points
52 comments1 min readLW link

Strategic implications of AIs’ ability to coordinate at low cost, for example by merging

Wei_Dai25 Apr 2019 5:08 UTC
60 points
45 comments2 min readLW link1 review

Modeling AGI Safety Frameworks with Causal Influence Diagrams

Ramana Kumar21 Jun 2019 12:50 UTC
43 points
6 comments1 min readLW link
(arxiv.org)

Henry Kissinger: AI Could Mean the End of Human History

ESRogs15 May 2018 20:11 UTC
17 points
12 comments1 min readLW link
(www.theatlantic.com)

Toy model of the AI control problem: animated version

Stuart_Armstrong10 Oct 2017 11:06 UTC
25 points
8 comments1 min readLW link

A Visualization of Nick Bostrom’s Superintelligence

[deleted]23 Jul 2014 0:24 UTC
62 points
28 comments3 min readLW link

AI Alignment Research Overview (by Jacob Steinhardt)

Ben Pace6 Nov 2019 19:24 UTC
43 points
0 comments7 min readLW link
(docs.google.com)

A general model of safety-oriented AI development

Wei_Dai11 Jun 2018 21:00 UTC
65 points
8 comments1 min readLW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei_Dai10 Sep 2019 8:29 UTC
46 points
26 comments3 min readLW link

Siren worlds and the perils of over-optimised search

Stuart_Armstrong7 Apr 2014 11:00 UTC
73 points
417 comments7 min readLW link

Top 9+2 myths about AI risk

Stuart_Armstrong29 Jun 2015 20:41 UTC
68 points
46 comments2 min readLW link

Rohin Shah on reasons for AI optimism

abergal31 Oct 2019 12:10 UTC
40 points
58 comments1 min readLW link
(aiimpacts.org)

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong6 Feb 2020 11:50 UTC
38 points
25 comments3 min readLW link

The Magnitude of His Own Folly

Eliezer Yudkowsky30 Sep 2008 11:31 UTC
82 points
128 comments6 min readLW link

AI alignment landscape

paulfchristiano13 Oct 2019 2:10 UTC
40 points
3 comments1 min readLW link
(ai-alignment.com)

Launched: Friendship is Optimal

iceman15 Nov 2012 4:57 UTC
72 points
32 comments1 min readLW link

Friendship is Optimal: A My Little Pony fanfic about an optimization process

iceman8 Sep 2012 6:16 UTC
109 points
152 comments1 min readLW link

Do Earths with slower economic growth have a better chance at FAI?

Eliezer Yudkowsky12 Jun 2013 19:54 UTC
55 points
176 comments4 min readLW link

Idea: Open Access AI Safety Journal

G Gordon Worley III23 Mar 2018 18:27 UTC
28 points
11 comments1 min readLW link

G.K. Chesterton On AI Risk

Scott Alexander1 Apr 2017 19:00 UTC
15 points
0 comments7 min readLW link

The Hidden Complexity of Wishes

Eliezer Yudkowsky24 Nov 2007 0:12 UTC
132 points
136 comments7 min readLW link

The Friendly AI Game

bentarm15 Mar 2011 16:45 UTC
50 points
178 comments1 min readLW link

Q&A with Jürgen Schmidhuber on risks from AI

XiXiDu15 Jun 2011 15:51 UTC
59 points
45 comments4 min readLW link

[Question] What should an Einstein-like figure in Machine Learning do?

Razied5 Aug 2020 23:52 UTC
3 points
3 comments1 min readLW link

Takeaways from safety by default interviews

3 Apr 2020 17:20 UTC
23 points
2 comments13 min readLW link
(aiimpacts.org)

Field-Building and Deep Models

Ben Pace13 Jan 2018 21:16 UTC
21 points
12 comments4 min readLW link

Critique my Model: The EV of AGI to Selfish Individuals

ozziegooen8 Apr 2018 20:04 UTC
19 points
9 comments4 min readLW link

‘Dumb’ AI observes and manipulates controllers

Stuart_Armstrong13 Jan 2015 13:35 UTC
52 points
19 comments2 min readLW link

2019 AI Alignment Literature Review and Charity Comparison

Larks19 Dec 2019 3:00 UTC
130 points
18 comments62 min readLW link

Book review: Architects of Intelligence by Martin Ford (2018)

ofer11 Aug 2020 17:30 UTC
15 points
0 comments2 min readLW link

Qualitative Strategies of Friendliness

Eliezer Yudkowsky30 Aug 2008 2:12 UTC
29 points
56 comments12 min readLW link

Dreams of Friendliness

Eliezer Yudkowsky31 Aug 2008 1:20 UTC
31 points
81 comments9 min readLW link

Conceptual issues in AI safety: the paradigmatic gap

vedevazz24 Jun 2018 15:09 UTC
33 points
0 comments1 min readLW link
(www.foldl.me)

On unfixably unsafe AGI architectures

Steven Byrnes19 Feb 2020 21:16 UTC
33 points
8 comments5 min readLW link

A toy model of the treacherous turn

Stuart_Armstrong8 Jan 2016 12:58 UTC
36 points
13 comments6 min readLW link

Allegory On AI Risk, Game Theory, and Mithril

James_Miller13 Feb 2017 20:41 UTC
45 points
57 comments3 min readLW link

1hr talk: Intro to AGI safety

Steven Byrnes18 Jun 2019 21:41 UTC
34 points
4 comments24 min readLW link

The Evil AI Overlord List

Stuart_Armstrong20 Nov 2012 17:02 UTC
44 points
80 comments1 min readLW link

What I would like the SIAI to publish

XiXiDu1 Nov 2010 14:07 UTC
36 points
225 comments3 min readLW link

Evaluating the feasibility of SI’s plan

JoshuaFox10 Jan 2013 8:17 UTC
38 points
188 comments4 min readLW link

Q&A with experts on risks from AI #1

XiXiDu8 Jan 2012 11:46 UTC
45 points
67 comments9 min readLW link

Algo trading is a central example of AI risk

Vanessa Kosoy28 Jul 2018 20:31 UTC
27 points
5 comments1 min readLW link

Will the world’s elites navigate the creation of AI just fine?

lukeprog31 May 2013 18:49 UTC
36 points
266 comments2 min readLW link

Let’s talk about “Convergent Rationality”

David Scott Krueger (formerly: capybaralet)12 Jun 2019 21:53 UTC
36 points
33 comments6 min readLW link

Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong25 Nov 2019 10:40 UTC
25 points
15 comments1 min readLW link

Q&A with Stan Franklin on risks from AI

XiXiDu11 Jun 2011 15:22 UTC
36 points
10 comments2 min readLW link

Muehlhauser-Goertzel Dialogue, Part 1

lukeprog16 Mar 2012 17:12 UTC
42 points
161 comments33 min readLW link

[LINK] NYT Article about Existential Risk from AI

[deleted]28 Jan 2013 10:37 UTC
38 points
23 comments1 min readLW link

Reframing the Problem of AI Progress

Wei_Dai12 Apr 2012 19:31 UTC
32 points
47 comments1 min readLW link

New AI risks research institute at Oxford University

lukeprog16 Nov 2011 18:52 UTC
36 points
10 comments1 min readLW link

Thoughts on the Feasibility of Prosaic AGI Alignment?

iamthouthouarti21 Aug 2020 23:25 UTC
8 points
10 comments1 min readLW link

Memes and Rational Decisions

inferential9 Jan 2015 6:42 UTC
35 points
17 comments10 min readLW link

Levels of AI Self-Improvement

avturchin29 Apr 2018 11:45 UTC
11 points
0 comments39 min readLW link

Optimising Society to Constrain Risk of War from an Artificial Superintelligence

JohnCDraper30 Apr 2020 10:47 UTC
3 points
1 comment51 min readLW link

Some Thoughts on Singularity Strategies

Wei_Dai13 Jul 2011 2:41 UTC
40 points
29 comments3 min readLW link

A trick for Safer GPT-N

Razied23 Aug 2020 0:39 UTC
7 points
1 comment2 min readLW link

against “AI risk”

Wei_Dai11 Apr 2012 22:46 UTC
35 points
91 comments1 min readLW link

“Smarter than us” is out!

Stuart_Armstrong25 Feb 2014 15:50 UTC
41 points
57 comments1 min readLW link

Analysing: Dangerous messages from future UFAI via Oracles

Stuart_Armstrong22 Nov 2019 14:17 UTC
22 points
16 comments4 min readLW link

Q&A with Abram Demski on risks from AI

XiXiDu17 Jan 2012 9:43 UTC
33 points
71 comments9 min readLW link

Q&A with experts on risks from AI #2

XiXiDu9 Jan 2012 19:40 UTC
22 points
29 comments7 min readLW link

AI Safety Discussion Day

Linda Linsefors15 Sep 2020 14:40 UTC
20 points
0 comments1 min readLW link

A long reply to Ben Garfinkel on Scrutinizing Classic AI Risk Arguments

Søren Elverlin27 Sep 2020 17:51 UTC
17 points
6 comments1 min readLW link

Online AI Safety Discussion Day

Linda Linsefors8 Oct 2020 12:11 UTC
5 points
0 comments1 min readLW link

Military AI as a Convergent Goal of Self-Improving AI

avturchin13 Nov 2017 12:17 UTC
5 points
3 comments1 min readLW link

Neural program synthesis is a dangerous technology

syllogism12 Jan 2018 16:19 UTC
10 points
6 comments2 min readLW link

New, Brief Popular-Level Introduction to AI Risks and Superintelligence

LyleN23 Jan 2015 15:43 UTC
33 points
3 comments1 min readLW link

FAI Research Constraints and AGI Side Effects

JustinShovelain3 Jun 2015 19:25 UTC
26 points
59 comments7 min readLW link

European Master’s Programs in Machine Learning, Artificial Intelligence, and related fields

Master Programs ML/AI14 Nov 2020 15:51 UTC
32 points
8 comments1 min readLW link

The mind-killer

Paul Crowley2 May 2009 16:49 UTC
29 points
160 comments2 min readLW link

[Question] Should I do it?

MrLight19 Nov 2020 1:08 UTC
−3 points
16 comments2 min readLW link

Rationalising humans: another mugging, but not Pascal’s

Stuart_Armstrong14 Nov 2017 15:46 UTC
7 points
1 comment3 min readLW link

Machine learning could be fundamentally unexplainable

George3d616 Dec 2020 13:32 UTC
26 points
15 comments15 min readLW link
(cerebralab.com)

[Question] What do you make of AGI:unaligned::spaceships:not enough food?

Ronny22 Feb 2020 14:14 UTC
4 points
3 comments1 min readLW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
25 points
3 comments8 min readLW link

Edge of the Cliff

akaTrickster5 Jan 2021 17:21 UTC
1 point
0 comments5 min readLW link

[Question] Does it become easier, or harder, for the world to coordinate around not building AGI as time goes on?

Eli Tyre29 Jul 2019 22:59 UTC
86 points
31 comments3 min readLW link2 reviews

Grey Goo Requires AI

harsimony15 Jan 2021 4:45 UTC
8 points
11 comments4 min readLW link
(harsimony.wordpress.com)

AISU 2021

Linda Linsefors30 Jan 2021 17:40 UTC
28 points
2 comments1 min readLW link

Nonperson Predicates

Eliezer Yudkowsky27 Dec 2008 1:47 UTC
50 points
176 comments6 min readLW link

Engaging First Introductions to AI Risk

Rob Bensinger19 Aug 2013 6:26 UTC
31 points
21 comments3 min readLW link

Formal Solution to the Inner Alignment Problem

michaelcohen18 Feb 2021 14:51 UTC
46 points
123 comments2 min readLW link

[Question] What are the biggest current impacts of AI?

Sam Clarke7 Mar 2021 21:44 UTC
15 points
5 comments1 min readLW link

[Question] Is a Self-Iterating AGI Vulnerable to Thompson-style Trojans?

sxae25 Mar 2021 14:46 UTC
15 points
7 comments3 min readLW link

AI oracles on blockchain

Caravaggio6 Apr 2021 20:13 UTC
5 points
0 comments3 min readLW link

What if AGI is near?

Wulky Wilkinsen14 Apr 2021 0:05 UTC
11 points
5 comments1 min readLW link

[Question] Is there anything that can stop AGI development in the near term?

Wulky Wilkinsen22 Apr 2021 20:37 UTC
5 points
5 comments1 min readLW link

[Question] [timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes

Quinn4 May 2021 19:10 UTC
6 points
2 comments1 min readLW link

AI Safety Research Project Ideas

Owain_Evans21 May 2021 13:39 UTC
58 points
2 comments3 min readLW link

Survey on AI existential risk scenarios

8 Jun 2021 17:12 UTC
60 points
11 comments7 min readLW link

[Question] What are some claims or opinions about multi-multi delegation you’ve seen in the memeplex that you think deserve scrutiny?

Quinn27 Jun 2021 17:44 UTC
16 points
6 comments2 min readLW link

Mauhn Releases AI Safety Documentation

Berg Severens3 Jul 2021 21:23 UTC
4 points
0 comments1 min readLW link

A gentle apocalypse

pchvykov16 Aug 2021 5:03 UTC
3 points
5 comments3 min readLW link

Could you have stopped Chernobyl?

Carlos Ramirez27 Aug 2021 1:48 UTC
29 points
17 comments8 min readLW link

The Governance Problem and the “Pretty Good” X-Risk

Zach Stein-Perlman29 Aug 2021 18:00 UTC
5 points
2 comments12 min readLW link

Distinguishing AI takeover scenarios

8 Sep 2021 16:19 UTC
64 points
11 comments14 min readLW link

The alignment problem in different capability regimes

Buck9 Sep 2021 19:46 UTC
87 points
12 comments5 min readLW link

How truthful is GPT-3? A benchmark for language models

Owain_Evans16 Sep 2021 10:09 UTC
56 points
24 comments6 min readLW link

Investigating AI Takeover Scenarios

Sammy Martin17 Sep 2021 18:47 UTC
27 points
1 comment27 min readLW link

AI takeoff story: a continuation of progress by other means

Edouard Harris27 Sep 2021 15:55 UTC
75 points
13 comments10 min readLW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Ben Smith29 Sep 2021 17:09 UTC
27 points
7 comments10 min readLW link

The Dark Side of Cognition Hypothesis

Cameron Berg3 Oct 2021 20:10 UTC
19 points
1 comment16 min readLW link

Truthful AI: Developing and governing AI that does not lie

18 Oct 2021 18:37 UTC
81 points
9 comments10 min readLW link

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

Owain_Evans22 Oct 2021 16:23 UTC
31 points
15 comments1 min readLW link

Truthful and honest AI

29 Oct 2021 7:28 UTC
41 points
1 comment13 min readLW link

What is the most evil AI that we could build, today?

ThomasJ1 Nov 2021 19:58 UTC
−2 points
14 comments1 min readLW link

What are red flags for Neural Network suffering?

Marius Hobbhahn8 Nov 2021 12:51 UTC
26 points
15 comments12 min readLW link

Hardcode the AGI to need our approval indefinitely?

MichaelStJules11 Nov 2021 7:04 UTC
2 points
2 comments1 min readLW link

What would we do if alignment were futile?

Grant Demaree14 Nov 2021 8:09 UTC
73 points
43 comments3 min readLW link

Two Stupid AI Alignment Ideas

aphyer16 Nov 2021 16:13 UTC
24 points
3 comments4 min readLW link

Super intelligent AIs that don’t require alignment

Yair Halberstadt16 Nov 2021 19:55 UTC
10 points
2 comments6 min readLW link

AI Tracker: monitoring current and near-future risks from superscale models

23 Nov 2021 19:16 UTC
64 points
13 comments3 min readLW link
(aitracker.org)

HIRING: Inform and shape a new project on AI safety at Partnership on AI

Madhulika Srikumar24 Nov 2021 8:27 UTC
6 points
0 comments1 min readLW link

How to measure FLOP/s for Neural Networks empirically?

Marius Hobbhahn29 Nov 2021 15:18 UTC
15 points
3 comments7 min readLW link

Modeling Failure Modes of High-Level Machine Intelligence

6 Dec 2021 13:54 UTC
54 points
1 comment12 min readLW link

HIRING: Inform and shape a new project on AI safety at Partnership on AI

madhu_lika7 Dec 2021 19:37 UTC
1 point
0 comments1 min readLW link

Universality and the “Filter”

maggiehayes16 Dec 2021 0:47 UTC
10 points
3 comments11 min readLW link

Reviews of “Is power-seeking AI an existential risk?”

Joe Carlsmith16 Dec 2021 20:48 UTC
72 points
20 comments1 min readLW link

2+2: Ontological Framework

Lyrialtus1 Feb 2022 1:07 UTC
−15 points
2 comments15 min readLW link

Can the laws of physics/nature prevent hell?

superads916 Feb 2022 20:39 UTC
−7 points
10 comments2 min readLW link

How harmful are improvements in AI? + Poll

15 Feb 2022 18:16 UTC
15 points
4 comments8 min readLW link

Preserving and continuing alignment research through a severe global catastrophe

A_donor6 Mar 2022 18:43 UTC
36 points
11 comments5 min readLW link

Ask AI companies about what they are doing for AI safety?

Michael Chen9 Mar 2022 15:14 UTC
50 points
0 comments2 min readLW link

Is There a Valley of Bad Civilizational Adequacy?

lbThingrb11 Mar 2022 19:49 UTC
13 points
1 comment2 min readLW link

[Question] Danger(s) of theorem-proving AI?

Yitz16 Mar 2022 2:47 UTC
8 points
9 comments1 min readLW link

It Looks Like You’re Trying To Take Over The World

gwern9 Mar 2022 16:35 UTC
385 points
125 comments1 min readLW link
(www.gwern.net)

We Are Conjecture, A New Alignment Research Startup

Connor Leahy8 Apr 2022 11:40 UTC
185 points
24 comments4 min readLW link

Is technical AI alignment research a net positive?

cranberry_bear12 Apr 2022 13:07 UTC
4 points
2 comments2 min readLW link

The Peerless

carado13 Apr 2022 1:07 UTC
18 points
2 comments1 min readLW link
(carado.moe)

[Question] Can someone explain to me why MIRI is so pessimistic of our chances of survival?

iamthouthouarti14 Apr 2022 20:28 UTC
10 points
7 comments1 min readLW link

[Question] Convince me that humanity *isn’t* doomed by AGI

Yitz15 Apr 2022 17:26 UTC
60 points
53 comments1 min readLW link

Reflections on My Own Missing Mood

Conor Sullivan21 Apr 2022 16:19 UTC
51 points
25 comments5 min readLW link

Code Generation as an AI risk setting

Not Relevant17 Apr 2022 22:27 UTC
91 points
16 comments2 min readLW link

[Question] What is being improved in recursive self improvement?

Conor Sullivan25 Apr 2022 18:30 UTC
6 points
7 comments1 min readLW link

AI Alternative Futures: Scenario Mapping Artificial Intelligence Risk—Request for Participation (*Closed*)

Kakili27 Apr 2022 22:07 UTC
10 points
2 comments8 min readLW link

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

Joe Carlsmith8 May 2022 3:50 UTC
20 points
1 comment29 min readLW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy12 May 2022 20:01 UTC
45 points
0 comments59 min readLW link

Agency As a Natural Abstraction

Thane Ruthenis13 May 2022 18:02 UTC
54 points
8 comments13 min readLW link

[Link post] Promising Paths to Alignment—Connor Leahy | Talk

frances_lorenz14 May 2022 16:01 UTC
34 points
0 comments1 min readLW link

DeepMind’s generalist AI, Gato: A non-technical explainer

16 May 2022 21:21 UTC
57 points
6 comments6 min readLW link

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework

17 May 2022 15:26 UTC
24 points
0 comments3 min readLW link

In defence of flailing

acylhalide18 Jun 2022 5:26 UTC
10 points
14 comments4 min readLW link

Why I’m Optimistic About Near-Term AI Risk

harsimony15 May 2022 23:05 UTC
49 points
28 comments1 min readLW link

Pivotal acts using an unaligned AGI?

Simon Fischer21 Aug 2022 17:13 UTC
26 points
3 comments8 min readLW link

Reshaping the AI Industry

Thane Ruthenis29 May 2022 22:54 UTC
142 points
34 comments21 min readLW link

Explaining inner alignment to myself

Jeremy Gillen24 May 2022 23:10 UTC
9 points
2 comments10 min readLW link

Right now, you’re sitting on a REDONKULOUS opportunity to help solve AGI (and rake in $$$)

Trevor126 May 2022 21:55 UTC
41 points
13 comments2 min readLW link

A Story of AI Risk: InstructGPT-N

peterbarnett26 May 2022 23:22 UTC
24 points
0 comments8 min readLW link

We will be around in 30 years

mukashi7 Jun 2022 3:47 UTC
12 points
205 comments2 min readLW link

Transformer Research Questions from Stained Glass Windows

StefanHex8 Jun 2022 12:38 UTC
4 points
0 comments2 min readLW link

Towards Gears-Level Understanding of Agency

Thane Ruthenis16 Jun 2022 22:00 UTC
15 points
4 comments18 min readLW link

A plausible story about AI risk.

DeLesley Hutchins10 Jun 2022 2:08 UTC
14 points
1 comment4 min readLW link

Summary of “AGI Ruin: A List of Lethalities”

Stephen McAleese10 Jun 2022 22:35 UTC
32 points
2 comments8 min readLW link

Poorly-Aimed Death Rays

Thane Ruthenis11 Jun 2022 18:29 UTC
41 points
5 comments4 min readLW link

Contra EY: Can AGI destroy us without trial & error?

Nikita Sokolsky13 Jun 2022 18:26 UTC
123 points
76 comments15 min readLW link

A Modest Pivotal Act

anonymousaisafety13 Jun 2022 19:24 UTC
−15 points
1 comment5 min readLW link

Alignment research for “meta” purposes

acylhalide16 Jun 2022 14:03 UTC
12 points
0 comments1 min readLW link

Specific problems with specific animal comparisons for AI policy

Trevor119 Jun 2022 1:27 UTC
3 points
1 comment2 min readLW link

[Question] AI misalignment risk from GPT-like systems?

fiso6419 Jun 2022 17:35 UTC
10 points
8 comments1 min readLW link

Causal confusion as an argument against the scaling hypothesis

20 Jun 2022 10:54 UTC
83 points
30 comments18 min readLW link

[LQ] Some Thoughts on Messaging Around AI Risk

DragonGod25 Jun 2022 13:53 UTC
5 points
3 comments6 min readLW link

All AGI safety questions welcome (especially basic ones) [July 2022]

16 Jul 2022 12:57 UTC
78 points
125 comments3 min readLW link

Reframing the AI Risk

Thane Ruthenis1 Jul 2022 18:44 UTC
25 points
6 comments6 min readLW link

Follow along with Columbia EA’s Advanced AI Safety Fellowship!

RohanS2 Jul 2022 17:45 UTC
3 points
0 comments2 min readLW link
(forum.effectivealtruism.org)

Can we achieve AGI Alignment by balancing multiple human objectives?

Ben Smith3 Jul 2022 2:51 UTC
11 points
1 comment4 min readLW link

New US Senate Bill on X-Risk Mitigation [Linkpost]

Evan R. Murphy4 Jul 2022 1:25 UTC
35 points
12 comments1 min readLW link
(www.hsgac.senate.gov)

My Most Likely Reason to Die Young is AI X-Risk

AISafetyIsNotLongtermist4 Jul 2022 17:08 UTC
58 points
24 comments4 min readLW link
(forum.effectivealtruism.org)

Please help us communicate AI xrisk. It could save the world.

otto.barten4 Jul 2022 21:47 UTC
4 points
7 comments2 min readLW link

When is it appropriate to use statistical models and probabilities for decision making?

Younes Kamel5 Jul 2022 12:34 UTC
10 points
6 comments4 min readLW link
(youneskamel.substack.com)

Acceptability Verification: A Research Agenda

12 Jul 2022 20:11 UTC
42 points
0 comments1 min readLW link
(docs.google.com)

Goal Alignment Is Robust To the Sharp Left Turn

Thane Ruthenis13 Jul 2022 20:23 UTC
38 points
12 comments4 min readLW link

Conditioning Generative Models for Alignment

Jozdien18 Jul 2022 7:11 UTC
39 points
5 comments22 min readLW link

A Critique of AI Alignment Pessimism

ExCeph19 Jul 2022 2:28 UTC
8 points
1 comment9 min readLW link

What Environment Properties Select Agents For World-Modeling?

Thane Ruthenis23 Jul 2022 19:27 UTC
24 points
1 comment12 min readLW link

Alignment being impossible might be better than it being really difficult

Martín Soto25 Jul 2022 23:57 UTC
12 points
2 comments2 min readLW link

AGI ruin scenarios are likely (and disjunctive)

So8res27 Jul 2022 3:21 UTC
137 points
37 comments6 min readLW link

[Question] How likely do you think worse-than-extinction type fates to be?

span11 Aug 2022 4:08 UTC
3 points
3 comments1 min readLW link

Three pillars for avoiding AGI catastrophe: Technical alignment, deployment decisions, and coordination

Alex Lintz3 Aug 2022 23:15 UTC
17 points
0 comments12 min readLW link

Convergence Towards World-Models: A Gears-Level Model

Thane Ruthenis4 Aug 2022 23:31 UTC
36 points
1 comment13 min readLW link

Complexity No Bar to AI (Or, why Computational Complexity doesn’t matter for real life problems)

Noosphere897 Aug 2022 19:55 UTC
17 points
14 comments3 min readLW link
(www.gwern.net)

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth10 Aug 2022 16:08 UTC
122 points
30 comments3 min readLW link

Anti-squatted AI x-risk domains index

plex12 Aug 2022 12:01 UTC
49 points
3 comments1 min readLW link

Infant AI Scenario

Nathan112312 Aug 2022 21:20 UTC
1 point
0 comments3 min readLW link

The Dumbest Possible Gets There First

Artaxerxes13 Aug 2022 10:20 UTC
35 points
7 comments2 min readLW link

Interpretability Tools Are an Attack Channel

Thane Ruthenis17 Aug 2022 18:47 UTC
39 points
22 comments1 min readLW link

Alignment’s phlogiston

Eleni Angelou18 Aug 2022 22:27 UTC
8 points
2 comments2 min readLW link

Benchmarking Proposals on Risk Scenarios

Paul Bricman20 Aug 2022 10:01 UTC
25 points
2 comments14 min readLW link

What’s the Least Impressive Thing GPT-4 Won’t be Able to Do

Algon20 Aug 2022 19:48 UTC
75 points
80 comments1 min readLW link

My Plan to Build Aligned Superintelligence

apollonianblues21 Aug 2022 13:16 UTC
18 points
7 comments8 min readLW link

The Alignment Problem Needs More Positive Fiction

Netcentrica21 Aug 2022 22:01 UTC
4 points
2 comments5 min readLW link

It Looks Like You’re Trying To Take Over The Narrative

George3d624 Aug 2022 13:36 UTC
2 points
20 comments9 min readLW link
(www.epistem.ink)

AI Risk in Terms of Unstable Nuclear Software

Thane Ruthenis26 Aug 2022 18:49 UTC
29 points
1 comment6 min readLW link

Taking the parameters which seem to matter and rotating them until they don’t

Garrett Baker26 Aug 2022 18:26 UTC
115 points
48 comments1 min readLW link

Annual AGI Benchmarking Event

Lawrence Phillips27 Aug 2022 0:06 UTC
24 points
3 comments2 min readLW link
(www.metaculus.com)

Help Understanding Preferences And Evil

Netcentrica27 Aug 2022 3:42 UTC
6 points
7 comments2 min readLW link

[Question] What would you expect a massive multimodal online federated learner to be capable of?

Aryeh Englander27 Aug 2022 17:31 UTC
13 points
4 comments1 min readLW link

Are Generative World Models a Mesa-Optimization Risk?

Thane Ruthenis29 Aug 2022 18:37 UTC
12 points
2 comments3 min readLW link

Three scenarios of pseudo-alignment

Eleni Angelou3 Sep 2022 12:47 UTC
9 points
0 comments3 min readLW link

Worlds Where Iterative Design Fails

johnswentworth30 Aug 2022 20:48 UTC
130 points
14 comments10 min readLW link

Alignment is hard. Communicating that, might be harder

Eleni Angelou1 Sep 2022 16:57 UTC
7 points
8 comments3 min readLW link

Agency engineering: is AI-alignment “to human intent” enough?

catubc2 Sep 2022 18:14 UTC
9 points
10 comments7 min readLW link

Sticky goals: a concrete experiment for understanding deceptive alignment

evhub2 Sep 2022 21:57 UTC
34 points
12 comments3 min readLW link

A Game About AI Alignment (& Meta-Ethics): What Are the Must Haves?

JonathanErhardt5 Sep 2022 7:55 UTC
18 points
13 comments2 min readLW link

How Josiah became an AI safety researcher

Neil Crawford6 Sep 2022 17:17 UTC
4 points
0 comments1 min readLW link

Community Building for Graduate Students: A Targeted Approach

Neil Crawford6 Sep 2022 17:17 UTC
6 points
0 comments3 min readLW link

It’s (not) how you use it

Eleni Angelou7 Sep 2022 17:15 UTC
8 points
1 comment2 min readLW link

Oversight Leagues: The Training Game as a Feature

Paul Bricman9 Sep 2022 10:08 UTC
19 points
6 comments10 min readLW link

Ideological Inference Engines: Making Deontology Differentiable*

Paul Bricman12 Sep 2022 12:00 UTC
5 points
0 comments14 min readLW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

11 Sep 2022 10:57 UTC
42 points
12 comments30 min readLW link

AI Safety field-building projects I’d like to see

Akash11 Sep 2022 23:43 UTC
39 points
6 comments6 min readLW link

[Linkpost] A survey on over 300 works about interpretability in deep networks

scasper12 Sep 2022 19:07 UTC
94 points
7 comments2 min readLW link
(arxiv.org)

Representational Tethers: Tying AI Latents To Human Ones

Paul Bricman16 Sep 2022 14:45 UTC
29 points
0 comments16 min readLW link

Risk aversion and GPT-3

hatta_afiq13 Sep 2022 20:50 UTC
1 point
0 comments1 min readLW link

[Question] Would a Misaligned SSI Really Kill Us All?

DragonGod14 Sep 2022 12:15 UTC
6 points
7 comments6 min readLW link

Emily Brontë on: Psychology Required for Serious™ AGI Safety Research

robertzk14 Sep 2022 14:47 UTC
2 points
0 comments1 min readLW link

Precise P(doom) isn’t very important for prioritization or strategy

harsimony14 Sep 2022 17:19 UTC
18 points
6 comments1 min readLW link

Responding to ‘Beyond Hyperanthropomorphism’

ukc1001414 Sep 2022 20:37 UTC
7 points
0 comments16 min readLW link

Capability and Agency as Cornerstones of AI risk — My current model

wilm15 Sep 2022 8:25 UTC
10 points
4 comments12 min readLW link

Understanding Conjecture: Notes from Connor Leahy interview

Akash15 Sep 2022 18:37 UTC
101 points
24 comments15 min readLW link

[Question] Updates on FLI’s Value Alignment Map?

rodeo_flagellum17 Sep 2022 22:27 UTC
17 points
4 comments1 min readLW link

Summaries: Alignment Fundamentals Curriculum

Leon Lang18 Sep 2022 13:08 UTC
43 points
3 comments1 min readLW link
(docs.google.com)

Leveraging Legal Informatics to Align AI

John Nay18 Sep 2022 20:39 UTC
8 points
0 comments3 min readLW link
(forum.effectivealtruism.org)

Here Be AGI Dragons

Oren Montano21 Sep 2022 22:28 UTC
−2 points
0 comments5 min readLW link

AI Risk Intro 2: Solving The Problem

22 Sep 2022 13:55 UTC
11 points
0 comments27 min readLW link

Interlude: But Who Optimizes The Optimizer?

Paul Bricman23 Sep 2022 15:30 UTC
15 points
0 comments10 min readLW link

[Question] Why Do AI researchers Rate the Probability of Doom So Low?

Aorou24 Sep 2022 2:33 UTC
5 points
4 comments3 min readLW link

On Generality

Oren Montano26 Sep 2022 4:06 UTC
2 points
0 comments5 min readLW link

Oren’s Field Guide of Bad AGI Outcomes

Oren Montano26 Sep 2022 4:06 UTC
0 points
0 comments1 min readLW link

Distribution Shifts and The Importance of AI Safety

Leon Lang29 Sep 2022 22:38 UTC
4 points
0 comments12 min readLW link