
AI Risk

AI Risk is the analysis of the risks associated with building powerful AI systems.

What failure looks like

paulfchristiano
17 Mar 2019 20:18 UTC
222 points
44 comments · 8 min read · LW link

Superintelligence FAQ

Scott Alexander
20 Sep 2016 19:00 UTC
33 points
5 comments · 27 min read · LW link

Specification gaming examples in AI

Vika
3 Apr 2018 12:30 UTC
82 points
9 comments · 1 min read · LW link · 2 nominations · 2 reviews

[Question] Is OpenAI increasing the existential risks related to AI?

adamShimi
11 Aug 2020 18:16 UTC
33 points
16 comments · 1 min read · LW link

Developmental Stages of GPTs

orthonormal
26 Jul 2020 22:03 UTC
116 points
57 comments · 7 min read · LW link

How good is humanity at coordination?

Buck
21 Jul 2020 20:01 UTC
62 points
36 comments · 3 min read · LW link

Critiquing “What failure looks like”

Grue_Slinky
27 Dec 2019 23:59 UTC
26 points
4 comments · 3 min read · LW link

The Main Sources of AI Risk?

21 Mar 2019 18:28 UTC
78 points
22 comments · 2 min read · LW link

Clarifying some key hypotheses in AI alignment

15 Aug 2019 21:29 UTC
73 points
4 comments · 9 min read · LW link

“Taking AI Risk Seriously” (thoughts by Critch)

Raemon
29 Jan 2018 9:27 UTC
174 points
68 comments · 13 min read · LW link

What can the principal-agent literature tell us about AI risk?

Alexis Carlier
8 Feb 2020 21:28 UTC
99 points
31 comments · 16 min read · LW link

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

Kaj_Sotala
12 Feb 2018 12:30 UTC
68 points
4 comments · 6 min read · LW link
(kajsotala.fi)

Non-Adversarial Goodhart and AI Risks

Davidmanheim
27 Mar 2018 1:39 UTC
65 points
9 comments · 6 min read · LW link

Six AI Risk/Strategy Ideas

Wei_Dai
27 Aug 2019 0:40 UTC
63 points
15 comments · 4 min read · LW link

[Question] Did AI pioneers not worry much about AI risks?

lisperati
9 Feb 2020 19:58 UTC
42 points
9 comments · 1 min read · LW link

Some disjunctive reasons for urgency on AI risk

Wei_Dai
15 Feb 2019 20:43 UTC
38 points
24 comments · 1 min read · LW link

Drexler on AI Risk

PeterMcCluskey
1 Feb 2019 5:11 UTC
34 points
10 comments · 9 min read · LW link
(www.bayesianinvestor.com)

A shift in arguments for AI risk

Richard_Ngo
28 May 2019 13:47 UTC
33 points
7 comments · 1 min read · LW link
(fragile-credences.github.io)

Disentangling arguments for the importance of AI safety

Richard_Ngo
21 Jan 2019 12:41 UTC
125 points
23 comments · 8 min read · LW link

AI Safety “Success Stories”

Wei_Dai
7 Sep 2019 2:54 UTC
105 points
24 comments · 4 min read · LW link

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace
4 Oct 2019 4:08 UTC
186 points
49 comments · 15 min read · LW link

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

rohinmshah
2 Jan 2020 18:20 UTC
34 points
93 comments · 10 min read · LW link
(mailchi.mp)

Intuitions about goal-directed behavior

rohinmshah
1 Dec 2018 4:25 UTC
41 points
13 comments · 6 min read · LW link

The strategy-stealing assumption

paulfchristiano
16 Sep 2019 15:23 UTC
68 points
39 comments · 12 min read · LW link

Thinking soberly about the context and consequences of Friendly AI

Mitchell_Porter
16 Oct 2012 4:33 UTC
12 points
39 comments · 1 min read · LW link

Announcement: AI alignment prize winners and next round

cousin_it
15 Jan 2018 14:33 UTC
167 points
68 comments · 2 min read · LW link

What Failure Looks Like: Distilling the Discussion

Ben Pace
29 Jul 2020 21:49 UTC
61 points
10 comments · 7 min read · LW link

Uber Self-Driving Crash

jefftk
7 Nov 2019 15:00 UTC
113 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Reply to Holden on ‘Tool AI’

Eliezer Yudkowsky
12 Jun 2012 18:00 UTC
112 points
357 comments · 17 min read · LW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala
2 May 2020 7:35 UTC
42 points
19 comments · 7 min read · LW link
(plato.stanford.edu)

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

Kaj_Sotala
4 May 2018 8:56 UTC
37 points
1 comment · 1 min read · LW link
(arxiv.org)

Response to Oren Etzioni’s “How to know if artificial intelligence is about to destroy civilization”

Daniel Kokotajlo
27 Feb 2020 18:10 UTC
29 points
5 comments · 8 min read · LW link

Why don’t singularitarians bet on the creation of AGI by buying stocks?

John_Maxwell
11 Mar 2020 16:27 UTC
33 points
19 comments · 4 min read · LW link

The problem/solution matrix: Calculating the probability of AI safety “on the back of an envelope”

John_Maxwell
20 Oct 2019 8:03 UTC
24 points
4 comments · 2 min read · LW link

Three Stories for How AGI Comes Before FAI

John_Maxwell
17 Sep 2019 23:26 UTC
28 points
8 comments · 6 min read · LW link

Brainstorming additional AI risk reduction ideas

John_Maxwell
14 Jun 2012 7:55 UTC
12 points
37 comments · 1 min read · LW link

AI Alignment 2018-19 Review

rohinmshah
28 Jan 2020 2:19 UTC
143 points
6 comments · 35 min read · LW link

Access to AI: a human right?

dmtea
25 Jul 2020 9:38 UTC
4 points
3 comments · 2 min read · LW link

Agentic Language Model Memes

FactorialCode
1 Aug 2020 18:03 UTC
11 points
1 comment · 2 min read · LW link

Conversation with Paul Christiano

abergal
11 Sep 2019 23:20 UTC
48 points
6 comments · 30 min read · LW link
(aiimpacts.org)

Transcription of Eliezer’s January 2010 video Q&A

curiousepic
14 Nov 2011 17:02 UTC
83 points
9 comments · 56 min read · LW link

Responses to Catastrophic AGI Risk: A Survey

lukeprog
8 Jul 2013 14:33 UTC
11 points
8 comments · 1 min read · LW link

How can I reduce existential risk from AI?

lukeprog
13 Nov 2012 21:56 UTC
48 points
92 comments · 8 min read · LW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

capybaralet
6 Feb 2019 19:09 UTC
25 points
17 comments · 1 min read · LW link

Reframing misaligned AGI’s: well-intentioned non-neurotypical assistants

zhukeepa
1 Apr 2018 1:22 UTC
99 points
14 comments · 2 min read · LW link

When is unaligned AI morally valuable?

paulfchristiano
25 May 2018 1:57 UTC
102 points
52 comments · 10 min read · LW link

Introducing the AI Alignment Forum (FAQ)

29 Oct 2018 21:07 UTC
92 points
8 comments · 6 min read · LW link

Swimming Upstream: A Case Study in Instrumental Rationality

TurnTrout
3 Jun 2018 3:16 UTC
118 points
7 comments · 8 min read · LW link

Current AI Safety Roles for Software Engineers

ozziegooen
9 Nov 2018 20:57 UTC
82 points
9 comments · 4 min read · LW link

[Question] Why is so much discussion happening in private Google Docs?

Wei_Dai
12 Jan 2019 2:19 UTC
87 points
21 comments · 1 min read · LW link

Problems in AI Alignment that philosophers could potentially contribute to

Wei_Dai
17 Aug 2019 17:38 UTC
84 points
14 comments · 2 min read · LW link

Two Neglected Problems in Human-AI Safety

Wei_Dai
16 Dec 2018 22:13 UTC
82 points
23 comments · 2 min read · LW link

Announcement: AI alignment prize round 4 winners

cousin_it
20 Jan 2019 14:46 UTC
80 points
41 comments · 1 min read · LW link

Soon: a weekly AI Safety prerequisites module on LessWrong

toonalfrink
30 Apr 2018 13:23 UTC
84 points
10 comments · 1 min read · LW link

And the AI would have got away with it too, if...

Stuart_Armstrong
22 May 2019 21:35 UTC
77 points
7 comments · 1 min read · LW link

2017 AI Safety Literature Review and Charity Comparison

Larks
24 Dec 2017 18:52 UTC
76 points
5 comments · 23 min read · LW link

Should ethicists be inside or outside a profession?

Eliezer Yudkowsky
12 Dec 2018 1:40 UTC
74 points
6 comments · 9 min read · LW link

A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi
28 Jul 2018 21:27 UTC
69 points
9 comments · 3 min read · LW link
(github.com)

I Vouch For MIRI

Zvi
17 Dec 2017 17:50 UTC
66 points
9 comments · 5 min read · LW link
(thezvi.wordpress.com)

Beware of black boxes in AI alignment research

cousin_it
18 Jan 2018 15:07 UTC
71 points
10 comments · 1 min read · LW link

AI Alignment Prize: Round 2 due March 31, 2018

Zvi
12 Mar 2018 12:10 UTC
71 points
2 comments · 3 min read · LW link
(thezvi.wordpress.com)

Three AI Safety Related Ideas

Wei_Dai
13 Dec 2018 21:32 UTC
64 points
38 comments · 2 min read · LW link

A rant against robots

Lê Nguyên Hoang
14 Jan 2020 22:03 UTC
66 points
7 comments · 5 min read · LW link

Opportunities for individual donors in AI safety

alexflint
31 Mar 2018 18:37 UTC
63 points
3 comments · 11 min read · LW link

But exactly how complex and fragile?

KatjaGrace
3 Nov 2019 18:20 UTC
68 points
22 comments · 3 min read · LW link
(meteuphoric.com)

Course recommendations for Friendliness researchers

Louie
9 Jan 2013 14:33 UTC
67 points
112 comments · 10 min read · LW link

AI Safety Research Camp—Project Proposal

David_Kristoffersson
2 Feb 2018 4:25 UTC
64 points
11 comments · 8 min read · LW link

AI Summer Fellows Program

colm
21 Mar 2018 15:32 UTC
61 points
0 comments · 1 min read · LW link

The genie knows, but doesn’t care

Rob Bensinger
6 Sep 2013 6:42 UTC
58 points
518 comments · 8 min read · LW link

Alignment Newsletter #13: 07/02/18

rohinmshah
2 Jul 2018 16:10 UTC
74 points
12 comments · 8 min read · LW link
(mailchi.mp)

An Increasingly Manipulative Newsfeed

Michaël Trazzi
1 Jul 2019 15:26 UTC
59 points
14 comments · 5 min read · LW link

The simple picture on AI safety

alexflint
27 May 2018 19:43 UTC
59 points
10 comments · 2 min read · LW link

Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial

Paul Crowley
15 Jan 2015 16:33 UTC
56 points
52 comments · 1 min read · LW link

Strategic implications of AIs’ ability to coordinate at low cost, for example by merging

Wei_Dai
25 Apr 2019 5:08 UTC
57 points
42 comments · 2 min read · LW link

Modeling AGI Safety Frameworks with Causal Influence Diagrams

xrchz
21 Jun 2019 12:50 UTC
47 points
6 comments · 1 min read · LW link
(arxiv.org)

Henry Kissinger: AI Could Mean the End of Human History

ESRogs
15 May 2018 20:11 UTC
46 points
12 comments · 1 min read · LW link
(www.theatlantic.com)

Toy model of the AI control problem: animated version

Stuart_Armstrong
10 Oct 2017 11:06 UTC
44 points
8 comments · 1 min read · LW link

A Visualization of Nick Bostrom’s Superintelligence

[deleted]
23 Jul 2014 0:24 UTC
44 points
28 comments · 3 min read · LW link

AI Alignment Research Overview (by Jacob Steinhardt)

Ben Pace
6 Nov 2019 19:24 UTC
44 points
0 comments · 7 min read · LW link
(docs.google.com)

A general model of safety-oriented AI development

Wei_Dai
11 Jun 2018 21:00 UTC
71 points
8 comments · 1 min read · LW link

Are minimal circuits deceptive?

evhub
7 Sep 2019 18:11 UTC
51 points
8 comments · 8 min read · LW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei_Dai
10 Sep 2019 8:29 UTC
47 points
26 comments · 3 min read · LW link

Siren worlds and the perils of over-optimised search

Stuart_Armstrong
7 Apr 2014 11:00 UTC
45 points
415 comments · 7 min read · LW link

Top 9+2 myths about AI risk

Stuart_Armstrong
29 Jun 2015 20:41 UTC
44 points
46 comments · 2 min read · LW link

Rohin Shah on reasons for AI optimism

abergal
31 Oct 2019 12:10 UTC
42 points
58 comments · 1 min read · LW link
(aiimpacts.org)

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong
6 Feb 2020 11:50 UTC
41 points
25 comments · 3 min read · LW link

The Magnitude of His Own Folly

Eliezer Yudkowsky
30 Sep 2008 11:31 UTC
44 points
128 comments · 6 min read · LW link

AI alignment landscape

paulfchristiano
13 Oct 2019 2:10 UTC
43 points
3 comments · 1 min read · LW link
(ai-alignment.com)

Launched: Friendship is Optimal

iceman
15 Nov 2012 4:57 UTC
40 points
31 comments · 1 min read · LW link

Friendship is Optimal: A My Little Pony fanfic about an optimization process

iceman
8 Sep 2012 6:16 UTC
74 points
150 comments · 1 min read · LW link

Do Earths with slower economic growth have a better chance at FAI?

Eliezer Yudkowsky
12 Jun 2013 19:54 UTC
39 points
176 comments · 4 min read · LW link

Idea: Open Access AI Safety Journal

G Gordon Worley III
23 Mar 2018 18:27 UTC
65 points
11 comments · 1 min read · LW link

G.K. Chesterton On AI Risk

Scott Alexander
1 Apr 2017 19:00 UTC
9 points
0 comments · 7 min read · LW link

The Hidden Complexity of Wishes

Eliezer Yudkowsky
24 Nov 2007 0:12 UTC
87 points
135 comments · 7 min read · LW link

The Friendly AI Game

bentarm
15 Mar 2011 16:45 UTC
38 points
178 comments · 1 min read · LW link

Q&A with Jürgen Schmidhuber on risks from AI

XiXiDu
15 Jun 2011 15:51 UTC
37 points
45 comments · 4 min read · LW link

[Question] What should an Einstein-like figure in Machine Learning do?

Razied
5 Aug 2020 23:52 UTC
3 points
3 comments · 1 min read · LW link

Takeaways from safety by default interviews

3 Apr 2020 17:20 UTC
24 points
2 comments · 13 min read · LW link
(aiimpacts.org)

Field-Building and Deep Models

Ben Pace
13 Jan 2018 21:16 UTC
55 points
12 comments · 4 min read · LW link

Critique my Model: The EV of AGI to Selfish Individuals

ozziegooen
8 Apr 2018 20:04 UTC
51 points
9 comments · 4 min read · LW link

‘Dumb’ AI observes and manipulates controllers

Stuart_Armstrong
13 Jan 2015 13:35 UTC
33 points
19 comments · 2 min read · LW link

The Fusion Power Generator Scenario

johnswentworth
8 Aug 2020 18:31 UTC
84 points
20 comments · 3 min read · LW link

2019 AI Alignment Literature Review and Charity Comparison

Larks
19 Dec 2019 3:00 UTC
129 points
18 comments · 62 min read · LW link

Book review: Architects of Intelligence by Martin Ford (2018)

ofer
11 Aug 2020 17:30 UTC
13 points
0 comments · 2 min read · LW link