There’s No Fire Alarm for Artificial General Intelligence

Eliezer Yudkowsky
13 Oct 2017 21:38 UTC
281 points
67 comments · 25 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub
29 May 2020 20:38 UTC
149 points
29 comments · 38 min read · LW link

What failure looks like

paulfchristiano
17 Mar 2019 20:18 UTC
222 points
43 comments · 8 min read · LW link

Challenges to Christiano’s capability amplification proposal

Eliezer Yudkowsky
19 May 2018 18:18 UTC
176 points
42 comments · 23 min read · LW link · 3 nominations · 1 review

Embedded Agents

29 Oct 2018 19:53 UTC
195 points
41 comments · 1 min read · LW link · 6 nominations · 2 reviews

The Rocket Alignment Problem

Eliezer Yudkowsky
4 Oct 2018 0:38 UTC
167 points
41 comments · 15 min read · LW link · 6 nominations · 2 reviews

Superintelligence FAQ

Scott Alexander
20 Sep 2016 19:00 UTC
28 points
5 comments · 27 min read · LW link

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
128 points
32 comments · 12 min read · LW link

AI Alignment 2018-19 Review

rohinmshah
28 Jan 2020 2:19 UTC
140 points
6 comments · 35 min read · LW link

That Alien Message

Eliezer Yudkowsky
22 May 2008 5:55 UTC
172 points
171 comments · 10 min read · LW link

Embedded Agency (full-text version)

15 Nov 2018 19:49 UTC
96 points
7 comments · 43 min read · LW link

Goodhart Taxonomy

Scott Garrabrant
30 Dec 2017 16:38 UTC
206 points
32 comments · 10 min read · LW link

Robustness to Scale

Scott Garrabrant
21 Feb 2018 22:55 UTC
170 points
21 comments · 2 min read · LW link · 3 nominations · 1 review

Chris Olah’s views on AGI safety

evhub
1 Nov 2019 20:13 UTC
143 points
34 comments · 12 min read · LW link

[AN #96]: Buck and I discuss/argue about AI Alignment

rohinmshah
22 Apr 2020 17:20 UTC
17 points
4 comments · 10 min read · LW link
(mailchi.mp)

Coherence arguments do not imply goal-directed behavior

rohinmshah
3 Dec 2018 3:26 UTC
78 points
62 comments · 7 min read · LW link · 2 nominations · 3 reviews

AlphaGo Zero and the Foom Debate

Eliezer Yudkowsky
21 Oct 2017 2:18 UTC
290 points
15 comments · 3 min read · LW link

Tradeoff between desirable properties for baseline choices in impact measures

Vika
4 Jul 2020 11:56 UTC
31 points
3 comments · 5 min read · LW link

Discontinuous progress in history: an update

KatjaGrace
14 Apr 2020 0:00 UTC
161 points
20 comments · 31 min read · LW link
(aiimpacts.org)

Replication Dynamics Bridge to RL in Thermodynamic Limit

Zachary Robertson
18 May 2020 1:02 UTC
6 points
1 comment · 2 min read · LW link

The ground of optimization

alexflint
20 Jun 2020 0:38 UTC
108 points
59 comments · 27 min read · LW link

Modelling Continuous Progress

SDM
23 Jun 2020 18:06 UTC
27 points
3 comments · 7 min read · LW link

An Untrollable Mathematician Illustrated

abramdemski
20 Mar 2018 0:00 UTC
271 points
38 comments · 1 min read · LW link · 2 nominations · 1 review

Conditions for Mesa-Optimization

1 Jun 2019 20:52 UTC
59 points
41 comments · 12 min read · LW link

Thoughts on Human Models

21 Feb 2019 9:10 UTC
125 points
22 comments · 10 min read · LW link

Inner alignment in the brain

steve2152
22 Apr 2020 13:14 UTC
63 points
6 comments · 15 min read · LW link

Problem relaxation as a tactic

TurnTrout
22 Apr 2020 23:44 UTC
102 points
8 comments · 7 min read · LW link

[Question] How should potential AI alignment researchers gauge whether the field is right for them?

TurnTrout
6 May 2020 12:24 UTC
20 points
4 comments · 1 min read · LW link

Specification gaming: the flip side of AI ingenuity

6 May 2020 23:51 UTC
46 points
8 comments · 6 min read · LW link

Lessons from Isaac: Pitfalls of Reason

adamShimi
8 May 2020 20:44 UTC
10 points
0 comments · 8 min read · LW link

Corrigibility as outside view

TurnTrout
8 May 2020 21:56 UTC
31 points
11 comments · 4 min read · LW link

[Question] How to choose a PhD with AI Safety in mind

Ariel Kwiatkowski
15 May 2020 22:19 UTC
9 points
1 comment · 1 min read · LW link

Reward functions and updating assumptions can hide a multitude of sins

Stuart_Armstrong
18 May 2020 15:18 UTC
16 points
2 comments · 9 min read · LW link

Possible takeaways from the coronavirus pandemic for slow AI takeoff

Vika
31 May 2020 17:51 UTC
105 points
32 comments · 3 min read · LW link

Focus: you are allowed to be bad at accomplishing your goals

adamShimi
3 Jun 2020 21:04 UTC
20 points
14 comments · 3 min read · LW link

Reply to Paul Christiano’s “Inaccessible Information”

alexflint
5 Jun 2020 9:10 UTC
75 points
15 comments · 6 min read · LW link

My take on CHAI’s research agenda in under 1500 words

alexflint
17 Jun 2020 12:24 UTC
61 points
8 comments · 5 min read · LW link

[Question] Question on GPT-3 Excel Demo

Zhitao Hou
22 Jun 2020 20:31 UTC
1 point
0 comments · 1 min read · LW link

[Question] What are the relative speeds of AI capabilities and AI safety?

NunoSempere
24 Apr 2020 18:21 UTC
8 points
2 comments · 1 min read · LW link

“Don’t even think about hell”

emmab
2 May 2020 8:06 UTC
6 points
2 comments · 1 min read · LW link

Pointing to a Flower

johnswentworth
18 May 2020 18:54 UTC
51 points
18 comments · 9 min read · LW link

Learning and manipulating learning

Stuart_Armstrong
19 May 2020 13:02 UTC
38 points
4 comments · 10 min read · LW link

[Question] Why aren’t we testing general intelligence distribution?

Bob Jacobs
26 May 2020 16:07 UTC
24 points
7 comments · 1 min read · LW link

[Question] Pessimism over AGI/ASI causing psychological distress?

Anirandis
2 Jun 2020 18:28 UTC
7 points
4 comments · 1 min read · LW link

OpenAI announces GPT-3

gwern
29 May 2020 1:49 UTC
64 points
23 comments · 1 min read · LW link
(arxiv.org)

GPT-3: a disappointing paper

nostalgebraist
29 May 2020 19:06 UTC
40 points
32 comments · 8 min read · LW link

Results of $1,000 Oracle contest!

Stuart_Armstrong
17 Jun 2020 17:44 UTC
53 points
2 comments · 1 min read · LW link

[Question] Likelihood of hyperexistential catastrophe from a bug?

Anirandis
18 Jun 2020 16:23 UTC
10 points
24 comments · 1 min read · LW link

AI Benefits Post 1: Introducing “AI Benefits”

Cullen_OKeefe
22 Jun 2020 16:59 UTC
11 points
3 comments · 3 min read · LW link

Goals and short descriptions

Michele Campolo
2 Jul 2020 17:41 UTC
9 points
8 comments · 5 min read · LW link

Research ideas to study humans with AI Safety in mind

Riccardo Volpato
3 Jul 2020 16:01 UTC
22 points
2 comments · 5 min read · LW link

Realism about rationality

ricraz
16 Sep 2018 10:46 UTC
174 points
136 comments · 4 min read · LW link · 3 nominations · 3 reviews
(thinkingcomplete.blogspot.com)

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace
4 Oct 2019 4:08 UTC
176 points
49 comments · 15 min read · LW link

The Parable of Predict-O-Matic

abramdemski
15 Oct 2019 0:49 UTC
194 points
19 comments · 14 min read · LW link

2018 AI Alignment Literature Review and Charity Comparison

Larks
18 Dec 2018 4:46 UTC
195 points
26 comments · 62 min read · LW link · 2 nominations · 1 review

[AN #94]: AI alignment as translation between humans and machines

rohinmshah
8 Apr 2020 17:10 UTC
11 points
0 comments · 7 min read · LW link
(mailchi.mp)

An Orthodox Case Against Utility Functions

abramdemski
7 Apr 2020 19:18 UTC
114 points
49 comments · 8 min read · LW link

“How conservative” should the partial maximisers be?

Stuart_Armstrong
13 Apr 2020 15:50 UTC
20 points
8 comments · 2 min read · LW link

[AN #95]: A framework for thinking about how to make AI go well

rohinmshah
15 Apr 2020 17:10 UTC
20 points
2 comments · 10 min read · LW link
(mailchi.mp)

AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah

Palus Astra
16 Apr 2020 0:50 UTC
46 points
27 comments · 89 min read · LW link

Open question: are minimal circuits daemon-free?

paulfchristiano
5 May 2018 22:40 UTC
122 points
69 comments · 2 min read · LW link · 2 nominations · 1 review

Disentangling arguments for the importance of AI safety

ricraz
21 Jan 2019 12:41 UTC
124 points
23 comments · 8 min read · LW link

[AI Alignment Forum] Database Maintenance Today

habryka
16 Apr 2020 19:11 UTC
9 points
0 comments · 1 min read · LW link

Integrating Hidden Variables Improves Approximation

johnswentworth
16 Apr 2020 21:43 UTC
15 points
2 comments · 1 min read · LW link

AI Services as a Research Paradigm

VojtaKovarik
20 Apr 2020 13:00 UTC
27 points
12 comments · 4 min read · LW link
(docs.google.com)

Databases of human behaviour and preferences?

Stuart_Armstrong
21 Apr 2020 18:06 UTC
10 points
9 comments · 1 min read · LW link

Critch on career advice for junior AI-x-risk-concerned researchers

Rob Bensinger
12 May 2018 2:13 UTC
208 points
25 comments · 4 min read · LW link

Reframing Impact

TurnTrout
20 Sep 2019 19:03 UTC
90 points
11 comments · 3 min read · LW link

Description vs simulated prediction

Richard Korzekwa
22 Apr 2020 16:40 UTC
27 points
0 comments · 5 min read · LW link
(aiimpacts.org)

DeepMind team on specification gaming

JoshuaFox
23 Apr 2020 8:01 UTC
29 points
2 comments · 1 min read · LW link
(deepmind.com)

[Question] Does Agent-like Behavior Imply Agent-like Architecture?

Scott Garrabrant
23 Aug 2019 2:01 UTC
45 points
7 comments · 1 min read · LW link

Risks from Learned Optimization: Conclusion and Related Work

7 Jun 2019 19:53 UTC
65 points
4 comments · 6 min read · LW link

Deceptive Alignment

5 Jun 2019 20:16 UTC
64 points
9 comments · 17 min read · LW link

The Inner Alignment Problem

4 Jun 2019 1:20 UTC
73 points
17 comments · 13 min read · LW link

How the MtG Color Wheel Explains AI Safety

Scott Garrabrant
15 Feb 2019 23:42 UTC
66 points
4 comments · 6 min read · LW link

[Question] How does Gradient Descent Interact with Goodhart?

Scott Garrabrant
2 Feb 2019 0:14 UTC
71 points
19 comments · 4 min read · LW link

Formal Open Problem in Decision Theory

Scott Garrabrant
29 Nov 2018 3:25 UTC
30 points
11 comments · 4 min read · LW link

The Ubiquitous Converse Lawvere Problem

Scott Garrabrant
29 Nov 2018 3:16 UTC
19 points
0 comments · 2 min read · LW link

Embedded Curiosities

8 Nov 2018 14:19 UTC
86 points
1 comment · 2 min read · LW link

Subsystem Alignment

6 Nov 2018 16:16 UTC
121 points
12 comments · 1 min read · LW link

Robust Delegation

4 Nov 2018 16:38 UTC
120 points
10 comments · 1 min read · LW link

Embedded World-Models

2 Nov 2018 16:07 UTC
91 points
15 comments · 1 min read · LW link

Decision Theory

31 Oct 2018 18:41 UTC
101 points
37 comments · 1 min read · LW link

(A → B) → A

Scott Garrabrant
11 Sep 2018 22:38 UTC
46 points
10 comments · 2 min read · LW link

History of the Development of Logical Induction

Scott Garrabrant
29 Aug 2018 3:15 UTC
94 points
4 comments · 5 min read · LW link

Optimization Amplifies

Scott Garrabrant
27 Jun 2018 1:51 UTC
102 points
12 comments · 4 min read · LW link · 2 nominations

What makes counterfactuals comparable?

Chris_Leong
24 Apr 2020 22:47 UTC
11 points
6 comments · 3 min read · LW link

New Paper Expanding on the Goodhart Taxonomy

Scott Garrabrant
14 Mar 2018 9:01 UTC
50 points
4 comments · 1 min read · LW link
(arxiv.org)

Sources of intuitions and data on AGI

Scott Garrabrant
31 Jan 2018 23:30 UTC
154 points
26 comments · 3 min read · LW link

Corrigibility

paulfchristiano
27 Nov 2018 21:50 UTC
40 points
3 comments · 6 min read · LW link

AI prediction case study 5: Omohundro’s AI drives

Stuart_Armstrong
15 Mar 2013 9:09 UTC
5 points
5 comments · 8 min read · LW link

Toy model: convergent instrumental goals

Stuart_Armstrong
25 Feb 2016 14:03 UTC
8 points
2 comments · 4 min read · LW link

AI-created pseudo-deontology

Stuart_Armstrong
12 Feb 2015 21:11 UTC
6 points
35 comments · 1 min read · LW link

Ethical Injunctions

Eliezer Yudkowsky
20 Oct 2008 23:00 UTC
46 points
76 comments · 9 min read · LW link

Seeking Power is Instrumentally Convergent in MDPs

5 Dec 2019 2:33 UTC
109 points
23 comments · 11 min read · LW link
(arxiv.org)

Motivating Abstraction-First Decision Theory

johnswentworth
29 Apr 2020 17:47 UTC
38 points
16 comments · 5 min read · LW link

[AN #97]: Are there historical examples of large, robust discontinuities?

rohinmshah
29 Apr 2020 17:30 UTC
15 points
0 comments · 10 min read · LW link
(mailchi.mp)

My Updating Thoughts on AI policy

Ben Pace
1 Mar 2020 7:06 UTC
22 points
1 comment · 9 min read · LW link

Useful Does Not Mean Secure

Ben Pace
30 Nov 2019 2:05 UTC
49 points
12 comments · 11 min read · LW link

[Question] What is the alternative to intent alignment called?

ricraz
30 Apr 2020 2:16 UTC
10 points
5 comments · 1 min read · LW link

Optimising Society to Constrain Risk of War from an Artificial Superintelligence

JohnCDraper
30 Apr 2020 10:47 UTC
3 points
0 comments · 51 min read · LW link

[Question] Jukebox: how to update from AI imitating humans?

Michaël Trazzi
30 Apr 2020 20:50 UTC
9 points
0 comments · 1 min read · LW link

Stanford Encyclopedia of Philosophy on AI ethics and superintelligence

Kaj_Sotala
2 May 2020 7:35 UTC
42 points
19 comments · 7 min read · LW link
(plato.stanford.edu)

[Question] How does iterated amplification exceed human abilities?

riceissa
2 May 2020 23:44 UTC
21 points
9 comments · 2 min read · LW link

How uniform is the neocortex?

zhukeepa
4 May 2020 2:16 UTC
69 points
22 comments · 11 min read · LW link

Scott Garrabrant’s problem on recovering Brouwer as a corollary of Lawvere

Rupert
4 May 2020 10:01 UTC
25 points
2 comments · 2 min read · LW link

“AI and Efficiency”, OA (44✕ improvement in CNNs since 2012)

gwern
5 May 2020 16:32 UTC
47 points
0 comments · 1 min read · LW link
(openai.com)

Competitive safety via gradated curricula

ricraz
5 May 2020 18:11 UTC
34 points
5 comments · 5 min read · LW link

Modeling naturalized decision problems in linear logic

jessicata
6 May 2020 0:15 UTC
15 points
2 comments · 6 min read · LW link
(unstableontology.com)

[AN #98]: Understanding neural net training by seeing which gradients were helpful

rohinmshah
6 May 2020 17:10 UTC
20 points
3 comments · 9 min read · LW link
(mailchi.mp)

[Question] AI Boxing for Hardware-bound agents (aka the China alignment problem)

Logan Zoellner
8 May 2020 15:50 UTC
11 points
27 comments · 10 min read · LW link

[Question] Is AI safety research less parallelizable than AI research?

Mati_Roy
10 May 2020 20:43 UTC
9 points
5 comments · 1 min read · LW link

Thoughts on implementing corrigible robust alignment

steve2152
26 Nov 2019 14:06 UTC
26 points
2 comments · 6 min read · LW link

Wireheading is in the eye of the beholder

Stuart_Armstrong
30 Jan 2019 18:23 UTC
26 points
10 comments · 1 min read · LW link

Wireheading as a potential problem with the new impact measure

Stuart_Armstrong
25 Sep 2018 14:15 UTC
25 points
20 comments · 4 min read · LW link

Wireheading and discontinuity

Michele Campolo
18 Feb 2020 10:49 UTC
22 points
4 comments · 3 min read · LW link

[AN #99]: Doubling times for the efficiency of AI algorithms

rohinmshah
13 May 2020 17:20 UTC
30 points
0 comments · 10 min read · LW link
(mailchi.mp)

How should AIs update a prior over human preferences?

Stuart_Armstrong
15 May 2020 13:14 UTC
17 points
9 comments · 2 min read · LW link

Could We Give an AI a Solution?

Liam Goddard
15 May 2020 21:38 UTC
3 points
2 comments · 2 min read · LW link

Conjecture Workshop

johnswentworth
15 May 2020 22:41 UTC
34 points
2 comments · 2 min read · LW link

Multi-agent safety

ricraz
16 May 2020 1:59 UTC
21 points
7 comments · 5 min read · LW link

Allowing Exploitability in Game Theory

Liam Goddard
17 May 2020 23:19 UTC
2 points
3 comments · 2 min read · LW link

The Mechanistic and Normative Structure of Agency

G Gordon Worley III
18 May 2020 16:03 UTC
14 points
4 comments · 1 min read · LW link
(philpapers.org)

“Starwink” by Alicorn

Zack_M_Davis
18 May 2020 8:17 UTC
40 points
1 comment · 1 min read · LW link
(alicorn.elcenia.com)

[AN #100]: What might go wrong if you learn a reward function while acting

rohinmshah
20 May 2020 17:30 UTC
33 points
2 comments · 12 min read · LW link
(mailchi.mp)

Probabilities, weights, sums: pretty much the same for reward functions

Stuart_Armstrong
20 May 2020 15:19 UTC
11 points
1 comment · 2 min read · LW link

[Question] Source code size vs learned model size in ML and in humans?

riceissa
20 May 2020 8:47 UTC
11 points
6 comments · 1 min read · LW link

Comparing reward learning/reward tampering formalisms

Stuart_Armstrong
21 May 2020 12:03 UTC
9 points
1 comment · 3 min read · LW link

AGIs as populations

ricraz
22 May 2020 20:36 UTC
20 points
23 comments · 4 min read · LW link

[AN #101]: Why we should rigorously measure and forecast AI progress

rohinmshah
27 May 2020 17:20 UTC
15 points
0 comments · 10 min read · LW link
(mailchi.mp)

AI Safety Discussion Days

Linda Linsefors
27 May 2020 16:54 UTC
11 points
0 comments · 3 min read · LW link

Introduction to Existential Risks from Artificial Intelligence, for an EA audience

JoshuaFox
2 Jun 2020 8:30 UTC
9 points
1 comment · 1 min read · LW link

Building brain-inspired AGI is infinitely easier than understanding the brain

steve2152
2 Jun 2020 14:13 UTC
37 points
6 comments · 7 min read · LW link

Sparsity and interpretability?

1 Jun 2020 13:25 UTC
41 points
3 comments · 7 min read · LW link

GPT-3: A Summary

leogao
2 Jun 2020 18:14 UTC
20 points
0 comments · 1 min read · LW link
(leogao.dev)

Inaccessible information

paulfchristiano
3 Jun 2020 5:10 UTC
77 points
15 comments · 14 min read · LW link
(ai-alignment.com)

[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment

rohinmshah
3 Jun 2020 17:20 UTC
38 points
5 comments · 10 min read · LW link
(mailchi.mp)

Feedback is central to agency

alexflint
1 Jun 2020 12:56 UTC
29 points
0 comments · 3 min read · LW link

Defining AGI

lsusr
4 Jun 2020 10:59 UTC
7 points
1 comment · 2 min read · LW link

Thinking About Super-Human AI: An Examination of Likely Paths and Ultimate Constitution

meanderingmoose
4 Jun 2020 23:22 UTC
−3 points
0 comments · 7 min read · LW link

Emergence and Control: An examination of our ability to govern the behavior of intelligent systems

meanderingmoose
5 Jun 2020 17:10 UTC
1 point
0 comments · 6 min read · LW link

GAN Discriminators Don’t Generalize?

tryactions
8 Jun 2020 20:36 UTC
18 points
7 comments · 2 min read · LW link

More on disambiguating “discontinuity”

alenglander
9 Jun 2020 15:16 UTC
16 points
1 comment · 3 min read · LW link

[AN #103]: ARCHES: an agenda for existential safety, and combining natural language with deep RL

rohinmshah
10 Jun 2020 17:20 UTC
22 points
1 comment · 10 min read · LW link
(mailchi.mp)

Dutch-Booking CDT: Revised Argument

abramdemski
11 Jun 2020 23:34 UTC
45 points
6 comments · 16 min read · LW link

Preparing for “The Talk” with AI projects

Daniel Kokotajlo
13 Jun 2020 23:01 UTC
60 points
16 comments · 3 min read · LW link

[Question] List of public predictions of what GPT-X can or can’t do?

Daniel Kokotajlo
14 Jun 2020 14:25 UTC
20 points
9 comments · 1 min read · LW link

Achieving AI alignment through deliberate uncertainty in multiagent systems

Florian Dietz
15 Jun 2020 12:19 UTC
3 points
10 comments · 7 min read · LW link

Superexponential Historic Growth, by David Roodman

Ben Pace
15 Jun 2020 21:49 UTC
43 points
6 comments · 5 min read · LW link
(www.openphilanthropy.org)

[Question] What are the high-level approaches to AI alignment?

G Gordon Worley III
16 Jun 2020 17:10 UTC
13 points
13 comments · 1 min read · LW link

Relating HCH and Logical Induction

abramdemski
16 Jun 2020 22:08 UTC
47 points
4 comments · 5 min read · LW link

Image GPT

Daniel Kokotajlo
18 Jun 2020 11:41 UTC
30 points
27 comments · 1 min read · LW link
(openai.com)

[AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID

rohinmshah
18 Jun 2020 17:10 UTC
19 points
5 comments · 8 min read · LW link
(mailchi.mp)

[Question] If AI is based on GPT, how to ensure its safety?

avturchin
18 Jun 2020 20:33 UTC
20 points
11 comments · 1 min read · LW link

What’s Your Cognitive Algorithm?

Raemon
18 Jun 2020 22:16 UTC
68 points
23 comments · 13 min read · LW link

Relevant pre-AGI possibilities

Daniel Kokotajlo
20 Jun 2020 10:52 UTC
16 points
1 comment · 19 min read · LW link
(aiimpacts.org)

Plausible cases for HRAD work, and locating the crux in the “realism about rationality” debate

riceissa
22 Jun 2020 1:10 UTC
68 points
12 comments · 10 min read · LW link

The Indexing Problem

johnswentworth
22 Jun 2020 19:11 UTC
38 points
2 comments · 4 min read · LW link

[Question] Requesting feedback/advice: what Type Theory to study for AI safety?

rvnnt
23 Jun 2020 17:03 UTC
7 points
4 comments · 3 min read · LW link

Locality of goals

adamShimi
22 Jun 2020 21:56 UTC
15 points
7 comments · 6 min read · LW link

[Question] What is “Instrumental Corrigibility”?

joebernstein
23 Jun 2020 20:24 UTC
3 points
1 comment · 1 min read · LW link

Models, myths, dreams, and Cheshire cat grins

Stuart_Armstrong
24 Jun 2020 10:50 UTC
21 points
7 comments · 2 min read · LW link

[AN #105]: The economic trajectory of humanity, and what we might mean by optimization

rohinmshah
24 Jun 2020 17:30 UTC
24 points
3 comments · 11 min read · LW link
(mailchi.mp)

There’s an Awesome AI Ethics List and it’s a little thin

AABoyles
25 Jun 2020 13:43 UTC
13 points
1 comment · 1 min read · LW link
(github.com)

GPT-3 Fiction Samples

gwern
25 Jun 2020 16:12 UTC
61 points
15 comments · 1 min read · LW link
(www.gwern.net)

Walkthrough: The Transformer Architecture [Part 1/2]

Matthew Barnett
30 Jul 2019 13:54 UTC
35 points
0 comments · 6 min read · LW link

Robustness as a Path to AI Alignment

abramdemski
10 Oct 2017 8:14 UTC
66 points
9 comments · 9 min read · LW link

Radical Probabilism [Transcript]

26 Jun 2020 22:14 UTC
45 points
11 comments · 6 min read · LW link

AI safety via market making

evhub
26 Jun 2020 23:07 UTC
47 points
4 comments · 11 min read · LW link

[Question] Have general decomposers been formalized?

Quinn
27 Jun 2020 18:09 UTC
6 points
3 comments · 1 min read · LW link

Gary Marcus vs Cortical Uniformity

steve2152
28 Jun 2020 18:18 UTC
18 points
0 comments · 8 min read · LW link

Web AI discussion Groups

Donald Hobson
30 Jun 2020 11:22 UTC
10 points
0 comments · 2 min read · LW link

Comparing AI Alignment Approaches to Minimize False Positive Risk

G Gordon Worley III
30 Jun 2020 19:34 UTC
6 points
0 comments · 9 min read · LW link

AvE: Assistance via Empowerment

FactorialCode
30 Jun 2020 22:07 UTC
12 points
1 comment · 1 min read · LW link
(arxiv.org)

Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI

Palus Astra
1 Jul 2020 17:30 UTC
35 points
3 comments · 67 min read · LW link

[AN #106]: Evaluating generalization ability of learned reward models

rohinmshah
1 Jul 2020 17:20 UTC
14 points
2 comments · 11 min read · LW link
(mailchi.mp)

The “AI Debate” Debate

michaelcohen
2 Jul 2020 10:16 UTC
19 points
11 comments · 3 min read · LW link

Noise on the Channel

abramdemski
2 Jul 2020 1:58 UTC
29 points
8 comments · 10 min read · LW link

Idea: Imitation/Value Learning AIXI

Zachary Robertson
3 Jul 2020 17:10 UTC
3 points
6 comments · 1 min read · LW link

Splitting Debate up into Two Subsystems

Nandi
3 Jul 2020 20:11 UTC
11 points
5 comments · 4 min read · LW link

AI Unsafety via Non-Zero-Sum Debate

VojtaKovarik
3 Jul 2020 22:03 UTC
10 points
7 comments · 5 min read · LW link

Classifying games like the Prisoner’s Dilemma

philh
4 Jul 2020 17:10 UTC
51 points
13 comments · 6 min read · LW link
(reasonableapproximation.net)

AI-Feynman as a benchmark for what we should be aiming for

Faustus2
4 Jul 2020 9:24 UTC
8 points
1 comment · 2 min read · LW link

Learning the prior

paulfchristiano
5 Jul 2020 21:00 UTC
51 points
13 comments · 8 min read · LW link
(ai-alignment.com)

Better priors as a safety problem

paulfchristiano
5 Jul 2020 21:20 UTC
38 points
2 comments · 5 min read · LW link
(ai-alignment.com)

[Question] How far is AGI?

Roko Jelavić
5 Jul 2020 17:58 UTC
6 points
5 comments · 1 min read · LW link

Classifying specification problems as variants of Goodhart’s Law

19 Aug 2019 20:40 UTC
71 points
2 comments · 5 min read · LW link

New safety research agenda: scalable agent alignment via reward modeling

Vika
20 Nov 2018 17:29 UTC
35 points
13 comments · 1 min read · LW link
(medium.com)

Designing agent incentives to avoid side effects

11 Mar 2019 20:55 UTC
31 points
0 comments · 2 min read · LW link
(medium.com)

Discussion on the machine learning approach to AI safety

Vika
1 Nov 2018 20:54 UTC
28 points
3 comments · 4 min read · LW link

Specification gaming examples in AI

Vika
3 Apr 2018 12:30 UTC
82 points
9 comments · 1 min read · LW link · 2 nominations · 2 reviews

[Question] Has anyone written up a consideration of Downs’s “Paradox of Voting” from the perspective of MIRI-ish decision theories (UDT, FDT, or even just EDT)?

Jameson Quinn
6 Jul 2020 18:26 UTC
6 points
8 comments · 1 min read · LW link

The Human’s Hidden Utility Function (Maybe)

lukeprog
23 Jan 2012 19:39 UTC
47 points
88 comments · 3 min read · LW link

Using vector fields to visualise preferences and make them consistent

28 Jan 2020 19:44 UTC
41 points
32 comments · 11 min read · LW link

[Article review] Artificial Intelligence, Values, and Alignment

MichaelA
9 Mar 2020 12:42 UTC
13 points
5 comments · 10 min read · LW link

Approving reinforces low-effort behaviors

Scott Alexander
17 Jul 2011 20:43 UTC
104 points
25 comments · 4 min read · LW link

Clarifying some key hypotheses in AI alignment

15 Aug 2019 21:29 UTC
72 points
4 comments · 9 min read · LW link

Failures in technology forecasting? A reply to Ord and Yudkowsky

MichaelA
8 May 2020 12:41 UTC
46 points
19 comments · 11 min read · LW link

What can the principal-agent literature tell us about AI risk?

Alexis Carlier
8 Feb 2020 21:28 UTC
98 points
31 comments · 16 min read · LW link

[Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?

MichaelA
16 Feb 2020 19:56 UTC
24 points
4 comments · 3 min read · LW link

How can Interpretability help Alignment?

23 May 2020 16:16 UTC
31 points
3 comments · 9 min read · LW link

A Problem With Patternism

Bob Jacobs
19 May 2020 20:16 UTC
5 points
52 comments · 1 min read · LW link

Goal-directedness is behavioral, not structural

adamShimi
8 Jun 2020 23:05 UTC
7 points
12 comments · 3 min read · LW link

Learning Deep Learning: Joining data science research as a mathematician

magfrump
19 Oct 2017 19:14 UTC
29 points
4 comments · 3 min read · LW link

Will AI undergo discontinuous progress?

SDM
21 Feb 2020 22:16 UTC
22 points
20 comments · 20 min read · LW link

The Value Definition Problem

SDM
18 Nov 2019 19:56 UTC
14 points
6 comments · 11 min read · LW link

Life at Three Tails of the Bell Curve

lsusr
27 Jun 2020 8:49 UTC
49 points
7 comments · 4 min read · LW link

How do takeoff speeds affect the probability of bad outcomes from AGI?

KR
29 Jun 2020 22:06 UTC
15 points
1 comment · 8 min read · LW link

AI Benefits Post 2: How AI Benefits Differs from AI Alignment & AI for Good

Cullen_OKeefe
29 Jun 2020 17:00 UTC
8 points
0 comments · 2 min read · LW link

[Question] Most probable AGI scenarios?

Anirandis
6 Jul 2020 17:20 UTC
3 points
0 comments · 1 min read · LW link

AI Benefits Post 3: Direct and Indirect Approaches to AI Benefits

Cullen_OKeefe
6 Jul 2020 18:48 UTC
2 points
0 comments · 2 min read · LW link