
Scott Garrabrant

Karma: 5,078 (LW), 229 (AF)

Risks from Learned Optimization: Conclusion and Related Work

evhub
7 Jun 2019 19:53 UTC
52 points
0 comments · 6 min read · LW link

Deceptive Alignment

evhub
5 Jun 2019 20:16 UTC
55 points
4 comments · 17 min read · LW link

The Inner Alignment Problem

evhub
4 Jun 2019 1:20 UTC
60 points
13 comments · 13 min read · LW link

Conditions for Mesa-Optimization

evhub
1 Jun 2019 20:52 UTC
48 points
27 comments · 12 min read · LW link

Risks from Learned Optimization: Introduction

evhub
31 May 2019 23:44 UTC
101 points
27 comments · 12 min read · LW link

Yes Requires the Possibility of No

Scott Garrabrant
17 May 2019 22:39 UTC
125 points
36 comments · 2 min read · LW link

Thoughts on Human Models

xrchz
21 Feb 2019 9:10 UTC
122 points
21 comments · 10 min read · LW link

Epistemic Tenure

Scott Garrabrant
18 Feb 2019 22:56 UTC
66 points
27 comments · 3 min read · LW link

How the MtG Color Wheel Explains AI Safety

Scott Garrabrant
15 Feb 2019 23:42 UTC
66 points
4 comments · 6 min read · LW link

[Question] How does Gradient Descent Interact with Goodhart?

Scott Garrabrant
2 Feb 2019 0:14 UTC
70 points
19 comments · 4 min read · LW link