
Scott Garrabrant

Karma: 5,417

[Question] Does Agent-like Behavior Imply Agent-like Architecture?

Scott Garrabrant
23 Aug 2019 2:01 UTC
45 points
7 comments · 1 min read · LW link

Intentional Bucket Errors

Scott Garrabrant
22 Aug 2019 20:02 UTC
61 points
6 comments · 3 min read · LW link

Risks from Learned Optimization: Conclusion and Related Work

evhub
7 Jun 2019 19:53 UTC
63 points
2 comments · 6 min read · LW link

Deceptive Alignment

evhub
5 Jun 2019 20:16 UTC
61 points
7 comments · 17 min read · LW link

The Inner Alignment Problem

evhub
4 Jun 2019 1:20 UTC
66 points
16 comments · 13 min read · LW link

Conditions for Mesa-Optimization

evhub
1 Jun 2019 20:52 UTC
57 points
37 comments · 12 min read · LW link

Risks from Learned Optimization: Introduction

evhub
31 May 2019 23:44 UTC
117 points
31 comments · 12 min read · LW link

Yes Requires the Possibility of No

Scott Garrabrant
17 May 2019 22:39 UTC
127 points
39 comments · 2 min read · LW link

Thoughts on Human Models

xrchz
21 Feb 2019 9:10 UTC
124 points
22 comments · 10 min read · LW link