
Rohin Shah

Karma: 14,230

Research Scientist at DeepMind. Creator of the Alignment Newsletter. http://rohinshah.com/

DeepMind is hiring for the Scalable Alignment and Alignment Teams

13 May 2022 12:17 UTC
150 points
34 comments · 9 min read · LW link

AI Alignment 2018-19 Review

Rohin Shah · 28 Jan 2020 2:19 UTC
126 points
6 comments · 35 min read · LW link

Coherence arguments do not entail goal-directed behavior

Rohin Shah · 3 Dec 2018 3:26 UTC
123 points
69 comments · 7 min read · LW link · 3 reviews

Reframing Superintelligence: Comprehensive AI Services as General Intelligence

Rohin Shah · 8 Jan 2019 7:12 UTC
121 points
77 comments · 5 min read · LW link · 2 reviews
(www.fhi.ox.ac.uk)

The Alignment Problem: Machine Learning and Human Values

Rohin Shah · 6 Oct 2020 17:41 UTC
120 points
7 comments · 6 min read · LW link · 1 review
(www.amazon.com)

Alignment Newsletter One Year Retrospective

Rohin Shah · 10 Apr 2019 6:58 UTC
94 points
31 comments · 21 min read · LW link

Categorizing failures as “outer” or “inner” misalignment is often confused

Rohin Shah · 6 Jan 2023 15:48 UTC
86 points
21 comments · 8 min read · LW link

Shah and Yudkowsky on alignment failures

28 Feb 2022 19:18 UTC
85 points
39 comments · 91 min read · LW link · 1 review

Preface to the sequence on value learning

Rohin Shah · 30 Oct 2018 22:04 UTC
70 points
6 comments · 3 min read · LW link

Alignment Newsletter #13: 07/02/18

Rohin Shah · 2 Jul 2018 16:10 UTC
70 points
12 comments · 8 min read · LW link
(mailchi.mp)

FAQ: Advice for AI Alignment Researchers

Rohin Shah · 26 Apr 2021 18:59 UTC
67 points
2 comments · 1 min read · LW link
(rohinshah.com)

AI safety without goal-directed behavior

Rohin Shah · 7 Jan 2019 7:48 UTC
66 points
15 comments · 4 min read · LW link

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

Rohin Shah · 19 Oct 2019 0:30 UTC
60 points
12 comments · 15 min read · LW link
(mailchi.mp)

Will humans build goal-directed agents?

Rohin Shah · 5 Jan 2019 1:33 UTC
60 points
43 comments · 5 min read · LW link

BASALT: A Benchmark for Learning from Human Feedback

Rohin Shah · 8 Jul 2021 17:40 UTC
56 points
20 comments · 2 min read · LW link
(bair.berkeley.edu)