Rohin Shah
Karma: 14,316
Research Scientist at DeepMind. Creator of the Alignment Newsletter.
http://rohinshah.com/
DeepMind is hiring for the Scalable Alignment and Alignment Teams · Rohin Shah and Geoffrey Irving · 13 May 2022 12:17 UTC · 150 points · 34 comments · 9 min read · LW link
AI Alignment 2018-19 Review · Rohin Shah · 28 Jan 2020 2:19 UTC · 126 points · 6 comments · 35 min read · LW link
Coherence arguments do not entail goal-directed behavior · Rohin Shah · 3 Dec 2018 3:26 UTC · 123 points · 69 comments · 7 min read · LW link · 3 reviews
Reframing Superintelligence: Comprehensive AI Services as General Intelligence · Rohin Shah · 8 Jan 2019 7:12 UTC · 121 points · 77 comments · 5 min read · LW link · 2 reviews · (www.fhi.ox.ac.uk)
The Alignment Problem: Machine Learning and Human Values · Rohin Shah · 6 Oct 2020 17:41 UTC · 120 points · 7 comments · 6 min read · LW link · 1 review · (www.amazon.com)
Alignment Newsletter One Year Retrospective · Rohin Shah · 10 Apr 2019 6:58 UTC · 94 points · 31 comments · 21 min read · LW link
Categorizing failures as “outer” or “inner” misalignment is often confused · Rohin Shah · 6 Jan 2023 15:48 UTC · 86 points · 21 comments · 8 min read · LW link
Shah and Yudkowsky on alignment failures · Rohin Shah and Eliezer Yudkowsky · 28 Feb 2022 19:18 UTC · 85 points · 39 comments · 91 min read · LW link · 1 review
Preface to the sequence on value learning · Rohin Shah · 30 Oct 2018 22:04 UTC · 70 points · 6 comments · 3 min read · LW link
Alignment Newsletter #13: 07/02/18 · Rohin Shah · 2 Jul 2018 16:10 UTC · 70 points · 12 comments · 8 min read · LW link · (mailchi.mp)
FAQ: Advice for AI Alignment Researchers · Rohin Shah · 26 Apr 2021 18:59 UTC · 67 points · 2 comments · 1 min read · LW link · (rohinshah.com)
AI safety without goal-directed behavior · Rohin Shah · 7 Jan 2019 7:48 UTC · 66 points · 15 comments · 4 min read · LW link
Will humans build goal-directed agents? · Rohin Shah · 5 Jan 2019 1:33 UTC · 60 points · 43 comments · 5 min read · LW link
[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI · Rohin Shah · 19 Oct 2019 0:30 UTC · 60 points · 12 comments · 15 min read · LW link · (mailchi.mp)
BASALT: A Benchmark for Learning from Human Feedback · Rohin Shah · 8 Jul 2021 17:40 UTC · 56 points · 20 comments · 2 min read · LW link · (bair.berkeley.edu)
[AN #58] Mesa optimization: what it is, and why we should care · Rohin Shah · 24 Jun 2019 16:10 UTC · 55 points · 10 comments · 8 min read · LW link · (mailchi.mp)
Alignment Newsletter Three Year Retrospective · Rohin Shah · 7 Apr 2021 14:39 UTC · 55 points · 0 comments · 5 min read · LW link
What is ambitious value learning? · Rohin Shah · 1 Nov 2018 16:20 UTC · 55 points · 28 comments · 2 min read · LW link
[AN #172] Sorry for the long hiatus! · Rohin Shah · 5 Jul 2022 6:20 UTC · 54 points · 0 comments · 3 min read · LW link · (mailchi.mp)
Intuitions about goal-directed behavior · Rohin Shah · 1 Dec 2018 4:25 UTC · 54 points · 15 comments · 6 min read · LW link