Rohin Shah
Karma: 14,316
Research Scientist at DeepMind. Creator of the Alignment Newsletter.
http://rohinshah.com/
DeepMind is hiring for the Scalable Alignment and Alignment Teams · Rohin Shah and Geoffrey Irving · 13 May 2022 12:17 UTC · 150 points · 34 comments · 9 min read · LW link
AI Alignment 2018-19 Review · Rohin Shah · 28 Jan 2020 2:19 UTC · 126 points · 6 comments · 35 min read · LW link
Coherence arguments do not entail goal-directed behavior · Rohin Shah · 3 Dec 2018 3:26 UTC · 123 points · 69 comments · 7 min read · LW link · 3 reviews
Reframing Superintelligence: Comprehensive AI Services as General Intelligence · Rohin Shah · 8 Jan 2019 7:12 UTC · 121 points · 77 comments · 5 min read · LW link · 2 reviews · (www.fhi.ox.ac.uk)
The Alignment Problem: Machine Learning and Human Values · Rohin Shah · 6 Oct 2020 17:41 UTC · 120 points · 7 comments · 6 min read · LW link · 1 review · (www.amazon.com)
Alignment Newsletter One Year Retrospective · Rohin Shah · 10 Apr 2019 6:58 UTC · 94 points · 31 comments · 21 min read · LW link
Categorizing failures as “outer” or “inner” misalignment is often confused · Rohin Shah · 6 Jan 2023 15:48 UTC · 86 points · 21 comments · 8 min read · LW link
Shah and Yudkowsky on alignment failures · Rohin Shah and Eliezer Yudkowsky · 28 Feb 2022 19:18 UTC · 85 points · 39 comments · 91 min read · LW link · 1 review
Preface to the sequence on value learning · Rohin Shah · 30 Oct 2018 22:04 UTC · 70 points · 6 comments · 3 min read · LW link
Alignment Newsletter #13: 07/02/18 · Rohin Shah · 2 Jul 2018 16:10 UTC · 70 points · 12 comments · 8 min read · LW link · (mailchi.mp)
FAQ: Advice for AI Alignment Researchers · Rohin Shah · 26 Apr 2021 18:59 UTC · 67 points · 2 comments · 1 min read · LW link · (rohinshah.com)
AI safety without goal-directed behavior · Rohin Shah · 7 Jan 2019 7:48 UTC · 66 points · 15 comments · 4 min read · LW link
Will humans build goal-directed agents? · Rohin Shah · 5 Jan 2019 1:33 UTC · 60 points · 43 comments · 5 min read · LW link
[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI · Rohin Shah · 19 Oct 2019 0:30 UTC · 60 points · 12 comments · 15 min read · LW link · (mailchi.mp)
BASALT: A Benchmark for Learning from Human Feedback · Rohin Shah · 8 Jul 2021 17:40 UTC · 56 points · 20 comments · 2 min read · LW link · (bair.berkeley.edu)
[AN #58] Mesa optimization: what it is, and why we should care · Rohin Shah · 24 Jun 2019 16:10 UTC · 55 points · 10 comments · 8 min read · LW link · (mailchi.mp)
Alignment Newsletter Three Year Retrospective · Rohin Shah · 7 Apr 2021 14:39 UTC · 55 points · 0 comments · 5 min read · LW link
What is ambitious value learning? · Rohin Shah · 1 Nov 2018 16:20 UTC · 55 points · 28 comments · 2 min read · LW link
[AN #172] Sorry for the long hiatus! · Rohin Shah · 5 Jul 2022 6:20 UTC · 54 points · 0 comments · 3 min read · LW link · (mailchi.mp)
Intuitions about goal-directed behavior · Rohin Shah · 1 Dec 2018 4:25 UTC · 54 points · 15 comments · 6 min read · LW link