rohinmshah (Karma: 4,467)

[AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations
rohinmshah · 4 Dec 2019 18:10 UTC · 13 points · 6 comments · 9 min read · LW link (mailchi.mp)

[AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee
rohinmshah · 27 Nov 2019 18:10 UTC · 38 points · 1 comment · 10 min read · LW link (mailchi.mp)

[AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts
rohinmshah · 20 Nov 2019 18:20 UTC · 19 points · 0 comments · 7 min read · LW link (mailchi.mp)

[AN #73]: Detecting catastrophic failures by learning how agents tend to break
rohinmshah · 13 Nov 2019 18:10 UTC · 11 points · 0 comments · 7 min read · LW link (mailchi.mp)

[AN #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety
rohinmshah · 6 Nov 2019 18:10 UTC · 28 points · 4 comments · 10 min read · LW link (mailchi.mp)

[AN #71]: Avoiding reward tampering through current-RF optimization
rohinmshah · 30 Oct 2019 17:10 UTC · 12 points · 0 comments · 7 min read · LW link (mailchi.mp)

[AN #70]: Agents that help humans who are still learning about their own preferences
rohinmshah · 23 Oct 2019 17:10 UTC · 18 points · 0 comments · 9 min read · LW link (mailchi.mp)

Human-AI Collaboration
rohinmshah · 22 Oct 2019 6:32 UTC · 39 points · 7 comments · 2 min read · LW link (bair.berkeley.edu)

[AN #69]: Stuart Russell’s new book on why we need to replace the standard model of AI
rohinmshah · 19 Oct 2019 0:30 UTC · 64 points · 12 comments · 15 min read · LW link (mailchi.mp)

[AN #68]: The attainable utility theory of impact
rohinmshah · 14 Oct 2019 17:00 UTC · 19 points · 0 comments · 8 min read · LW link (mailchi.mp)

[AN #67]: Creating environments in which to study inner alignment failures
rohinmshah · 7 Oct 2019 17:10 UTC · 17 points · 0 comments · 8 min read · LW link (mailchi.mp)

[AN #66]: Decomposing robustness into capability robustness and alignment robustness
rohinmshah · 30 Sep 2019 18:00 UTC · 12 points · 1 comment · 7 min read · LW link (mailchi.mp)

[AN #65]: Learning useful skills by watching humans “play”
rohinmshah · 23 Sep 2019 17:30 UTC · 12 points · 0 comments · 9 min read · LW link (mailchi.mp)

[AN #64]: Using Deep RL and Reward Uncertainty to Incentivize Preference Learning
rohinmshah · 16 Sep 2019 17:10 UTC · 11 points · 8 comments · 7 min read · LW link (mailchi.mp)

[AN #63]: How architecture search, meta learning, and environment design could lead to general intelligence
rohinmshah · 10 Sep 2019 19:10 UTC · 24 points · 12 comments · 8 min read · LW link (mailchi.mp)

[AN #62]: Are adversarial examples caused by real but imperceptible features?
rohinmshah · 22 Aug 2019 17:10 UTC · 28 points · 10 comments · 9 min read · LW link (mailchi.mp)

Call for contributors to the Alignment Newsletter
rohinmshah · 21 Aug 2019 18:21 UTC · 39 points · 0 comments · 4 min read · LW link

Clarifying some key hypotheses in AI alignment
Ben Cottier · 15 Aug 2019 21:29 UTC · 68 points · 3 comments · 9 min read · LW link

[AN #61]: AI policy and governance, from two people in the field
rohinmshah · 5 Aug 2019 17:00 UTC · 11 points · 0 comments · 9 min read · LW link (mailchi.mp)

[AN #60]: A new AI challenge: Minecraft agents that assist human players in creative mode
rohinmshah · 22 Jul 2019 17:00 UTC · 25 points · 6 comments · 9 min read · LW link (mailchi.mp)