rohinmshah

Karma: 4,467

[AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations

rohinmshah
4 Dec 2019 18:10 UTC
13 points
6 comments · 9 min read · LW link
(mailchi.mp)

[AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee

rohinmshah
27 Nov 2019 18:10 UTC
38 points
1 comment · 10 min read · LW link
(mailchi.mp)

[AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts

rohinmshah
20 Nov 2019 18:20 UTC
19 points
0 comments · 7 min read · LW link
(mailchi.mp)

[AN #73]: Detecting catastrophic failures by learning how agents tend to break

rohinmshah
13 Nov 2019 18:10 UTC
11 points
0 comments · 7 min read · LW link
(mailchi.mp)

[AN #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety

rohinmshah
6 Nov 2019 18:10 UTC
28 points
4 comments · 10 min read · LW link
(mailchi.mp)

[AN #71]: Avoiding reward tampering through current-RF optimization

rohinmshah
30 Oct 2019 17:10 UTC
12 points
0 comments · 7 min read · LW link
(mailchi.mp)

[AN #70]: Agents that help humans who are still learning about their own preferences

rohinmshah
23 Oct 2019 17:10 UTC
18 points
0 comments · 9 min read · LW link
(mailchi.mp)

Human-AI Collaboration

rohinmshah
22 Oct 2019 6:32 UTC
39 points
7 comments · 2 min read · LW link
(bair.berkeley.edu)

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

rohinmshah
19 Oct 2019 0:30 UTC
64 points
12 comments · 15 min read · LW link
(mailchi.mp)

[AN #68]: The attainable utility theory of impact

rohinmshah
14 Oct 2019 17:00 UTC
19 points
0 comments · 8 min read · LW link
(mailchi.mp)

[AN #67]: Creating environments in which to study inner alignment failures

rohinmshah
7 Oct 2019 17:10 UTC
17 points
0 comments · 8 min read · LW link
(mailchi.mp)

[AN #66]: Decomposing robustness into capability robustness and alignment robustness

rohinmshah
30 Sep 2019 18:00 UTC
12 points
1 comment · 7 min read · LW link
(mailchi.mp)

[AN #65]: Learning useful skills by watching humans “play”

rohinmshah
23 Sep 2019 17:30 UTC
12 points
0 comments · 9 min read · LW link
(mailchi.mp)

[AN #64]: Using Deep RL and Reward Uncertainty to Incentivize Preference Learning

rohinmshah
16 Sep 2019 17:10 UTC
11 points
8 comments · 7 min read · LW link
(mailchi.mp)

[AN #63] How architecture search, meta learning, and environment design could lead to general intelligence

rohinmshah
10 Sep 2019 19:10 UTC
24 points
12 comments · 8 min read · LW link
(mailchi.mp)

[AN #62] Are adversarial examples caused by real but imperceptible features?

rohinmshah
22 Aug 2019 17:10 UTC
28 points
10 comments · 9 min read · LW link
(mailchi.mp)

Call for contributors to the Alignment Newsletter

rohinmshah
21 Aug 2019 18:21 UTC
39 points
0 comments · 4 min read · LW link

Clarifying some key hypotheses in AI alignment

Ben Cottier
15 Aug 2019 21:29 UTC
68 points
3 comments · 9 min read · LW link

[AN #61] AI policy and governance, from two people in the field

rohinmshah
5 Aug 2019 17:00 UTC
11 points
0 comments · 9 min read · LW link
(mailchi.mp)

[AN #60] A new AI challenge: Minecraft agents that assist human players in creative mode

rohinmshah
22 Jul 2019 17:00 UTC
25 points
6 comments · 9 min read · LW link
(mailchi.mp)