Rohin Shah

Karma: 15,620

Research Scientist at Google DeepMind. Creator of the Alignment Newsletter. http://rohinshah.com/

Definitions of “objective” should be Probable and Predictive

Rohin Shah · Jan 6, 2023, 3:40 PM
43 points
27 comments · 12 min read · LW link

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Nov 25, 2022, 2:36 PM
39 points
9 comments · 6 min read · LW link
(vkrakovna.wordpress.com)

Threat Model Literature Review

Nov 1, 2022, 11:03 AM
78 points
4 comments · 25 min read · LW link

Clarifying AI X-risk

Nov 1, 2022, 11:03 AM
127 points
24 comments · 4 min read · LW link · 1 review

More examples of goal misgeneralization

Oct 7, 2022, 2:38 PM
56 points
8 comments · 2 min read · LW link
(deepmindsafetyresearch.medium.com)

[AN #173] Recent language model results from DeepMind

Rohin Shah · Jul 21, 2022, 2:30 AM
37 points
9 comments · 8 min read · LW link
(mailchi.mp)

[AN #172] Sorry for the long hiatus!

Rohin Shah · Jul 5, 2022, 6:20 AM
54 points
0 comments · 3 min read · LW link
(mailchi.mp)

DeepMind is hiring for the Scalable Alignment and Alignment Teams

May 13, 2022, 12:17 PM
150 points
34 comments · 9 min read · LW link

Learning the smooth prior

Apr 29, 2022, 9:10 PM
35 points
0 comments · 12 min read · LW link

Shah and Yudkowsky on alignment failures

Feb 28, 2022, 7:18 PM
91 points
47 comments · 91 min read · LW link · 1 review

[AN #171]: Disagreements between alignment “optimists” and “pessimists”

Rohin Shah · Jan 21, 2022, 6:30 PM
32 points
1 comment · 7 min read · LW link
(mailchi.mp)

Conversation on technology forecasting and gradualism

Dec 9, 2021, 9:23 PM
108 points
30 comments · 31 min read · LW link

[AN #170]: Analyzing the argument for risk from power-seeking AI

Rohin Shah · Dec 8, 2021, 6:10 PM
21 points
1 comment · 7 min read · LW link
(mailchi.mp)

[AN #169]: Collaborating with humans without human data

Rohin Shah · Nov 24, 2021, 6:30 PM
33 points
0 comments · 8 min read · LW link
(mailchi.mp)

[AN #168]: Four technical topics for which Open Phil is soliciting grant proposals

Rohin Shah · Oct 28, 2021, 5:20 PM
15 points
0 comments · 9 min read · LW link
(mailchi.mp)

[AN #167]: Concrete ML safety problems and their relevance to x-risk

Rohin Shah · Oct 20, 2021, 5:10 PM
21 points
4 comments · 9 min read · LW link
(mailchi.mp)

[AN #166]: Is it crazy to claim we’re in the most important century?

Rohin Shah · Oct 8, 2021, 5:30 PM
52 points
5 comments · 8 min read · LW link
(mailchi.mp)

[AN #165]: When large models are more likely to lie

Rohin Shah · Sep 22, 2021, 5:30 PM
23 points
0 comments · 8 min read · LW link
(mailchi.mp)

[AN #164]: How well can language models write code?

Rohin Shah · Sep 15, 2021, 5:20 PM
13 points
7 comments · 9 min read · LW link
(mailchi.mp)

[AN #163]: Using finite factored sets for causal and temporal inference

Rohin Shah · Sep 8, 2021, 5:20 PM
41 points
0 comments · 10 min read · LW link
(mailchi.mp)