Geoffrey Irving

Karma: 97

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

20 Jul 2023 10:50 UTC
43 points
3 comments · 2 min read · LW link
(arxiv.org)

DeepMind is hiring for the Scalable Alignment and Alignment Teams

13 May 2022 12:17 UTC
150 points
34 comments · 9 min read · LW link

Learning the smooth prior

29 Apr 2022 21:10 UTC
35 points
0 comments · 12 min read · LW link