
Neel Nanda

Karma: 5,313

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

Neel Nanda, 29 Aug 2023 22:07 UTC
36 points
1 comment, 1 min read
(www.youtube.com)

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

29 Aug 2023 1:04 UTC
57 points
1 comment, 1 min read

Mech Interp Puzzle 2: Word2Vec Style Embeddings

Neel Nanda, 28 Jul 2023 0:50 UTC
39 points
4 comments, 2 min read

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

20 Jul 2023 10:50 UTC
43 points
3 comments, 2 min read
(arxiv.org)