
Lee Sharkey

Karma: 1,329

Apollo Research (London).

My main research interests are mechanistic interpretability and inner alignment.

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning

2 Jul 2024 13:17 UTC
73 points
6 comments, 12 min read

Apollo Research 1-year update

29 May 2024 17:44 UTC
92 points
0 comments, 7 min read

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

17 May 2024 16:25 UTC
57 points
5 comments, 4 min read
(arxiv.org)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition

8 Apr 2024 11:14 UTC
36 points
4 comments, 15 min read

Sparsify: A mechanistic interpretability research agenda

Lee Sharkey, 3 Apr 2024 12:34 UTC
93 points
22 comments, 22 min read

Addressing Feature Suppression in SAEs

16 Feb 2024 18:32 UTC
85 points
3 comments, 10 min read