Lee Sharkey

Karma: 1,899

Goodfire (London). Formerly cofounded Apollo Research.

My main research interests are mechanistic interpretability and inner alignment.

Mech interp is not pre-paradigmatic

Lee Sharkey · 10 Jun 2025 13:39 UTC
186 points
6 comments · 13 min read · LW link

Paper: Open Problems in Mechanistic Interpretability

29 Jan 2025 10:25 UTC
68 points
0 comments · 1 min read · LW link
(arxiv.org)

Attribution-based parameter decomposition

25 Jan 2025 13:12 UTC
108 points
22 comments · 4 min read · LW link
(publications.apolloresearch.ai)

Showing SAE Latents Are Not Atomic Using Meta-SAEs

24 Aug 2024 0:56 UTC
68 points
10 comments · 20 min read · LW link

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

23 Aug 2024 18:52 UTC
42 points
8 comments · 16 min read · LW link

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

18 Jul 2024 14:15 UTC
122 points
18 comments · 18 min read · LW link

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning

2 Jul 2024 13:17 UTC
86 points
7 comments · 12 min read · LW link