Kola Ayonrinde

Karma: 126

The Strange Science of Interpretability: Recent Papers and a Reading List for the Philosophy of Interpretability

Kola Ayonrinde, 17 Aug 2025 23:38 UTC
9 points
0 comments, 2 min read, LW link
(arxiv.org)

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders

11 Dec 2024 6:30 UTC
82 points
6 comments, 2 min read, LW link
(www.neuronpedia.org)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde, 30 Oct 2024 22:50 UTC
27 points
0 comments, 12 min read, LW link

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

23 Aug 2024 18:52 UTC
42 points
8 comments, 16 min read, LW link