Bart Bussmann

Karma: 954

[Linkpost] Interpreting Language Model Parameters

5 May 2026 17:37 UTC
162 points
2 comments · 2 min read · LW link
(www.goodfire.ai)

Can we interpret latent reasoning using current mechanistic interpretability tools?

22 Dec 2025 16:56 UTC
44 points
1 comment · 9 min read · LW link

Current LLMs seem to rarely detect CoT tampering

19 Nov 2025 15:27 UTC
56 points
0 comments · 20 min read · LW link

Learning Multi-Level Features with Matryoshka SAEs

19 Dec 2024 15:59 UTC
46 points
6 comments · 11 min read · LW link