RSS

Bart Bussmann

Karma: 786

Can we in­ter­pret la­tent rea­son­ing us­ing cur­rent mechanis­tic in­ter­pretabil­ity tools?

22 Dec 2025 16:56 UTC
37 points
0 comments9 min readLW link

Cur­rent LLMs seem to rarely de­tect CoT tampering

19 Nov 2025 15:27 UTC
55 points
0 comments20 min readLW link

Learn­ing Multi-Level Fea­tures with Ma­tryoshka SAEs

19 Dec 2024 15:59 UTC
46 points
6 comments11 min readLW link