Joseph Bloom

Karma: 816

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom · 2 Feb 2024 6:54 UTC
94 points
37 comments · 15 min read · LW link

Decision Transformer Interpretability

6 Feb 2023 7:29 UTC
84 points
13 comments · 24 min read · LW link

Understanding SAE Features with the Logit Lens

11 Mar 2024 0:16 UTC
53 points
0 comments · 14 min read · LW link

A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)

Joseph Bloom · 16 May 2023 22:59 UTC
36 points
2 comments · 16 min read · LW link

Features and Adversaries in MemoryDT

20 Oct 2023 7:32 UTC
31 points
6 comments · 25 min read · LW link