RSS

Joseph Bloom

Karma: 816

A Selec­tion of Ran­domly Selected SAE Features

1 Apr 2024 9:09 UTC
106 points
2 comments4 min readLW link

SAE-VIS: An­nounce­ment Post

31 Mar 2024 15:30 UTC
73 points
8 comments1 min readLW link

An­nounc­ing Neu­ron­pe­dia: Plat­form for ac­cel­er­at­ing re­search into Sparse Autoencoders

25 Mar 2024 21:17 UTC
89 points
5 comments7 min readLW link

Un­der­stand­ing SAE Fea­tures with the Logit Lens

11 Mar 2024 0:16 UTC
53 points
0 comments14 min readLW link

Ex­am­in­ing Lan­guage Model Perfor­mance with Re­con­structed Ac­ti­va­tions us­ing Sparse Au­toen­coders

27 Feb 2024 2:43 UTC
41 points
16 comments15 min readLW link

Open Source Sparse Au­toen­coders for all Resi­d­ual Stream Lay­ers of GPT2-Small

Joseph Bloom2 Feb 2024 6:54 UTC
94 points
37 comments15 min readLW link

Lin­ear en­cod­ing of char­ac­ter-level in­for­ma­tion in GPT-J to­ken embeddings

10 Nov 2023 22:19 UTC
34 points
4 comments28 min readLW link

Fea­tures and Ad­ver­saries in MemoryDT

20 Oct 2023 7:32 UTC
31 points
6 comments25 min readLW link

Joseph Bloom on choos­ing AI Align­ment over bio, what many as­piring re­searchers get wrong, and more (in­ter­view)

17 Sep 2023 18:45 UTC
27 points
2 comments8 min readLW link

A Mechanis­tic In­ter­pretabil­ity Anal­y­sis of a GridWorld Agent-Si­mu­la­tor (Part 1 of N)

Joseph Bloom16 May 2023 22:59 UTC
36 points
2 comments16 min readLW link

De­ci­sion Trans­former Interpretability

6 Feb 2023 7:29 UTC
84 points
13 comments24 min readLW link