Connor Kissane

Karma: 211

SAEs (usually) Transfer Between Base and Chat Models

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

18 Jul 2024 10:29 UTC

40 points

0 comments10 min readLW link

Attention Output SAEs Improve Circuit Analysis

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

21 Jun 2024 12:56 UTC

31 points

0 comments19 min readLW link

Connor Kissane 17 May 2024 8:53 UTC
1 point
1
in reply to: Ali Shehper’s comment on: Sparse Autoencoders Work on Attention Layer Outputs
Thanks for the comment! We always use the pre-ReLU feature activation, which is equal to the post-ReLU activation (given that the feature is activate), and is purely linear function of z. Edited the post for clarity.

Connor Kissane 31 Mar 2024 16:35 UTC
7 points
0
on: SAE-VIS: Announcement Post
Amazing! We found your original library super useful for our Attention SAEs research, so thanks for making this!

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To

robertzk, Connor Kissane, Arthur Conmy and Neel Nanda

6 Mar 2024 5:03 UTC

57 points

0 comments12 min readLW link

Attention SAEs Scale to GPT-2 Small

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

3 Feb 2024 6:50 UTC

76 points

4 comments8 min readLW link

Sparse Autoencoders Work on Attention Layer Outputs

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

16 Jan 2024 0:26 UTC

82 points

9 comments18 min readLW link

Connor Kissane 14 Aug 2023 14:20 UTC
1 point
0
on: Mech Interp Puzzle 1: Suspiciously Similar Embeddings in GPT-Neo
These puzzles are great, thanks for making them!

Connor Kissane 19 Jul 2023 19:57 UTC
1 point
0
on: Causal scrubbing: results on induction heads
Code for this token filtering can be found in the appendix and the exact token list is linked.
Maybe I just missed it, but I’m not seeing this. Is the code still available?