Archive
Sequences
About
Search
Log In
Questions
Events
Shortform
Alignment Forum
AF Comments
Home
Featured
All
Tags
Recent
Comments
RSS
Connor Kissane
Karma:
375
All
Posts
Comments
New
Top
Old
SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane
,
robertzk
,
Neel Nanda
and
Arthur Conmy
Nov 7, 2024, 5:22 AM
66
points
4
comments
14
min read
LW
link
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
Oct 27, 2024, 6:46 PM
48
points
4
comments
5
min read
LW
link
Base LLMs refuse too
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
Sep 29, 2024, 4:04 PM
60
points
20
comments
10
min read
LW
link
SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
Jul 18, 2024, 10:29 AM
67
points
0
comments
10
min read
LW
link
Attention Output SAEs Improve Circuit Analysis
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
Jun 21, 2024, 12:56 PM
33
points
3
comments
19
min read
LW
link
We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk
,
Connor Kissane
,
Arthur Conmy
and
Neel Nanda
6 Mar 2024 5:03 UTC
63
points
0
comments
12
min read
LW
link
Attention SAEs Scale to GPT-2 Small
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
3 Feb 2024 6:50 UTC
78
points
4
comments
8
min read
LW
link
Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
16 Jan 2024 0:26 UTC
85
points
9
comments
18
min read
LW
link
Back to top
N
W
F
A
C
D
E
F
G
H
I
Customize appearance
Current theme:
default
A
C
D
E
F
G
H
I
Less Wrong (text)
Less Wrong (link)
Invert colors
Reset to defaults
OK
Cancel
Hi, I’m Bobby the Basilisk! Click on the minimize button (
) to minimize the theme tweaker window, so that you can see what the page looks like with the current tweaked values. (But remember,
the changes won’t be saved until you click “OK”!
)
Theme tweaker help
Show Bobby the Basilisk
OK
Cancel