robertzk
Karma: 669
SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane, robertzk, Neel Nanda and Arthur Conmy
Nov 7, 2024, 5:22 AM
66 points, 4 comments, 14 min read
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Oct 27, 2024, 6:46 PM
47 points, 4 comments, 5 min read
Base LLMs refuse too
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Sep 29, 2024, 4:04 PM
60 points, 20 comments, 10 min read
SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Jul 18, 2024, 10:29 AM
67 points, 0 comments, 10 min read
Attention Output SAEs Improve Circuit Analysis
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Jun 21, 2024, 12:56 PM
33 points, 3 comments, 19 min read
We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk, Connor Kissane, Arthur Conmy and Neel Nanda
Mar 6, 2024, 5:03 AM
63 points, 0 comments, 12 min read
Attention SAEs Scale to GPT-2 Small
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Feb 3, 2024, 6:50 AM
78 points, 4 comments, 8 min read
Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Jan 16, 2024, 12:26 AM
84 points, 9 comments, 18 min read
Training Process Transparency through Gradient Interpretability: Early experiments on toy language models
robertzk and evhub
21 Jul 2023 14:52 UTC
56 points, 1 comment, 1 min read
Getting up to Speed on the Speed Prior in 2022
robertzk
28 Dec 2022 7:49 UTC
36 points, 5 comments, 65 min read
Emily Brontë on: Psychology Required for Serious™ AGI Safety Research
robertzk
14 Sep 2022 14:47 UTC
2 points, 0 comments, 1 min read
Ask LW: ω-self-aware systems
robertzk
16 Dec 2012 22:18 UTC
0 points, 10 comments, 1 min read
The rationalist’s checklist
robertzk
16 Dec 2011 16:21 UTC
44 points, 8 comments, 1 min read