robertzk
Karma: 466
We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk, Connor Kissane, Arthur Conmy and Neel Nanda
6 Mar 2024 5:03 UTC | 56 points | 0 comments | 12 min read
Attention SAEs Scale to GPT-2 Small
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
3 Feb 2024 6:50 UTC | 76 points | 4 comments | 8 min read
Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
16 Jan 2024 0:26 UTC | 82 points | 5 comments | 19 min read
Training Process Transparency through Gradient Interpretability: Early experiments on toy language models
robertzk and evhub
21 Jul 2023 14:52 UTC | 56 points | 1 comment | 1 min read
Getting up to Speed on the Speed Prior in 2022
robertzk
28 Dec 2022 7:49 UTC | 36 points | 5 comments | 65 min read
Emily Brontë on: Psychology Required for Serious™ AGI Safety Research
robertzk
14 Sep 2022 14:47 UTC | 2 points | 0 comments | 1 min read
Ask LW: ω-self-aware systems
robertzk
16 Dec 2012 22:18 UTC | 0 points | 10 comments | 1 min read
The rationalist’s checklist
robertzk
16 Dec 2011 16:21 UTC | 44 points | 8 comments | 1 min read