János Kramár
Karma: 531
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah and Neel Nanda
Mar 26, 2025, 7:07 PM; 113 points; 15 comments; 29 min read; link (deepmindsafetyresearch.medium.com)

JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár and Neel Nanda
Jul 19, 2024, 4:10 PM; 49 points; 10 comments; 1 min read; link (storage.googleapis.com)

Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, lewis smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah and Neel Nanda
Apr 25, 2024, 6:43 PM; 63 points; 38 comments; 1 min read; link (arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma
Apr 19, 2024, 7:06 PM; 79 points; 10 comments; 8 min read

[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma
Apr 19, 2024, 7:06 PM; 72 points; 0 comments; 3 min read

AtP*: An efficient and scalable method for localizing LLM behaviour to components
Neel Nanda, János Kramár, Tom Lieberum and Rohin Shah
Mar 18, 2024, 5:28 PM; 19 points; 0 comments; 1 min read; link (arxiv.org)

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)
Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah
Dec 23, 2023, 2:46 AM; 18 points; 0 comments; 4 min read

Fact Finding: How to Think About Interpreting Memorisation (Post 4)
Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah
Dec 23, 2023, 2:46 AM; 22 points; 0 comments; 9 min read

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)
Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah
Dec 23, 2023, 2:46 AM; 10 points; 1 comment; 16 min read

Fact Finding: Simplifying the Circuit (Post 2)
Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah
Dec 23, 2023, 2:45 AM; 25 points; 3 comments; 14 min read

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)
Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah
Dec 23, 2023, 2:44 AM; 106 points; 10 comments; 22 min read; 2 reviews

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah and Vlad Mikulik
Jul 20, 2023, 10:50 AM; 44 points; 3 comments; 2 min read; link (arxiv.org)

Infinite Modal Combat: some observations
János Kramár
Jul 29, 2015, 4:05 AM; 3 points; 0 comments; 3 min read

A tractable, interpretable formulation of approximate conditioning for pairwise-specified probability distributions over truth values
János Kramár
Jun 3, 2015, 7:08 PM; 3 points; 3 comments; 2 min read