Joseph Bloom
Karma: 821
Decision Transformer Interpretability
Joseph Bloom and Paul Colognese | 6 Feb 2023 7:29 UTC | 84 points | 13 comments | 24 min read
A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)
Joseph Bloom | 16 May 2023 22:59 UTC | 36 points | 2 comments | 16 min read
Features and Adversaries in MemoryDT
Joseph Bloom and Jay Bailey | 20 Oct 2023 7:32 UTC | 31 points | 6 comments | 25 min read
Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom | 2 Feb 2024 6:54 UTC | 94 points | 37 comments | 15 min read
Understanding SAE Features with the Logit Lens
Joseph Bloom and Johnny Lin | 11 Mar 2024 0:16 UTC | 54 points | 0 comments | 14 min read