Jason Gross
Karma: 271
[Replication] Crosscoder-based Stage-Wise Model Diffing
Anna Soligo, Thomas Read, Oliver Clive-Griffin, dmanningcoe, Chun Hei Yip, rajashree and Jason Gross
22 Mar 2025 18:35 UTC, 24 points, 0 comments, 7 min read
Measuring Nonlinear Feature Interactions in Sparse Crosscoders [Project Proposal]
Jason Gross and rajashree
6 Jan 2025 4:22 UTC, 19 points, 0 comments, 12 min read
Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross
24 Jun 2024 19:27 UTC, 104 points, 4 comments, 8 min read (arxiv.org)