Santiago Aranguri
Karma: 61
Reproducing steering against evaluation awareness in a large open-weight model
Thomas Read, Bronson Schoen, Santiago Aranguri and Joseph Bloom
10 Apr 2026 10:45 UTC
88 points, 15 comments, 15 min read
SAE on activation differences
Santiago Aranguri, jacob_drori and Neel Nanda
30 Jun 2025 17:50 UTC
45 points, 3 comments, 5 min read
Tied Crosscoders: Explaining Chat Behavior from Base Model
Santiago Aranguri
22 Mar 2025 18:07 UTC
9 points, 0 comments, 12 min read