Jonathan Kutasov
Karma: 84
Model Spec Midtraining: Improving How Alignment Training Generalizes
Chloe Li, Nevan Wichers, saraprice, Sam Marks and Jonathan Kutasov
5 May 2026 21:55 UTC
71 points, 7 comments, 7 min read
LW link (alignment.anthropic.com)
Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov
5 Oct 2024 20:43 UTC
27 points, 2 comments, 8 min read
LW link