Jonathan Kutasov

Karma: 75

Model Spec Midtraining: Improving How Alignment Training Generalizes

5 May 2026 21:55 UTC
62 points
6 comments, 7 min read

Interpretability of SAE Features Representing Check in ChessGPT

Jonathan Kutasov, 5 Oct 2024 20:43 UTC
27 points
2 comments, 8 min read