saraprice

Karma: 125

Model Spec Midtraining: Improving How Alignment Training Generalizes

5 May 2026 21:55 UTC
71 points
6 comments · 7 min read · LW link
(alignment.anthropic.com)

Best-of-N Jailbreaking

14 Dec 2024 4:58 UTC
79 points
5 comments · 2 min read · LW link
(arxiv.org)