Working on Alignment Science at Anthropic
gasteigerjo
Karma: 280
AI Safety at the Frontier: Paper Highlights of October 2025
Training fails to elicit subtle reasoning in current language models
AI Safety at the Frontier: Paper Highlights, September ’25
AI Safety at the Frontier: Paper Highlights, August ’25
AI Safety at the Frontier: Paper Highlights, July ’25
AI Safety at the Frontier: Paper Highlights, June ’25
AI Safety at the Frontier: Paper Highlights, May ’25
AI Safety at the Frontier: Paper Highlights, April ’25
AI Safety at the Frontier: Paper Highlights, March ’25
Automated Researchers Can Subtly Sandbag
AI Safety at the Frontier: Paper Highlights, February ’25
AI Safety at the Frontier: Paper Highlights, January ’25
I’ll also give at least £5k if tax deductibility is set up in the UK.
This is normal. Workshops are non-archival and conferences only require that the work hasn’t been submitted to any archival venues.
[edit to extend]: Researchers will often submit their work to a conference and one or even multiple workshops in parallel. Workshops are great at getting a more targeted audience and discussion. It’s also a strategy to get more people to see your paper.