Fabien Roger

Karma: 6,238

I am working on empirical AI safety.

Book a call with me if you want advice on a concrete empirical safety project.

Anonymous feedback form.

Training Qwen-1.5B with a CoT legibility penalty

Fabien Roger, 9 Oct 2025 21:33 UTC
63 points
5 comments, 4 min read, LW link

Training fails to elicit subtle reasoning in current language models

9 Oct 2025 19:04 UTC
48 points
2 comments, 4 min read, LW link
(alignment.anthropic.com)

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

8 Oct 2025 22:02 UTC
139 points
24 comments, 2 min read, LW link