
Fabien Roger

Karma: 6,753

I am working on empirical AI safety.

Book a call with me if you want advice on a concrete empirical safety project.

Anonymous feedback form.

Evaluating honesty and lie detection techniques on a diverse suite of dishonest models

25 Nov 2025 19:33 UTC
40 points
0 comments · 4 min read
(alignment.anthropic.com)

Thinking about reasoning models made me less worried about scheming

Fabien Roger · 20 Nov 2025 18:20 UTC
79 points
7 comments · 12 min read

Steering Language Models with Weight Arithmetic

11 Nov 2025 16:30 UTC
81 points
2 comments · 5 min read