Fabien Roger

Karma: 8,351

I am working on empirical AI safety.

Anonymous feedback form.

(Mis)generalization of Helpful-Only Fine-tuning

4 Jun 2026 18:40 UTC
51 points
4 comments, 11 min read

Classifier Context Rot: Monitor Performance Degrades with Context Length

18 May 2026 14:05 UTC
54 points
1 comment, 4 min read

How useful is cross-domain generalization for training LLM monitors?

18 May 2026 13:52 UTC
21 points
0 comments, 4 min read

A Research Agenda for Secret Loyalties

13 May 2026 17:34 UTC
34 points
3 comments, 3 min read