Fabien Roger

Karma: 8,393

I am working on empirical AI safety.

Anonymous feedback form.

(Mis)generalization of Helpful-Only Fine-tuning

4 Jun 2026 18:40 UTC
55 points
7 comments, 11 min read

Classifier Context Rot: Monitor Performance Degrades with Context Length

18 May 2026 14:05 UTC
54 points
1 comment, 4 min read

How useful is cross-domain generalization for training LLM monitors?

18 May 2026 13:52 UTC
21 points
0 comments, 4 min read