Florian_Dietz

Karma: 582

Context Modification as a Negative Alignment Tax

Florian_Dietz, 10 May 2026 11:32 UTC
7 points
0 comments, 4 min read

Context Modification as a Negative Alignment Tax

Florian_Dietz, 8 May 2026 10:56 UTC
5 points
0 comments, 5 min read

Context Modification as a Negative Alignment Tax

Florian_Dietz, 7 May 2026 17:34 UTC
5 points
0 comments, 5 min read

Positive Feedback Only

Florian_Dietz, 5 May 2026 21:28 UTC
19 points
0 comments, 8 min read

Split Personality Training can detect Alignment Faking

Florian_Dietz, 4 Mar 2026 11:49 UTC
34 points
0 comments, 6 min read