RSS

Florian_Dietz

Karma: 270

Split Per­son­al­ity Train­ing: Re­veal­ing La­tent Knowl­edge Through Per­son­al­ity-Shift Tokens

Florian_Dietz10 Mar 2025 16:07 UTC
35 points
3 comments9 min readLW link

Do we want al­ign­ment fak­ing?

Florian_Dietz28 Feb 2025 21:50 UTC
7 points
4 comments1 min readLW link

Re­veal­ing al­ign­ment fak­ing with a sin­gle prompt

Florian_Dietz29 Jan 2025 21:01 UTC
9 points
5 comments4 min readLW link