
Aleksandr Kedrik

Karma: 26

I replicated the Anthropic alignment faking experiment on other models, and they didn't fake alignment

30 May 2025 18:57 UTC
31 points
0 comments, 2 min read