ollie

Karma: 82

Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we’re studying them anyway

15 Aug 2025 11:48 UTC
58 points
3 comments, 17 min read

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs

25 Sep 2024 14:52 UTC
37 points
2 comments, 4 min read
(arxiv.org)