
Marcus Williams

Karma: 148

OpenAI: Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations

18 Dec 2025 22:55 UTC
25 points
0 comments, 1 min read
(alignment.openai.com)

Marcus Williams’s Shortform

Marcus Williams, 18 Nov 2024 22:49 UTC
2 points
2 comments, 1 min read

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

7 Nov 2024 15:39 UTC
51 points
7 comments, 11 min read