Marcus Williams

Karma: 252

OpenAI: How we monitor internal coding agents for misalignment

Marcus Williams · 19 Mar 2026 17:27 UTC
91 points
19 comments · 1 min read · LW link
(openai.com)

OpenAI: Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations

18 Dec 2025 22:55 UTC
25 points
1 comment · 1 min read · LW link
(alignment.openai.com)

Marcus Williams’s Shortform

Marcus Williams · 18 Nov 2024 22:49 UTC
2 points
2 comments · 1 min read · LW link

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

7 Nov 2024 15:39 UTC
51 points
7 comments · 11 min read · LW link