Marcus Williams

Karma: 294

Predicting LLM Safety Before Release by Simulating Deployment

Tomek Korbak, Marcus Williams, micahcarroll, Cameron Raymond and Hannah Sheahan

16 Jun 2026 19:55 UTC

36 points

2 comments · 1 min read · LW link

OpenAI: How we monitor internal coding agents for misalignment

Marcus Williams

19 Mar 2026 17:27 UTC

95 points

19 comments · 1 min read · LW link

(openai.com)

OpenAI: Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations

Marcus Williams and micahcarroll

18 Dec 2025 22:55 UTC

25 points

1 comment · 1 min read · LW link

(alignment.openai.com)

Marcus Williams’s Shortform

Marcus Williams

18 Nov 2024 22:49 UTC

2 points

2 comments · 1 min read · LW link

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, micahcarroll, Adhyyan Narang, Constantin Weisser and Brendan Murphy

7 Nov 2024 15:39 UTC

51 points

7 comments · 11 min read · LW link