micahcarroll

Karma: 226

https://micahcarroll.github.io/

Paper: Prompt Optimization Makes Misalignment Legible

Caleb Biddulph and micahcarroll

12 Feb 2026 19:45 UTC

63 points

8 comments · 8 min read · LW link

OpenAI: Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations

Marcus Williams and micahcarroll

18 Dec 2025 22:55 UTC

25 points

1 comment · 1 min read · LW link

(alignment.openai.com)

Is the evidence in “Language Models Learn to Mislead Humans via RLHF” valid?

Aaryan Chandna, Lukas Fluri and micahcarroll

1 Dec 2025 6:50 UTC

37 points

0 comments · 19 min read · LW link

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, micahcarroll, Adhyyan Narang, Constantin Weisser and Brendan Murphy

7 Nov 2024 15:39 UTC

51 points

7 comments · 11 min read · LW link