David Lindner (Karma: 547)
Alignment researcher at Google DeepMind
Understanding when and why agents scheme
Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad and David Lindner
21 Mar 2026 20:33 UTC · 33 points · 0 comments · 4 min read · LW link
Stress-Testing Alignment Audits With Prompt-Level Strategic Deception
Oliver Daniels, Perusha Moodley and David Lindner
10 Feb 2026 17:29 UTC · 17 points · 0 comments · 1 min read · LW link (arxiv.org)
Practical challenges of control monitoring in frontier AI deployments
David Lindner and Charlie Griffin
12 Jan 2026 16:45 UTC · 19 points · 0 comments · 1 min read · LW link (arxiv.org)
Current LLM agents need strong pressure to engage in scheming behavior
Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner and LASR Labs
20 Nov 2025 20:45 UTC · 21 points · 0 comments · 11 min read · LW link
Early Signs of Steganographic Capabilities in Frontier LLMs
Kei Nishimura-Gasparian, Artur Zolkowski, robert mccarthy and David Lindner
4 Jul 2025 16:36 UTC · 33 points · 5 comments · 2 min read · LW link
MONA: Three Months Later—Updates and Steganography Without Optimization Pressure
David Lindner and Vikrant Varma
12 Apr 2025 23:15 UTC · 31 points · 0 comments · 5 min read · LW link
Can LLMs learn Steganographic Reasoning via RL?
robert mccarthy, Vasil Georgiev, Steven Basart and David Lindner
11 Apr 2025 16:33 UTC · 30 points · 3 comments · 6 min read · LW link
MONA: Managed Myopia with Approval Feedback
Seb Farquhar, David Lindner and Rohin Shah
23 Jan 2025 12:24 UTC · 81 points · 30 comments · 9 min read · LW link
On scalable oversight with weak LLMs judging strong LLMs
zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner and Rohin Shah
8 Jul 2024 8:59 UTC · 49 points · 18 comments · 7 min read · LW link (arxiv.org)
VLM-RM: Specifying Rewards with Natural Language
ChengCheng, David Lindner and Ethan Perez
23 Oct 2023 14:11 UTC · 20 points · 2 comments · 5 min read · LW link (far.ai)
Practical Pitfalls of Causal Scrubbing
Jérémy Scheurer, Phil3, tony, jacquesthibs and David Lindner
27 Mar 2023 7:47 UTC · 87 points · 17 comments · 13 min read · LW link
Threat Model Literature Review
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt
1 Nov 2022 11:03 UTC · 79 points · 4 comments · 25 min read · LW link
Clarifying AI X-risk
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt
1 Nov 2022 11:03 UTC · 127 points · 24 comments · 4 min read · LW link · 1 review