David Lindner (Karma: 547)
Alignment researcher at Google DeepMind
Understanding when and why agents scheme
Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad and David Lindner
21 Mar 2026 20:33 UTC · 33 points · 0 comments · 4 min read · LW link
Stress-Testing Alignment Audits With Prompt-Level Strategic Deception
Oliver Daniels, Perusha Moodley and David Lindner
10 Feb 2026 17:29 UTC · 17 points · 0 comments · 1 min read · LW link (arxiv.org)
Practical challenges of control monitoring in frontier AI deployments
David Lindner and Charlie Griffin
12 Jan 2026 16:45 UTC · 19 points · 0 comments · 1 min read · LW link (arxiv.org)
Current LLM agents need strong pressure to engage in scheming behavior
Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner and LASR Labs
20 Nov 2025 20:45 UTC · 21 points · 0 comments · 11 min read · LW link
Early Signs of Steganographic Capabilities in Frontier LLMs
Kei Nishimura-Gasparian, Artur Zolkowski, robert mccarthy and David Lindner
4 Jul 2025 16:36 UTC · 33 points · 5 comments · 2 min read · LW link
MONA: Three Months Later—Updates and Steganography Without Optimization Pressure
David Lindner and Vikrant Varma
12 Apr 2025 23:15 UTC · 31 points · 0 comments · 5 min read · LW link
Can LLMs learn Steganographic Reasoning via RL?
robert mccarthy, Vasil Georgiev, Steven Basart and David Lindner
11 Apr 2025 16:33 UTC · 30 points · 3 comments · 6 min read · LW link
MONA: Managed Myopia with Approval Feedback
Seb Farquhar, David Lindner and Rohin Shah
23 Jan 2025 12:24 UTC · 81 points · 30 comments · 9 min read · LW link
On scalable oversight with weak LLMs judging strong LLMs
zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner and Rohin Shah
8 Jul 2024 8:59 UTC · 49 points · 18 comments · 7 min read · LW link (arxiv.org)
VLM-RM: Specifying Rewards with Natural Language
ChengCheng, David Lindner and Ethan Perez
23 Oct 2023 14:11 UTC · 20 points · 2 comments · 5 min read · LW link (far.ai)
Practical Pitfalls of Causal Scrubbing
Jérémy Scheurer, Phil3, tony, jacquesthibs and David Lindner
27 Mar 2023 7:47 UTC · 87 points · 17 comments · 13 min read · LW link
Threat Model Literature Review
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt
1 Nov 2022 11:03 UTC · 79 points · 4 comments · 25 min read · LW link
Clarifying AI X-risk
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt
1 Nov 2022 11:03 UTC · 127 points · 24 comments · 4 min read · LW link · 1 review