David Lindner

Karma: 480

Alignment researcher at Google DeepMind

Early Signs of Steganographic Capabilities in Frontier LLMs

4 Jul 2025 16:36 UTC
30 points
4 comments · 2 min read · LW link

MONA: Three Months Later - Updates and Steganography Without Optimization Pressure

12 Apr 2025 23:15 UTC
31 points
0 comments · 5 min read · LW link

Can LLMs learn Steganographic Reasoning via RL?

11 Apr 2025 16:33 UTC
28 points
2 comments · 6 min read · LW link

MONA: Managed Myopia with Approval Feedback

23 Jan 2025 12:24 UTC
80 points
30 comments · 9 min read · LW link

On scalable oversight with weak LLMs judging strong LLMs

8 Jul 2024 8:59 UTC
49 points
18 comments · 7 min read · LW link (arxiv.org)

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link (far.ai)

Practical Pitfalls of Causal Scrubbing

27 Mar 2023 7:47 UTC
87 points
17 comments · 13 min read · LW link

Threat Model Literature Review

1 Nov 2022 11:03 UTC
78 points
4 comments · 25 min read · LW link

Clarifying AI X-risk

1 Nov 2022 11:03 UTC
127 points
24 comments · 4 min read · LW link · 1 review