
David Lindner

Karma: 483

Alignment researcher at Google DeepMind

Early Signs of Steganographic Capabilities in Frontier LLMs

Jul 4, 2025, 4:36 PM
30 points

9 votes


5 comments · 2 min read · LW link

MONA: Three Months Later – Updates and Steganography Without Optimization Pressure

Apr 12, 2025, 11:15 PM
31 points

11 votes


0 comments · 5 min read · LW link

Can LLMs Learn Steganographic Reasoning via RL?

Apr 11, 2025, 4:33 PM
29 points

10 votes


3 comments · 6 min read · LW link

MONA: Managed Myopia with Approval Feedback

Jan 23, 2025, 12:24 PM
81 points

26 votes


30 comments · 9 min read · LW link

On scalable oversight with weak LLMs judging strong LLMs

Jul 8, 2024, 8:59 AM
49 points

17 votes


18 comments · 7 min read · LW link
(arxiv.org)

VLM-RM: Specifying Rewards with Natural Language

Oct 23, 2023, 2:11 PM
20 points

6 votes


2 comments · 5 min read · LW link
(far.ai)

Practical Pitfalls of Causal Scrubbing

Mar 27, 2023, 7:47 AM
87 points

36 votes


17 comments · 13 min read · LW link

Threat Model Literature Review

Nov 1, 2022, 11:03 AM
79 points

42 votes


4 comments · 25 min read · LW link

Clarifying AI X-risk

Nov 1, 2022, 11:03 AM
127 points

67 votes


24 comments · 4 min read · LW link · 1 review