Buck

Karma: 15,589

CEO at Redwood Research.

AI safety is a highly collaborative field—almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I’m saying this here because it would feel repetitive to say “these ideas were developed in collaboration with various people” in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.

If we are ever arguing on LessWrong and you feel like it’s kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I’ll probably be willing to call to discuss briefly.

Rogue internal deployments via external APIs

15 Oct 2025 19:34 UTC
34 points
4 comments · 6 min read · LW link

The Thinking Machines Tinker API is good news for AI control and security

Buck · 9 Oct 2025 15:22 UTC
91 points
10 comments · 6 min read · LW link

Christian homeschoolers in the year 3000

Buck · 17 Sep 2025 14:44 UTC
196 points
64 comments · 7 min read · LW link

I enjoyed most of IABIED

Buck · 17 Sep 2025 4:34 UTC
208 points
46 comments · 8 min read · LW link

An epistemic advantage of working as a moderate

Buck · 20 Aug 2025 17:47 UTC
215 points
96 comments · 4 min read · LW link

Four places where you can put LLM monitoring

9 Aug 2025 23:10 UTC
48 points
0 comments · 7 min read · LW link

Research Areas in AI Control (The Alignment Project by UK AISI)

1 Aug 2025 10:27 UTC
25 points
0 comments · 18 min read · LW link
(alignmentproject.aisi.gov.uk)

Why it’s hard to make settings for high-stakes control research

Buck · 18 Jul 2025 16:33 UTC
49 points
6 comments · 4 min read · LW link

Recent Redwood Research project proposals

14 Jul 2025 22:27 UTC
91 points
0 comments · 3 min read · LW link

Lessons from the Iraq War for AI policy

Buck · 10 Jul 2025 18:52 UTC
197 points
25 comments · 4 min read · LW link

What’s worse, spies or schemers?

9 Jul 2025 14:37 UTC
51 points
2 comments · 5 min read · LW link

How much novel security-critical infrastructure do you need during the singularity?

Buck · 4 Jul 2025 16:54 UTC
56 points
7 comments · 5 min read · LW link

There are two fundamentally different constraints on schemers

Buck · 2 Jul 2025 15:51 UTC
62 points
0 comments · 4 min read · LW link

Comparing risk from internally-deployed AI to insider and outsider threats from humans

Buck · 23 Jun 2025 17:47 UTC
150 points
22 comments · 3 min read · LW link

Making deals with early schemers

20 Jun 2025 18:21 UTC
121 points
41 comments · 15 min read · LW link

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

8 May 2025 19:06 UTC
77 points
3 comments · 15 min read · LW link

Handling schemers if shutdown is not an option

Buck · 18 Apr 2025 14:39 UTC
39 points
2 comments · 14 min read · LW link

Ctrl-Z: Controlling AI Agents via Resampling

16 Apr 2025 16:21 UTC
124 points
0 comments · 20 min read · LW link

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence

14 Apr 2025 16:45 UTC
29 points
1 comment · 2 min read · LW link

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

24 Mar 2025 17:55 UTC
35 points
0 comments · 8 min read · LW link