Buck

Karma: 15,589

CEO at Redwood Research.

AI safety is a highly collaborative field—almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I’m saying this here because it would feel repetitive to say “these ideas were developed in collaboration with various people” in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.

If we are ever arguing on LessWrong and you feel like it’s kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I’ll probably be willing to call to discuss briefly.

Rogue internal deployments via external APIs

15 Oct 2025 19:34 UTC
34 points
4 comments · 6 min read · LW link

The Thinking Machines Tinker API is good news for AI control and security

Buck · 9 Oct 2025 15:22 UTC
91 points
10 comments · 6 min read · LW link

Christian homeschoolers in the year 3000

Buck · 17 Sep 2025 14:44 UTC
196 points
64 comments · 7 min read · LW link

I enjoyed most of IABIED

Buck · 17 Sep 2025 4:34 UTC
208 points
46 comments · 8 min read · LW link

An epistemic advantage of working as a moderate

Buck · 20 Aug 2025 17:47 UTC
215 points
96 comments · 4 min read · LW link

Four places where you can put LLM monitoring

9 Aug 2025 23:10 UTC
48 points
0 comments · 7 min read · LW link

Research Areas in AI Control (The Alignment Project by UK AISI)

1 Aug 2025 10:27 UTC
25 points
0 comments · 18 min read · LW link
(alignmentproject.aisi.gov.uk)

Why it’s hard to make settings for high-stakes control research

Buck · 18 Jul 2025 16:33 UTC
49 points
6 comments · 4 min read · LW link

Recent Redwood Research project proposals

14 Jul 2025 22:27 UTC
91 points
0 comments · 3 min read · LW link

Lessons from the Iraq War for AI policy

Buck · 10 Jul 2025 18:52 UTC
197 points
25 comments · 4 min read · LW link

What’s worse, spies or schemers?

9 Jul 2025 14:37 UTC
51 points
2 comments · 5 min read · LW link

How much novel security-critical infrastructure do you need during the singularity?

Buck · 4 Jul 2025 16:54 UTC
56 points
7 comments · 5 min read · LW link

There are two fundamentally different constraints on schemers

Buck · 2 Jul 2025 15:51 UTC
62 points
0 comments · 4 min read · LW link

Comparing risk from internally-deployed AI to insider and outsider threats from humans

Buck · 23 Jun 2025 17:47 UTC
150 points
22 comments · 3 min read · LW link

Making deals with early schemers

20 Jun 2025 18:21 UTC
121 points
41 comments · 15 min read · LW link

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

8 May 2025 19:06 UTC
77 points
3 comments · 15 min read · LW link

Handling schemers if shutdown is not an option

Buck · 18 Apr 2025 14:39 UTC
39 points
2 comments · 14 min read · LW link

Ctrl-Z: Controlling AI Agents via Resampling

16 Apr 2025 16:21 UTC
124 points
0 comments · 20 min read · LW link

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence

14 Apr 2025 16:45 UTC
29 points
1 comment · 2 min read · LW link

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

24 Mar 2025 17:55 UTC
35 points
0 comments · 8 min read · LW link