CEO at Redwood Research.
AI safety is a highly collaborative field—almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I’m saying this here because it would feel repetitive to say “these ideas were developed in collaboration with various people” in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.
Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.
If we are ever arguing on LessWrong and you feel like it’s kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I’ll probably be willing to call to discuss briefly.
I agree that the AIs are pretty bad at handling conceptually confusing stuff. I basically think of them as incredibly knowledgeable, not that smart, and having huge amounts of programming intuition (mostly due to their knowledge and having read huge amounts of code).
My guess is that for any reasonable operationalization of “raw intelligence”, they’re getting smarter?