Oh nice, I kind of vibe with “meditation on a theme” as a description of what this post is doing and failing to do.
Overall I’m really happy with this post.
It crystallized a bunch of thoughts I’d had for a while before this, and has been useful as a conceptual building block that’s fed into my general thinking about the situation with AI, and the value of accelerating tools to improve epistemics and coordination. I often find myself wanting to link people to it.
Possible weaknesses:
- While I think the basic analysis looks directionally correct, I wonder if in places it’s a bit oversimplified (like maybe you could usefully unpack the concept of “deliberate steering”)
  - Seems fine for early work on this?
- Empirically, the concept doesn’t seem to have been catchy
  - I think that we’ve done more to spread the general idea than the specific “choice transition” frame or name
  - Maybe we could have done better; maybe that’s just OK
- I sometimes wish the post was shorter, to give people a crisper pointer to the core idea?
  - There’s a problem here of multiple audiences
  - Possibly it should also have a short summary piece somewhere
This was written a few months after Situational Awareness. I felt like there was kind of a missing mood in x-risk discourse around that piece, and this was an attempt to convey both the mood and something of the generators of the mood.
Since then, the mood has shifted, to something that feels healthier to me. 80,000 Hours has a problem profile on extreme power concentration. At this point I mostly wouldn’t link back to this post (preferring to link e.g. to more substantive research), although I might if I just really wanted to convey the mood to someone. I’m not really sure whether my article had any counterfactual responsibility for the research people have done in the interim.
I’m happy with this post. I think it captures something meta-level which is important in orienting to doing a good job of all sorts of work, and I occasionally want to point people to this.
Most of the thoughts probably aren’t super original, but for something this important I’m surprised that there isn’t much more explicit discussion; it often seems to be talked about only at the level of a few sentences, and regarded as a matter of taste, or something. For people who aspire to do valuable work, I guess it’s generally worth spending a few hours a year explicitly thinking about the tradeoffs here and how to navigate them in particular situations, and then probably also worth at least a bit of scaffolding or general thinking about the topic.
I like this post and am glad that we wrote it.
Despite that, I feel keenly aware that it’s asking a lot more questions than it’s answering. I don’t think I’ve got massively further in the intervening year in having good answers to those questions. The way this thinking seems to me to be most helpful is as a background model to help avoid confused assumptions when thinking about the future of AI. I do think this has impacted the way I think about AI risk, but I haven’t managed to articulate that well yet (maybe in 2026 …).
Looking back, I have mixed feelings about this post (and series).
On the one hand, I think they’re getting at something really important. Rereading them, I feel like they’re pointing to a stance I aspire to inhabit, and there’s some value in the pointers they’re giving. I’m not sure that I know better content on quite this topic.
On the other hand, they feel … kind of slightly half-baked, or naming something-in-the-vicinity of what matters, rather than giving the true name of the thing. I don’t find myself naturally drawn to linking people to this, because I feel a dissatisfaction that it’s something-like-rambling rather than being the best version of itself.
I do still hope that someone else thinks about this material, reinvents a better version for themself, and writes about that.
No competition open right now, but I think there’s an audience (myself included) for more good thoughts on this topic, if you have something that feels like it might be worth sharing.
Of course I’m into trying to understand things better (and that’s a good slice of what I recommend!), but:
- You need to make decisions in the interim
- There’s a bunch of detail that won’t be captured by whatever your high-level models are (like what the impacts of wording an email this way versus that will be)
- I think that for complete decisions you’d need a model of the whole future unfolding of civilization, and this is hard enough that we’re not going to do it with “a few years of study”
It seems fine to me to have the goalposts moving, but then I think it’s important to trace through the implications of that.
Like, if the goalposts can move then this seems like perhaps the most obvious way out of the predicament: to keep the goalposts ever ahead of AI capabilities. But when I read your post I get the vibe that you’re not imagining this as a possibility?
If we are going to build these agents without “losing the game”, either (a) they must have goals that are compatible with human interests, or (b) we must (increasingly accurately) model and enforce limitations on their capabilities. If there’s a day when an AI agent is created without either of these conditions, that’s the day I’d consider humanity to have lost.
Something seems funny to me here.
It might be to do with the boundaries of your definition. If human agents are getting empowered by strategically superhuman (in an everyday sense) AI systems (agentic or otherwise), perhaps that raises the bar for what counts as superhuman for the purposes of this post? If so I think the argument would make sense to me, but it feels a bit funny to have a definition which is such a moving goalpost, and which might never get crossed even as AI gets arbitrarily powerful.
Alternatively, it might be that your definition is kind of an everyday one, but in that case your conclusion seems pretty surprising. Like, it seems easy to me to imagine worlds where there are some agents without either of those conditions, but where they’re not better than the empowered humans.
Or perhaps something else is going on. Just trying to voice my confusions.
I do appreciate the attempt to analyse which kinds of capabilities are actually crucial.
It’s been a long time since I read those books, but if I’m remembering roughly right: Asimov seems to describe a world where choice is in a finely balanced equilibrium with other forces (I’m inclined to think: implausibly so—if it could manage this level of control at great distances in time, one would think that it could manage to exert more effective control over things at somewhat less distance).
I’ve now sent emails contacting all of the prize-winners.
Actually, on 1) I think that these consequentialist reasons are properly just covered by the later sections. The section in question is about reasons it’s maybe bad to make the One Ring, ~regardless of the later consequences, so it makes sense there to emphasise the non-consequentialist reasons.
I think there could still be some consequentialist analogue of those reasons, but they would be more esoteric, maybe something like decision-theoretic, or appealing to how we might want to be treated by future AI systems that gain ascendancy.
Yeah. As well as another consequentialist argument, which is just that it will be bad for other people to be dominated. Somehow the arguments feel less natively consequentialist, so it seems easier to hold them in these other frames, and then translate them into a consequentialist ontology if that’s relevant; but it would also be very reasonable to mention them in the footnote.
My first reaction was that I do mention the downsides. But I realise that was a bit buried in the text, and I can see how it could be misleading about my overall view. I’ve now edited the second paragraph of the post to be more explicit about this. I appreciate the pushback.
Ha, thanks!
(It was part of the reason. Normally I’d have made the effort to import, but here I felt a bit like maybe it was just slightly funny to post the one-sided thing, which nudged towards linking rather than posting; and also I thought I’d take the opportunity to see experimentally whether it seemed to lead to less engagement. But those reasons were not overwhelming, and now that you’ve put the full text here I don’t find myself very tempted to remove it. :) )
The judging process should be complete in the next few days. I expect we’ll write to winners at the end of next week, although it’s possible that will be delayed. A public announcement of the winners is likely to be a few more weeks away.
I don’t see why (1) says you should be very early. Isn’t the decrease in measure for each individual observer precisely outweighed by their increasing multitudes?
This kind of checks out to me. At least, I agree that it’s evidence against treating quantum computers as primitive that humans, despite living in a quantum world, find classical computers more natural.
I guess I feel more like I’m in a position of ignorance, though, and wouldn’t be shocked to find some argument that quantum has, in some other a priori sense, a deep naturalness which other niche physics theories lack.
You say that quantum computers are more complex to specify, but is this a function of using a classical computer in the speed prior? I’m wondering if it could somehow be quantum all the way down.
This makes a lot of sense! I do find that the way my brain wants to fit your experience in with my conception of wholesomeness is that you were perhaps not attending enough to the part of the whole that was your own internal experience and needs?