Oh nice, I kind of vibe with “meditation on a theme” as a description of what this post is doing and failing to do.
Overall I’m really happy with this post.
It crystallized a bunch of thoughts I’d had for a while before this, and has been useful as a conceptual building block that’s fed into my general thinking about the situation with AI, and the value of accelerating tools to improve epistemics and coordination. I often find myself wanting to link people to it.
Possible weaknesses:
- While I think the basic analysis looks directionally correct, I wonder if in places it’s a bit oversimplified (like maybe you could usefully unpack the concept of “deliberate steering”)
  - Seems fine for early work on this?
- Empirically, the concept doesn’t seem to have been catchy
  - I think that we’ve done more to spread the general idea than the specific “choice transition” frame or name
  - Maybe we could have done better; maybe that’s just OK
- I sometimes wish the post was shorter, to give people a crisper pointer to the core idea?
  - There’s a problem here of multiple audiences
  - Possibly it should also have a short summary piece somewhere
This was written a few months after Situational Awareness. I felt like there was kind of a missing mood in x-risk discourse around that piece, and this was an attempt to convey both the mood and something of the generators of the mood.
Since then, the mood has shifted, to something that feels healthier to me. 80,000 Hours has a problem profile on extreme power concentration. At this point I mostly wouldn’t link back to this post (preferring to link e.g. to more substantive research), although I might if I just really wanted to convey the mood to someone. I’m not really sure whether my article had any counterfactual responsibility for the research people have done in the interim.
I’m happy with this post. I think it captures something meta-level which is important in orienting to doing a good job of all sorts of work, and I occasionally want to point people to this.
Most of the thoughts probably aren’t super original, but for something this important I’m surprised that there isn’t much more explicit discussion; it often seems to be talked about only at the level of a few sentences, and regarded as a matter of taste, or something. For people who aspire to do valuable work, I guess it’s generally worth spending a few hours a year explicitly thinking about the tradeoffs here and how to navigate them in particular situations, and then probably also worth at least a bit of scaffolding or general thinking about the topic.
I like this post and am glad that we wrote it.
Despite that, I feel keenly aware that it’s asking a lot more questions than it’s answering. I don’t think I’ve got massively further in the intervening year in having good answers to those questions. The way this thinking seems to me to be most helpful is as a background model to help avoid confused assumptions when thinking about the future of AI. I do think this has impacted the way I think about AI risk, but I haven’t managed to articulate that well yet (maybe in 2026 …).
Looking back, I have mixed feelings about this post (and series).
On the one hand, I think they’re getting at something really important. Rereading them, I feel like they’re pointing to a stance I aspire to inhabit, and there’s some value in the pointers they’re giving. I’m not sure that I know better content on quite this topic.
On the other hand, they feel … kind of slightly half-baked, or naming something-in-the-vicinity of what matters, rather than giving the true name of the thing. I don’t find myself naturally drawn to linking people to this, because I feel a dissatisfaction that it’s something-like-rambling rather than being the best version of itself.
I do still hope that someone else thinks about this material, reinvents a better version for themself, and writes about that.
No competition open right now, but I think there’s an audience (myself included) for more good thoughts on this topic, if you have something that feels like it might be worth sharing.
Of course I’m into trying to understand things better (and that’s a good slice of what I recommend!), but:
- You need to make decisions in the interim
- There’s a bunch of detail that won’t be captured by whatever your high-level models are (like what the impacts of wording an email this way versus that will be)
- I think that for complete decisions you’d need a model of the whole future unfolding of civilization, and this is hard enough that we’re not going to do it with “a few years of study”
It seems fine to me to have the goalposts moving, but then I think it’s important to trace through the implications of that.
Like, if the goalposts can move then this seems like perhaps the most obvious way out of the predicament: to keep the goalposts ever ahead of AI capabilities. But when I read your post I get the vibe that you’re not imagining this as a possibility?
If we are going to build these agents without “losing the game”, either (a) they must have goals that are compatible with human interests, or (b) we must (increasingly accurately) model and enforce limitations on their capabilities. If there’s a day when an AI agent is created without either of these conditions, that’s the day I’d consider humanity to have lost.
Something seems funny to me here.
It might be to do with the boundaries of your definition. If human agents are getting empowered by strategically superhuman (in an everyday sense) AI systems (agentic or otherwise), perhaps that raises the bar for what counts as superhuman for the purposes of this post? If so I think the argument would make sense to me, but it feels a bit funny to have a definition which is such a moving goalpost, and which might never get crossed even as AI gets arbitrarily powerful.
Alternatively, it might be that your definition is kind of an everyday one, but in that case your conclusion seems pretty surprising. Like, it seems easy to me to imagine worlds where there are some agents without either of those conditions, but where they’re not better than the empowered humans.
Or perhaps something else is going on. Just trying to voice my confusions.
I do appreciate the attempt to analyse which kinds of capabilities are actually crucial.
It’s been a long time since I read those books, but if I’m remembering roughly right: Asimov seems to describe a world where choice is in a finely balanced equilibrium with other forces (I’m inclined to think: implausibly so—if it could manage this level of control at great distances in time, one would think that it could manage to exert more effective control over things at somewhat less distance).
I’ve now sent emails contacting all of the prize-winners.
Actually, on 1) I think that these consequentialist reasons are properly just covered by the later sections. The section in question is about reasons it’s maybe bad to make the One Ring, ~regardless of the later consequences, so it makes sense there to emphasise the non-consequentialist reasons.
I think there could still be some consequentialist analogue of those reasons, but they would be more esoteric, maybe something like decision-theoretic, or appealing to how we might want to be treated by future AI systems that gain ascendancy.
Yeah. As well as another consequentialist argument, which is just that it will be bad for other people to be dominated. Somehow the arguments feel less natively consequentialist, so it seems easier to hold them in these other frames, and then translate them into a consequentialist ontology if that’s relevant; but it would also be very reasonable to mention them in the footnote.
My first reaction was that I do mention the downsides. But I realise that was a bit buried in the text, and I can see how it could be misleading about my overall view. I’ve now edited the second paragraph of the post to be more explicit about this. I appreciate the pushback.
Ha, thanks!
(It was part of the reason. Normally I’d have made the effort to import, but here I felt a bit like maybe it was just slightly funny to post the one-sided thing, which nudged towards linking rather than posting; and also I thought I’d take the opportunity to see experimentally whether it seemed to lead to less engagement. But those reasons were not overwhelming, and now that you’ve put the full text here I don’t find myself very tempted to remove it. :) )
The judging process should be complete in the next few days. I expect we’ll write to winners at the end of next week, although it’s possible that will be delayed. A public announcement of the winners is likely to be a few more weeks away.
I don’t see why (1) says you should be very early. Isn’t the decrease in measure for each individual observer precisely outweighed by their increasing multitudes?
This kind of checks out to me. At least, I agree that it’s evidence against treating quantum computers as primitive that humans, despite living in a quantum world, find classical computers more natural.
I guess I feel more like I’m in a position of ignorance, though, and wouldn’t be shocked to find some argument that quantum has, in some other a priori sense, a deep naturalness which other niche physics theories lack.
You say that quantum computers are more complex to specify, but is this a function of using a classical computer in the speed prior? I’m wondering if it could somehow be quantum all the way down.
This makes a lot of sense! I do find that the way my brain wants to fit your experience in with my conception of wholesomeness is that you were perhaps not attending enough to the part of the whole that was your own internal experience and needs?