owencb

Karma: 2,239

owencb 11 Jul 2025 16:27 UTC
4 points
0
in reply to: Richard_Ngo’s comment on: Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
No competition open right now, but I think there’s an audience (myself included) for more good thoughts on this topic, if you have something that feels like it might be worth sharing.

owencb 1 Jul 2025 19:15 UTC
8 points
3
in reply to: johnswentworth’s comment on: Embedded Altruism [slides]
Of course I’m into trying to understand things better (and that’s a good slice of what I recommend!), but:
- You need to make decisions in the interim
- There is a bunch of detail that won’t be captured by whatever your high level models are (like what will be the impacts of wording an email this way versus that)
- I think that for complete decisions you’d have a model of the whole future unfolding of civilization, and this is hard enough that we’re not going to do it with “a few years of study”

Embedded Altruism [slides]

owencb1 Jul 2025 13:02 UTC

20 points

3 comments1 min readLW link

The crucible — how I think about the situation with AI

owencb5 May 2025 13:18 UTC

25 points

1 comment8 min readLW link

(strangecities.substack.com)

owencb 16 Apr 2025 21:04 UTC
3 points
0
in reply to: benwr’s comment on: Not all capabilities will be created equal: focus on strategically superhuman agents
It seems fine to me to have the goalposts moving, but then I think it’s important to trace through the implications of that.
Like, if the goalposts can move then this seems like perhaps the most obvious way out of the predicament; to keep the goalposts ever ahead of AI capabilities. But when I read your post I get the vibe that you’re not imagining this as a possibility?

owencb 12 Apr 2025 21:59 UTC
5 points
0
on: Not all capabilities will be created equal: focus on strategically superhuman agents
If we are going to build these agents without “losing the game”, either (a) they must have goals that are compatible with human interests, or (b) we must (increasingly accurately) model and enforce limitations on their capabilities. If there’s a day when an AI agent is created without either of these conditions, that’s the day I’d consider humanity to have lost.
Something seems funny to me here.
It might be to do with the boundaries of your definition. If humans agents are getting empowered by strategically superhuman (in an everyday sense) AI systems (agentic or otherwise), perhaps that raises the bar for what counts as superhuman for the purposes of this post? If so I think the argument would make sense to me, but it feels a bit funny to me to have this definition which is such a moving goalpost, and also might never get crossed even as AI gets arbitrarily powerful.
Alternatively, it might be that your definition is kind of an everyday one, but in that case your conclusion seems pretty surprising. Like it seems easy to me to imagine worlds where there are some agents without either of those conditions, but that they’re not better than the empowered humans.
Or perhaps something else is going on. Just trying to voice my confusions.
I do appreciate the attempt to analyse which kinds of capabilities are actually crucial.

Disempowerment spirals as a likely mechanism for existential catastrophe

Raymond Douglas and owencb

10 Apr 2025 14:37 UTC

74 points

7 comments5 min readLW link

Knowledge, Reasoning, and Superintelligence

owencb26 Mar 2025 23:28 UTC

21 points

1 comment7 min readLW link

(strangecities.substack.com)

AI Tools for Existential Security

Lizka and owencb

14 Mar 2025 18:38 UTC

22 points

4 comments11 min readLW link

(www.forethought.org)

owencb 18 Nov 2024 20:27 UTC
2 points
0
in reply to: Gordon Seidoh Worley’s comment on: The Choice Transition
It’s been a long time since I read those books, but if I’m remembering roughly right: Asimov seems to describe a world where choice is in a finely balanced equilibrium with other forces (I’m inclined to think: implausibly so—if it could manage this level of control at great distances in time, one would think that it could manage to exert more effective control over things at somewhat less distance).

The Choice Transition

owencb and Raymond Douglas

18 Nov 2024 12:30 UTC

53 points

4 comments15 min readLW link

(strangecities.substack.com)

A brief history of the automated corporation

owencb4 Nov 2024 14:35 UTC

26 points

1 comment5 min readLW link

(strangecities.substack.com)

Winners of the Essay competition on the Automation of Wisdom and Philosophy

owencb and AI Impacts

28 Oct 2024 17:10 UTC

40 points

3 comments30 min readLW link

(blog.aiimpacts.org)

AI safety tax dynamics

owencb23 Oct 2024 12:18 UTC

22 points

0 comments6 min readLW link

(strangecities.substack.com)

Safety tax functions

owencb20 Oct 2024 14:08 UTC

31 points

0 comments6 min readLW link

(strangecities.substack.com)

owencb 22 Sep 2024 11:19 UTC
5 points
0
in reply to: sweenesm’s comment on: Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
I’ve now sent emails contacting all of the prize-winners.

owencb 14 Sep 2024 13:06 UTC
2 points
0
in reply to: owencb’s comment on: AI, centralization, and the One Ring
Actually, on 1) I think that these consequentialist reasons are properly just covered by the later sections. That section is about reasons it’s maybe bad to make the One Ring, ~regardless of the later consequences. So it makes sense to emphasise the non-consequentialist reasons.

I think there could still be some consequentialist analogue of those reasons, but they would be more esoteric, maybe something like decision-theoretic, or appealing to how we might want to be treated by future AI systems that gain ascendancy.

owencb 14 Sep 2024 11:38 UTC
2 points
0
in reply to: Wei Dai’s comment on: AI, centralization, and the One Ring
1. Yeah. As well as another consequentialist argument, which is just that it will be bad for other people to be dominated. Somehow the arguments feel less natively consequentialist, and so it seems somehow easier to hold them in these other frames, and then translate them into consequentialist ontology if that’s relevant; but also it would be very reasonable to mention them in the footnote.
2. My first reaction was that I do mention the downsides. But I realise that that was a bit buried in the text, and I can see that that could be misleading about my overall view. I’ve now edited the second paragraph of the post to be more explicit about this. I appreciate the pushback.

owencb 13 Sep 2024 20:29 UTC
3 points
0
in reply to: habryka’s comment on: AI, centralization, and the One Ring
Ha, thanks!
(It was part of the reason. Normally I’d have made the effort to import, but here I felt a bit like maybe it was just slightly funny to post the one-sided thing, which nudged against linking rather than posting; and also I thought I’d take the opportunity to see experimentally whether it seemed to lead to less engagement. But those reasons were not overwhelming, and now that you’ve put the full text here I don’t find myself very tempted to remove it. :) )

owencb 13 Sep 2024 14:54 UTC
8 points
2
in reply to: sweenesm’s comment on: Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
The judging process should be complete in the next few days. I expect we’ll write to winners at the end of next week, although it’s possible that will be delayed. A public announcement of the winners is likely to be a few more weeks.