rosehadshar

Karma: 1,237

What would adults in the room know about AI risk?

rosehadshar20 Nov 2025 9:11 UTC

16 points

2 comments3 min readLW link

rosehadshar 8 Oct 2025 12:50 UTC
2 points
0
in reply to: L Rudolf L’s comment on: Sense-making about extreme power concentration
Yup sorry, the Tom above is actually Rose!
I like your distinction between narrow and broad IC dynamics. I was basically thinking just about narrow, but now agree that there is also potentially some broader thing.
How likely do you think it is that helper-nanobots outcompete auto-nanobots? Two possible things that could be going on are:
1. I’m unhelpfully abstracting away what kind of AI systems we end up with, but actually this significantly impacts how likely power concentration is and I should think more about it
2. Theoretically the distinction between helper and auto-nanobots is significant, but in practice it’s very unlikely that helper-nanobots will be competitive, so it’s fine to ignore the possibility and treat auto-nanobots as ‘AI’

rosehadshar 15 Sep 2025 7:56 UTC
2 points
0
in reply to: Dalmert’s comment on: Sense-making about extreme power concentration
All of 1-4 seem plausible to me, and I don’t centrally expect that power concentration will lead to everyone dying.
Even if all of 1-4 hold, I think the future will probably be a lot less good than it could have been:
- 4 is more likely to mean that earth becomes a nature reserve for humans or something, than that the stars are equitably allocated
- I’m worried that there are bad selection effects such that 3 already screens out some kinds of altruists (e.g. ones who aren’t willing to strategy steal). Some good stuff might still happen to existing humans, but the future will miss out on some values completely
- I’m worried about power corrupting/there being no checks and balances/there being no incentives to keep doing good stuff for others

rosehadshar 12 Sep 2025 8:16 UTC
4 points
0
in reply to: Jonas Hallgren’s comment on: Sense-making about extreme power concentration
Thanks, agree that ‘emergent dynamics’ is woolly above.
I guess I don’t think the y-axis should be the temporal dimension. To give some cartoon examples:
- I’d put an extremely Machievellian 10 year plan on the part of a cabal of politicians to backslide into a dictatorship then seize power over the rest of the world near the top end of the axis
- I’d put unfavourable order of capabilities, where in an unplanned way superpersuasion comes online before defenses, and actors fail to coordinate not to deploy it because of competitive dynamics, near the bottom end of the axis. Even if the whole thing unfolds over a few months
I do think the y-axis is pretty correlated with temporal scales, but I don’t think it’s the same. I also don’t think physical violence is the same, though it’s probably also correlated (cf the backsliding example which is v powerseeking but not v violent).

The thing I had in mind was more like, should I imagine some actor consciously trying to bring power concentration about? To the extent that’s a good model, it’s power-seeking. Or should I imagine that no actor is consciously planning this, but the net result of the system is still extreme power concentration? If that’s a good model, it’s emergent dynamics.
Idk, I see that this is messy and probably there’s some other better concept here

Sense-making about extreme power concentration

rosehadshar11 Sep 2025 10:09 UTC

69 points

25 comments4 min readLW link

Good government

rosehadshar10 Sep 2025 13:22 UTC

26 points

0 comments6 min readLW link

rosehadshar 4 Aug 2025 16:13 UTC
LW: 2 AF: 1
0
AF
in reply to: Lukas Finnveden’s comment on: AI-enabled coups: a small group could use AI to seize power
I think I agree that, once an AI-enabled coup has happened, the expected remaining AI takeover risk would be much lower. This is partly because it ends the race within the country where the takeover happened (though it wouldn’t necessarily end the international race), but also partly just because of the evidential update: apparently AI is now capable of taking over countries, and apparently someone could instruct the AIs to do that, and the AIs handed the power right back to that person! Seems like alignment is working.
I don’t currently agree that the remaining AI takeover risk would be much lower:
- The international race seems like a big deal. Ending the domestic race is good, but I’d still expect reckless competition I think. Maybe you’re imagining that a large chunk of powergrabs are motivated by stopping the race? I’m a bit sceptical.
- I don’t think the evidential update is that strong. If misaligned AI found it convenient to take over the US using humans, why should we expect them to immediately cease to find humans useful at that point? They might keep using humans as they accumulate more power, up until some later point.
- There’s another evidential update which I think is much stronger, which is that the world has completely dropped the ball on an important thing almost no one wants (powergrabs), where there are tractable things they could have done, and some of those things would directly reduce AI takeover risk (infosec, alignment audits etc). In a world where coups over the US are possible, I expect we’ve failed to do basic alignment stuff too.
Curious what you think.

rosehadshar 27 Jun 2025 12:52 UTC
3 points
0
in reply to: cadca’s comment on: The Industrial Explosion
I think it might be a bit clearer to communicate the stages by naming them based on the main vector of improvement throughout the entire stage, i.e. ‘optimization of labor’ for stage one, ‘automation of labor’ for stage two, ‘miniturization’ for stage three.

I think these names are better names for the underlying dynamics, at least—thanks for suggesting them. (I’m less clear they are better labels for the stages overall, as they are a bit more abstract.)

The Industrial Explosion

rosehadshar and Tom Davidson

26 Jun 2025 14:41 UTC

124 points

66 comments15 min readLW link

(www.forethought.org)

AI-enabled coups: a small group could use AI to seize power

Tom Davidson, Lukas Finnveden and rosehadshar

16 Apr 2025 16:51 UTC

132 points

23 comments7 min readLW link

Three Types of Intelligence Explosion

rosehadshar, Tom Davidson and wdmacaskill

17 Mar 2025 14:47 UTC

40 points

8 comments3 min readLW link

(www.forethought.org)

Intelsat as a Model for International AGI Governance

rosehadshar and wdmacaskill

13 Mar 2025 12:58 UTC

45 points

0 comments1 min readLW link

(www.forethought.org)

rosehadshar 10 Dec 2024 10:30 UTC
1 point
0
in reply to: Rohin Shah’s comment on: Should there be just one western AGI project?
Changed to motivation, thanks for the suggestion.
I agree that centralising to make AI safe would make a difference. It seems a lot less likely to me than centralising to beat China (there’s already loads of beat China rhetoric, and it doesn’t seem very likely to go away).

rosehadshar 9 Dec 2024 13:36 UTC
1 point
0
in reply to: Oscar’s comment on: Should there be just one western AGI project?
“it is potentially a lot easier to stop a single project than to stop many projects simultaneously” → agree.

rosehadshar 9 Dec 2024 13:36 UTC
2 points
0
in reply to: Oscar’s comment on: Should there be just one western AGI project?
I think I still believe the thing we initially wrote:
- Agree with you that there might be strong incentives to sell stuff at monopoloy prices (and I’m worried about this). But if there’s a big gap, you can do this without selling your most advanced models. (You sell access to weaker models for a big mark up, and keep the most advanced ones to yourselves to help you further entrench your monopoly/your edge over any and all other actors.)
- I’m sceptical of worlds where 5 similarly advanced AGI projects don’t bother to sell
  - Presumably any one of those could defect at any time and sell at a decent price. Why doesn’t this happen?
  - Eventually they need to start making revenue, right? They can’t just exist on investment forever
    (I am also not an economist though and interested in pushback.)

rosehadshar 9 Dec 2024 13:30 UTC
3 points
0
in reply to: Rohin Shah’s comment on: Should there be just one western AGI project?
Thanks, I expect you’re right that there’s some confusion in my thinking here.

Haven’t got to the bottom of it yet, but on more incentive to steal the weights:
- partly I’m reasoning in the way that you guess, more resources → more capabilities → more incentives
- I’m also thinking “stronger signal that the US is all in and thinks this is really important → raises p(China should also be all in) from a Chinese perspective → more likely China invests hard in stealing the weights”
- these aren’t independent lines of reasoning, as the stronger signal is sent by spending more resources
- but I tentatively think that it’s not the case that at a fixed capability level the incentives to steal the weights are the same. I think they’d be higher with a centralised project, as conditional on a centralised project there’s more reason for China to believe a) AGI is the one thing that matters, b) the US is out to dominate

rosehadshar 6 Dec 2024 11:52 UTC
3 points
0
in reply to: Aaron_Scher’s comment on: Should there be just one western AGI project?
Thanks, I agree this is an important argument.
Two counterpoints:
- The more projects you have, the more attempts at alignment you have. It’s not obvious to me that more draws are net bad, at least at the margin of 1 to 2 or 3.
- I’m more worried about the harms from a misaligned singleton than from a misaligned (or multiple misaligned) systems in a wider ecosystem which includes powerful aligned systems.

rosehadshar 6 Dec 2024 11:36 UTC
1 point
0
in reply to: Gurkenglas’s comment on: Should there be just one western AGI project?
Thanks! Fwiw I agree with Zvi on “At a minimum, let’s not fire off a starting gun to a race that we might well not win, even if all of humanity wasn’t very likely to lose it, over a ‘missile gap’ style lie that we are somehow not currently in the lead.”

rosehadshar 5 Dec 2024 14:15 UTC
2 points
1
in reply to: calebp99’s comment on: Should there be just one western AGI project?
Thanks for these questions!

Earlier attacks: My thinking here is that centralisation might a) cause China to get serious about stealing the weights sooner, and b) therefore allow less time for building up great infosec. So it would be overall bad for infosec. (It’s true the models would be weaker, so stealing the weights earlier might not matter so much. But I don’t feel very confident that strong infosec would be in place before the models are dangerous (with or without centralisation))

More attack surface: I am trying to compare multiple projects with a single project. The attack surface of a single project might be bigger if the single project itself is very large. As a toy example, imagine 3 labs with 100 employees each. But then USG centralises everything to beat China and pours loads more resources into AGI development. The centralised project has 1000 staff; the counterfactual was 300 staff spread across 3 projects.

China stealing weights: sorry, I agree that it’s harder for everyone including China, and that all else equal this disincentivises stealing the weights. But a) China is more competent than other actors, so for a fixed increase in difficulty China will be less disincentivised than other actors, b) China has bigger incentives to steal the weights to begin with, and c) for China in particular there might be incentives that push the other way (centralising could increase race dynamics between the US and China, and potentially reduce China’s chances of developing AGI first without stealing the weights), and those might counteract the disincentive. Does that make more sense?

rosehadshar 5 Dec 2024 14:06 UTC
3 points
0
in reply to: Gurkenglas’s comment on: Should there be just one western AGI project?
My main take here is that it seems really unlikely that the US and China would agree to work together on this.

rosehadshar

What would adults in the room know about AI risk?

Sense-mak­ing about ex­treme power concentration

Good government

The In­dus­trial Explosion

AI-en­abled coups: a small group could use AI to seize power

Three Types of In­tel­li­gence Explosion

In­tel­sat as a Model for In­ter­na­tional AGI Governance

Sense-making about extreme power concentration

The Industrial Explosion

AI-enabled coups: a small group could use AI to seize power

Three Types of Intelligence Explosion

Intelsat as a Model for International AGI Governance