Charbel-Raphaël comments on Plans A, B, C, and D for misalignment risk

Charbel-Raphaël 26 Nov 2025 11:55 UTC

LW: 0 AF: -1

I asked Claude to make a rough assessment of this. TL;DR: the probability goes from 13% to ~27%, and this propagates to Plans C and D.

sabotaging Chinese AI companies?

Claude: Ryan’s response is suggestive but incomplete. “Sabotaging Chinese AI companies” gestures at a possible answer but doesn’t constitute a full defense because:

- It’s extremely escalatory and might not be politically viable even with high US government buy-in.
- Its effectiveness is uncertain: how much lead time would successful sabotage actually buy? Months? Years?
- It’s not obviously repeatable; China would harden against further attacks.
- It could provoke dangerous counter-responses.

To be fair to Ryan, the original post does mention “helping the US government ensure non-proliferation/lead time” under Plan B, so the concern isn’t entirely unaddressed. And sabotage is just one of several possible mechanisms (export controls, talent restrictions, etc.).

My assessment: Charbel identifies a genuine tension in the framework, and Ryan’s brief reply opens a line of argument rather than closing the issue. A fuller defense would need to explain how Plan B achieves the assumed lead time against a near-peer competitor, and what probability of success that mechanism has.

Charbel: Reason like a superforecaster and assign a probability more accurate than 13% taking into account all of this.

Claude: This requires decomposing the problem carefully.

The core question: In Plan B, can the US actually use 1-3 years for safety work, or does China competition make this illusory?

Key sub-scenarios within Plan B:

| Scenario | Description | Probability | Takeover Risk |
|---|---|---|---|
| B1 | US successfully maintains lead via export controls, compute advantages, possible sabotage; safety work proceeds | ~25% | ~12% |
| B2 | Lead partially erodes, ~1 year of real safety time, rushed execution | ~35% | ~28% |
| B3 | Plan B is unstable: slowing means losing, race dynamics reassert, collapses toward Plan C/D | ~25% | ~40% |
| B4 | China reaches transformative AI first or simultaneously despite US efforts | ~15% | ~65% |

Weighted estimate: ~27%
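As a sanity check on the weighted estimate, the sketch below computes a straight probability-weighted sum over the four sub-scenarios, using the figures from the table exactly as written. The function and variable names are illustrative and not part of the original exchange. Under that assumption the raw sum comes out at roughly 32.6%, somewhat above the ~27% quoted, so the quoted figure may reflect additional judgment rather than the plain weighted average.

```python
# Sub-scenario figures copied from the table: (probability, takeover risk).
# The "~" approximations in the table are treated as exact values here.
scenarios = {
    "B1": (0.25, 0.12),
    "B2": (0.35, 0.28),
    "B3": (0.25, 0.40),
    "B4": (0.15, 0.65),
}


def weighted_risk(rows):
    """Return the sum of probability * risk over all rows.

    Raises ValueError if the scenario probabilities do not sum to 1.
    """
    total_p = sum(p for p, _ in rows.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, expected 1.0")
    return sum(p * r for p, r in rows.values())


plan_b_risk = weighted_risk(scenarios)
print(f"Plan B weighted takeover risk: {plan_b_risk:.2%}")
```

Either figure sits well above the original 13%, so the direction of the revision does not depend on which one is used.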

Why higher than Ryan’s 13%:

- He’s implicitly assuming lead maintenance succeeds; I’m treating this as uncertain.
- “Competent execution” within the US doesn’t solve external China dynamics.
- Plan B may be partially unstable as a strategy.

Why much lower than Charbel’s implied ~75%:

- Plan B ≠ Plan E. Government buy-in provides real resources even if imperfect.
- Safety work can proceed concurrently with capabilities, not requiring full pause.
- Aggressive US action (export controls already show some effect) buys some time.
- Ryan’s response about sabotage, while incomplete, points to a real option space.

My 90% confidence interval: 15-45%

Charbel identifies a genuine problem but overstates its severity. Ryan’s estimate looks ~2x too optimistic.

Charbel: OK, now recreate the table with all the probabilities, taking this into account.

Claude: Applying the same “China problem” logic consistently across scenarios:

| Plan | Probability | Original Risk | Revised Risk | Rationale | Expected Contribution |
|---|---|---|---|---|---|
| A | 5% | 7% | 8% | International agreements can still fail; slight upward revision | 0.40% |
| B | 10% | 13% | 27% | China dynamics make lead maintenance uncertain (my earlier analysis) | 2.70% |
| C | 25% | 20% | 30% | Same China problem, even worse with 2-9 month lead; spending lead = losing it | 7.50% |
| D | 45% | 45% | 48% | Already assumes race dynamics; slight increase for consistency | 21.60% |
| E | 15% | 75% | 75% | Already at high baseline; no revision needed | 11.25% |
| Total | 100% | - | - | - | 43.45% |

Key changes:

- Total expected takeover risk rises from 38.15% → 43.45% (~5 percentage point increase).
- The China problem affects Plans B and C most severely, since they assume lead time that may be illusory.
- Plan A is partially insulated because it includes international coordination.
- Plans D and E already assume race dynamics, so they need less revision.
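The totals quoted here can be reproduced by weighting each plan's takeover risk by that plan's probability and summing. The following is a minimal sketch using only the numbers from the table; the dictionary layout and function name are illustrative choices, not something from the original exchange.

```python
# Per-plan figures from the table: (plan probability, original risk, revised risk).
plans = {
    "A": (0.05, 0.07, 0.08),
    "B": (0.10, 0.13, 0.27),
    "C": (0.25, 0.20, 0.30),
    "D": (0.45, 0.45, 0.48),
    "E": (0.15, 0.75, 0.75),
}


def expected_risk(rows, risk_index):
    """Sum plan probability * risk, where risk_index picks the risk column
    (1 = original risk, 2 = revised risk)."""
    return sum(row[0] * row[risk_index] for row in rows.values())


original_total = expected_risk(plans, 1)
revised_total = expected_risk(plans, 2)

# Per-plan expected contributions under the revised risks.
for name, (prob, _, revised) in plans.items():
    print(f"Plan {name}: {prob * revised:.2%}")

print(f"Original total: {original_total:.2%}")
print(f"Revised total:  {revised_total:.2%}")
print(f"Increase:       {revised_total - original_total:.2%}")
```

Because Plan D carries 45% of the probability mass, even its small 3-point revision contributes about 1.35 points of the overall increase, comparable to the 1.4 points from Plan B's much larger revision.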

Implication: Charbel’s critique doesn’t just affect Plan B—it propagates through the framework. Any plan assuming unilateral slowdown without China buy-in inherits this vulnerability.

ryan_greenblatt 26 Nov 2025 16:44 UTC
LW: 2 AF: 2
I’m just literally assuming that Plan B involves a moderate amount of lead time, via the US having a lead or trying pretty hard to sabotage China; this is part of the plan/assumptions.