So I think this paragraph isn't quite right, because "slowdown" != "pause", and slowdowns might still be really helpful and enough to get you a long way.
I think “everyone agrees to a noticeably smaller next-run-size” seems like a fine thing to do as the first coordination attempt.
I think there is something good about having an early step (maybe after that one) that somehow forces people to actually orient on "okay, suppose we actually had to prioritize interpretability and evals now, until they were able to keep pace with capabilities, how would we seriously do that?"
(I don't currently have a good operationalization of this that seems robust, but it seems plausible that, by the time we're meaningfully able to decide to do anything like this, someone will have come up with a good policy with that effect. I can definitely see this backfiring and causing people to get better at some kind of software progress that is then harder to control.)
I actually currently think that you want to accelerate compute production, because hardware scaling seems safer than software scaling. I'm not sure exactly what you mean by "in an uncontrolled fashion". If you mean "have a bunch of inspectors making sure new chips aren't being smuggled to illegal projects", then I agree with this; on my initial read I thought you meant something like "pause chip production until they start producing GPUs with HEMs in them", which I think is probably bad.
In other words, I think you want to create a big compute overhang during a pause. The downside is obvious, but the upsides are:
compute is controllable, far more than software, and so differentially advances legal projects.
more compute for safety. We want to be able to pay a big safety tax, and more compute straightforwardly helps with that.
extra compute progress funges against software progress, which is scarier.
compute is destroyable (e.g. we can reverse course and destroy compute if we want to eat an overhang), but software progress mostly isn't (you can't unpublish research).
Mmm, nod, I can see it. I'd need to think more to form a considered opinion on this, but it seems a priori reasonable.
I think one of the things I want is to have executed each type of control you might want to exert, at least for a shorter period of time, to test whether you're able to do it at all. But I'd have the early compute steps be more focused on "they have remote-shutdown options but can continue production", or at least a policy-level "there are enforcers sitting outside the compute centers who could choose to forcibly shut it down fairly quickly".