I think I mostly am on board with this comment. Some thoughts:
> Before I did a rapid growth of capabilities, I would want a globally set target of “we are able to make some kind of interpretability strides, or build evals, that make us better able to predict the outcome of the next training run.”
This feels a bit overly binary to me. I think that understanding-based safety cases will be necessary for ASI, but behavioral methods seem like they might be sufficient beforehand.
I don’t know what you mean by “rapid growth”. It seems like you might be imagining the “shut it all down → solve alignment during pause → rapidly scale after you’ve solved alignment” plan. I think we probably should never do a “rapid scaleup”.
Another reaction I have is that a constraint on coordination will probably be “is the other guy running a blacksite which will screw us over?”. So I think there’s a viability bump at the point of “allow legal capabilities scaling at least as fast as the max-size blacksite that you would have a hard time detecting”.
> I would want to do at least some early global pause on large training runs, to check if you are actually capable of doing that at all (in conjunction with some efforts attempting to build international goodwill about it).
So I think this paragraph isn’t really right, because “slowdown” != “pause”, and slowdowns might still be really, really helpful and enough to get you a long way.
> One of the more important things to do as soon as it’s viable is to stop production of more compute in an uncontrolled fashion. (I’m guessing this plays out with some kind of pork deals for Nvidia and other leaders[2], where the early steps are “consolidate compute”, and then they produce chips that are more monitorable, which they get to make money from but which are also sort of nationalized.) This prevents a big overhang.
I actually currently think that you want to accelerate compute production, because hardware scaling seems safer than software scaling. I’m not sure exactly what you mean by “in an uncontrolled fashion”. If you mean “have a bunch of inspectors making sure the flow of new chips isn’t being smuggled to illegal projects”, then I agree with this; on my initial read I thought you meant something like “pause chip production until they start producing GPUs with HEMs in them”, which I think is probably bad.
In other words, I think that you want to create a big compute overhang during a pause. The downside is obvious, but the upsides are:
- Compute is controllable, far more than software, and so it differentially advances legal projects.
- More compute for safety: we want to be able to pay a big safety tax, and more compute straightforwardly helps.
- Extra compute progress funges against software progress, which is scarier.
- Compute is destroyable (we can reverse course and destroy compute if we want to eat an overhang), but software progress mostly isn’t (you can’t unpublish research).
(This comment might be confusing because I typed it quickly; happy to clarify if you want.)
> So I think this paragraph isn’t really right, because “slowdown” != “pause”, and slowdowns might still be really, really helpful and enough to get you a long way.
I think “everyone agrees to a noticeably smaller next-run-size” seems like a fine thing to do as the first coordination attempt.
I think there is something good about having an early step (maybe after that one), which somehow forces people to actually orient on “okay, suppose we actually had to prioritize interpretability and evals now until they were able to keep pace with capabilities, how would we seriously do that?”
(I don’t currently have a good operationalization of this that seems robust, but it seems plausible that by the time we’re meaningfully able to decide to do anything like this, someone may have come up with a good policy to that effect. I can definitely see this backfiring and causing people to get better at some kind of software that is then harder to control.)
> I actually currently think that you want to accelerate compute production, because hardware scaling seems safer than software scaling. I’m not sure exactly what you mean by “in an uncontrolled fashion”. If you mean “have a bunch of inspectors making sure the flow of new chips isn’t being smuggled to illegal projects”, then I agree with this; on my initial read I thought you meant something like “pause chip production until they start producing GPUs with HEMs in them”, which I think is probably bad.
> In other words, I think that you want to create a big compute overhang during a pause. The downside is obvious, but the upsides are:
> - Compute is controllable, far more than software, and so it differentially advances legal projects.
> - More compute for safety: we want to be able to pay a big safety tax, and more compute straightforwardly helps.
> - Extra compute progress funges against software progress, which is scarier.
> - Compute is destroyable (we can reverse course and destroy compute if we want to eat an overhang), but software progress mostly isn’t (you can’t unpublish research).
Mmm, nod, I can see it. I’d need to think more to figure out a considered opinion on this, but it seems a priori reasonable.
I think one of the things I want is to have executed each type of control you might want to exert, at least for a short period of time, to test whether you’re able to do it at all. But I’d have the early compute steps be more focused on “they have remote-shutdown options but can continue production”, or at least a policy-level “there are enforcers sitting outside the compute centers who could choose to forcibly shut them down fairly quickly”.