Do any examples of preparation over an extended length of time exist in human history?
I would suspect they do not, for the simple reason that preparation for a need you don't yet have has no ROI.
Basic science and pure mathematics enable their own subsequent iterations without having those iterations as explicit targets, often without even being able to imagine the developments in question, while doing work that is crucial to making them possible.
Extensive preparation has never happened for something that is ready to be attempted experimentally, because in those cases we just run the experiments; there is no reason not to. With AGI, the reason not to is the unbounded blast radius of a failure, an unprecedented problem. Unprecedented things are less plausible, but unfortunately this one can't be expected to have happened before: if it had, you would no longer be here to update on the observation.
If the blast radius is not unbounded, if most failures can be contained, then it’s more reasonable to attempt to develop AGI in the usual way, without extensive preparation that doesn’t involve actually attempting to build it. If preparation in general doesn’t help, it doesn’t help AGIs either, making them less dangerous and reducing the scope of failure, and so preparation for building them is not as needed. If preparation does help, it also helps AGIs, and so preparation is needed.
Is it true or not true that there is no evidence of an “unbounded” blast radius for any AI model anyone has trained? I am not aware of any such evidence.
What would constitute evidence that the situation was now in the “unbounded” failure case? How would you prove it?
So that we don’t end up in a loop, assume someone has demonstrated a major danger with current AI models, and assume there is a really obvious method of control that will contain the problem. Now what? It seems to me that the next step would be to restrict AI development much the way cobalt-60 sources are restricted, where only institutions with licenses, inspections, and methods of control can handle the stuff, but that’s still not a pause...
When could you ever reach a situation where a stronger control mechanism won’t work?
When I try to imagine it, I can imagine more and more layers of defense (“don’t read anything the model wrote”, “more firewalls, more isolation, servers in a salt mine”), but never a point where you couldn’t agree it was under control. It’s like a radioactive source: if you make it more radioactive, you just add more inches of shielding until the dose is acceptable.
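Since this argument leans on the shielding arithmetic, here is a rough back-of-the-envelope sketch of the point being appealed to: under simple exponential attenuation, the shielding thickness needed to reach an acceptable dose grows only logarithmically with source strength. All numbers below (half-value layer, dose scaling, acceptable dose) are illustrative assumptions, not real measurements.

```python
import math

def shielding_needed(activity_bq, hvl_cm, dose_per_bq, acceptable_dose):
    """Thickness of shielding (cm) so the dose drops to acceptable_dose.

    Assumes simple exponential attenuation: each half-value layer (HVL)
    of shielding halves the dose. Illustrative model only.
    """
    unshielded_dose = dose_per_bq * activity_bq  # dose scales with activity
    if unshielded_dose <= acceptable_dose:
        return 0.0
    halvings = math.log2(unshielded_dose / acceptable_dose)
    return halvings * hvl_cm

# Making the source a million times stronger adds only ~20 half-value layers.
for activity in (1e9, 1e12, 1e15):
    cm = shielding_needed(activity, hvl_cm=1.2, dose_per_bq=1e-9,
                          acceptable_dose=1.0)
    print(f"activity {activity:.0e} Bq -> {cm:.1f} cm of shielding")
```

On these assumptions the cost of containment rises very slowly as the hazard rises, which is the intuition behind “just add more inches of shielding”; the disagreement below is about whether AGI failure behaves like that.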
The blast radius of AGIs is unbounded in the same way humanity’s is: there is potential for taking over all of the future. There are many ways of containing it, and alignment is a way of making the blast a good thing. The point is that a sufficiently catastrophic failure, one where the blast is not contained, is unusually impactful. Arguments about how easy the blast is to contain are separate from this point as I intended it.
If you don’t expect AGIs to become overwhelmingly powerful faster than they are made robustly aligned, containing the blast takes care of itself right up until it becomes unnecessary. But with the opposite expectation, containment becomes both necessary (since early AGIs are not yet robustly aligned) and infeasible (since early AGIs are very powerful). So there’s a question of which expectation is correct, but the consequences of either position seem to follow straightforwardly.