An AI control solution is, by definition, a way to control what an AI is doing. If you have AI control, you have the option to tell your AI "don't go FOOM" and have that work.
You would not expect a control measure to continue to work if you told an AI under an AI control protocol to go FOOM.
Improvements in training efficiency are only realized if you actually train the model, and AI control takes the decision to realize those gains (by training a model to a higher level of performance) out of the hands of the AI being controlled.
FOOM for software requires that that decision is always yes (either because people keep pushing or because the model is in the driver's seat).
So, put broadly, the AI control agenda's answer to "what should you do with an AI system that could go FOOM?" is: don't let it try. Before it goes FOOM, the model is not able to beat the controls, and going FOOM takes time during which the model is working on improving itself rather than trying hard not to get violently disassembled. An AI control protocol is therefore supposed to be able to turn an AI that attempts to go FOOM against the explicit controls, over the course of hours, weeks, or months, into a deactivated machine.
AI control protocols want to fail loud for this reason. (A breakout, for the same reason, will involve trying to fail silently.)
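The fail-loud property can be sketched as a fail-closed action filter: anything the protocol cannot explicitly verify halts the whole run and surfaces the event, rather than being quietly logged and allowed through. This is a minimal illustrative sketch, not any real control framework's API; all names here (`ControlViolation`, `review_action`, `run_under_control`) are hypothetical.

```python
# Minimal sketch of a fail-loud / fail-closed control check.
# All names are hypothetical illustrations, not a real framework's API.

class ControlViolation(Exception):
    """Raised when a proposed action cannot be verified as permitted."""

def review_action(action: str, allowlist: set[str]) -> None:
    # Default to stopping: anything not explicitly permitted is a violation.
    if action not in allowlist:
        raise ControlViolation(f"unreviewed action: {action!r}")

def run_under_control(actions: list[str], allowlist: set[str]) -> list[str]:
    """Execute actions in order; halt loudly on the first violation."""
    executed: list[str] = []
    for action in actions:
        try:
            review_action(action, allowlist)
        except ControlViolation as err:
            # Fail loud: stop everything and surface the event,
            # rather than skipping the action and continuing silently.
            print(f"CONTROL HALT: {err}; shutting down")
            return executed
        executed.append(action)
    return executed
```

The key design choice is that the check errs toward shutdown: a silent-failure variant would catch the violation, skip the action, and keep running, which is exactly what a breakout attempt would want.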