As someone who has been advocating a lot recently for more emphasis on safety evaluations and testing generally, I here again feel compelled to say that you really ought to include a ‘test’ phase in-between training and deployment.
Testing
Parameters are fixed.
Failures are fine.
Model is sandboxed.
Fixed and/or dynamic testing distribution, chosen specifically to be different in important ways from the training distribution.
Furthermore, I think that having sharp distinctions between each phase, and well-sandboxed train and test environments, is of critical importance going forward. I think trying to get this to be the case is going to take some work from AI governance and safety-advocate folks. I absolutely don’t think we should quietly surrender this point.
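To make the shape of such a test phase concrete, here is a minimal sketch; everything in it (the sandbox context manager, the shifted eval loader, the per-example checks) is a hypothetical stand-in rather than any particular lab’s tooling:

```python
# Hypothetical sketch of a distinct test phase between training and deployment.
# "sandbox", the loaders, and the per-example checks are illustrative stand-ins.
import torch


def run_test_phase(model: torch.nn.Module, eval_loader, sandbox) -> list:
    """Evaluate a frozen model inside a sandbox on a deliberately shifted distribution."""
    model.eval()                                  # parameters are fixed: no further updates
    for p in model.parameters():
        p.requires_grad_(False)

    failures = []
    with sandbox, torch.no_grad():                # model is sandboxed; nothing leaks out
        for example in eval_loader:               # eval set chosen to differ from training
            output = model(example["input"])
            if not example["passes"](output):     # failures are fine here: just record them
                failures.append(example["id"])
    return failures                               # gate deployment on what this surfaces
```

The point is structural: the test phase only reads the weights, and anything it breaks stays inside the sandbox.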
I basically agree; ensuring that failures are fine during training would sure be great. (And I also agree that if we have a setting where failure is fine, we want to use it for a bunch of evaluation/red-teaming/....) As two caveats: there are definitely limits to how powerful an AI system you can sandbox, IMO, and I’m not sure how feasible sandboxing even weak-ish models is from the governance side (WebGPT-style training just seems really useful).
Yes, I agree that actually getting companies to consistently use the sandboxing is the hardest piece of the puzzle. I think there are some promising advances making it easier to get the benefits of not-sandboxing while still being in a sandbox. For instance: using a snapshot of the entire web, hosted on an isolated network (https://ai.facebook.com/blog/introducing-sphere-meta-ais-web-scale-corpus-for-better-knowledge-intensive-nlp/).
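As a rough illustration of that idea (not Sphere’s actual API; the index class and corpus format here are hypothetical), the agent’s web-search tool can be backed by a frozen, offline crawl rather than the live internet:

```python
# Illustrative only: a "web search" tool backed by a static snapshot on an isolated
# network, so the model never touches the live web. Names and formats are hypothetical.

class LocalSnapshotIndex:
    def __init__(self, documents: dict[str, str]):
        self.documents = documents                # url -> page text from the frozen crawl

    def search(self, query: str, k: int = 3) -> list[str]:
        # Naive keyword overlap; a real index would use BM25 or dense retrieval.
        terms = query.lower().split()
        scored = sorted(
            self.documents.items(),
            key=lambda item: sum(t in item[1].lower() for t in terms),
            reverse=True,
        )
        return [url for url, _ in scored[:k]]


def web_search_tool(query: str, index: LocalSnapshotIndex) -> list[str]:
    # The agent's searches resolve entirely inside the sandbox.
    return index.search(query)
```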
I think this takes away a lot of the disadvantages, especially if you combine it with some amount of simulation of backends to give interactivity (not an intractably hard task, but it would take some funding and development time).
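A minimal sketch of what “simulating a backend” could mean, assuming a toy stateful service (the endpoints and names are invented for illustration):

```python
# Illustrative only: a stateful fake backend the agent can interact with inside the
# sandbox, instead of a real external service. Endpoints and names are invented.

class FakeShoppingBackend:
    def __init__(self):
        self.cart: list[str] = []

    def handle(self, method: str, path: str, body: dict | None = None) -> dict:
        if method == "GET" and path == "/cart":
            return {"status": 200, "items": list(self.cart)}
        if method == "POST" and path == "/cart" and body is not None:
            self.cart.append(body["item"])        # state persists across agent actions
            return {"status": 201, "items": list(self.cart)}
        return {"status": 404, "error": "unknown endpoint"}


backend = FakeShoppingBackend()
backend.handle("POST", "/cart", {"item": "example"})
print(backend.handle("GET", "/cart"))             # the agent sees its own earlier action
```

Because the state lives entirely in the simulated service, mistaken or destructive actions stay contained in the test environment while the model still gets genuine interactivity.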