I think categories like “AI R&D” or “datacenter security” are a little too broad.
I can imagine cases where we could deploy even existing models as an extra layer for datacenter security (e.g., anomaly detection). As long as this is about adding security (not replacing humans), and we are not relying on the model succeeding 100% of the time, this can be a positive application, and certainly not one that should be “paused.”
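To make the “extra layer, not a replacement” point concrete, here is a minimal, hypothetical sketch (plain Python, illustrative only; the feature, threshold, and workflow are assumptions, not anything from the original discussion). The detector can only raise alerts for human review, never block access or act on its own, so nothing depends on it being right 100% of the time.

```python
# Hypothetical sketch: an advisory anomaly detector over datacenter access logs.
# It only *adds* a signal for human reviewers; it never blocks or acts on its
# own, so existing controls do not depend on it being correct.
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class AccessEvent:
    user: str
    logins_last_hour: int  # illustrative feature; a real system would use richer signals


def flag_for_review(events: list[AccessEvent], z_threshold: float = 2.5) -> list[AccessEvent]:
    """Return events whose login rate is unusually high, for a human to look at."""
    rates = [e.logins_last_hour for e in events]
    if len(rates) < 2:
        return []
    mu, sigma = mean(rates), stdev(rates)
    if sigma == 0:
        return []
    return [e for e in events if (e.logins_last_hour - mu) / sigma > z_threshold]


# Usage: flagged events go to the existing human security workflow,
# alongside (not instead of) the usual controls.
events = [AccessEvent(u, n) for u, n in [
    ("alice", 3), ("bob", 2), ("carol", 4), ("dave", 3), ("erin", 3),
    ("frank", 2), ("grace", 4), ("heidi", 3), ("ivan", 3), ("mallory", 40),
]]
for event in flag_for_review(events):
    print(f"flag for human review: {event.user} ({event.logins_last_hour} logins in the last hour)")
```

The point of the design is in the return type: the function produces a list for a human queue, not an action, which is what keeps it on the “adding security” side of the line.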
With AI R&D, again the question is how you deploy it. If you are using a model in containers supervised by human employees, then that’s fine. If you are letting models autonomously carry out large-scale training runs with little to no supervision, that is a completely different matter.
At the moment, I think the right mental model is to think of current AI models as analogous to employees who have a certain skill profile (which we can measure via evals, etc.) and who also, with some small probability, could do something completely crazy. With appropriate supervision, such employees can be useful, but you would not fully trust them with sensitive infrastructure.
As I wrote in my essay, I think the difficult point would be if we get to the “alignment uncanny valley”: alignment is at a sufficiently good level (e.g., the probability of failure is small enough) that people are actually tempted to entrust models with such sensitive tasks, but we do not have strong enough control over this probability to drive it arbitrarily close to zero, and so there are risks from edge cases.