I am much more optimistic about getting AIs to reliably follow instructions (see https://www.lesswrong.com/posts/faAX5Buxc7cdjkXQG/machines-of-faithful-obedience ).
But I agree that we should not deploy systems (whether AI or not) in safety-critical domains without extensive testing.
I don’t think that’s a very controversial opinion. In fact I’m not sure “pause” is the right term since I don’t think such deployment has started.
Would you agree that AI R&D and datacenter security are safety-critical domains?
(Not saying such deployment has started yet, or at least not to a sufficient level to be concerning. But e.g. I would say that if you are going to have loads of very smart AI agents doing lots of autonomous coding and monitoring of your datacenters, much as if they were employees, then they pose an ‘insider threat’ risk, and could potentially e.g. sabotage their successor systems or the alignment or security work happening in the company. Misalignments in these sorts of AIs could, in various ways, end up causing misalignments in successor AIs. During an intelligence explosion / period of AI R&D automation, this could result in misaligned ASI. True, such ASI would not be deployed outside the datacenter yet, but I think the point to intervene is before then, rather than after.)
I think “AI R&D” or “datacenter security” are a little too broad.
I can imagine cases where we could deploy even existing models as an extra layer for datacenter security (e.g. anomaly detection). As long as this is for adding security (not replacing humans), and we are not relying on 100% success of this model, then this can be a positive application, and certainly not one that should be “paused.”
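To make the "adding a layer, not replacing humans" point concrete, here is a minimal sketch of the kind of setup I have in mind; all the names and heuristics below are hypothetical placeholders, not any real system:

```python
# Minimal sketch of "AI as an extra security layer" for log review: the model
# can only ADD alerts for humans to triage; it never suppresses or overrides
# the existing rule-based checks. All names here are hypothetical placeholders.

from typing import List, Tuple

Alert = Tuple[str, str]  # (source, log entry)

def rule_based_alerts(entry: str) -> List[Alert]:
    """Stand-in for the existing, unchanged rule-based pipeline."""
    return [("rule", entry)] if "failed login" in entry else []

def llm_flags_anomaly(entry: str) -> bool:
    """Stand-in for asking a model 'does this log entry look anomalous?'."""
    return "unusual hour" in entry  # placeholder heuristic for illustration

def review_log_entry(entry: str) -> List[Alert]:
    alerts = rule_based_alerts(entry)   # existing checks run regardless
    # The model runs in addition to the rules: its worst case is a false
    # positive sent to a human, or a miss the rules would also have missed.
    if llm_flags_anomaly(entry):
        alerts.append(("llm", entry))
    return alerts                       # humans triage whatever is raised

print(review_log_entry("failed login from new device at unusual hour"))
# both layers fire: [('rule', ...), ('llm', ...)]; a human makes the call
```

The key property is that removing the model gives you back exactly the system you had before, so its failures can only cost reviewer time, not coverage.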
With AI R&D, again, the question is how you deploy it. If you are using a model in containers, supervised by human employees, then that's fine. If you are letting models autonomously carry out large-scale training runs with little to no supervision, that is a completely different matter.
At the moment, I think the right mental model is to think of current AI models as analogous to employees that have a certain skill profile (which we can measure via evals etc.) but that also, with some small probability, could do something completely crazy. With appropriate supervision, such employees could also be useful, but you would not fully trust them with sensitive infrastructure.
As I wrote in my essay, I think the difficult point would be if we get to the “alignment uncanny valley”: alignment is at a sufficiently good level (i.e., the probability of failure is small enough) that people are actually tempted to entrust models with such sensitive tasks, but we don't have strong enough control of this probability to drive it arbitrarily close to zero, and so there are risks of edge cases.
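As a rough, purely illustrative calculation of why "small but not arbitrarily small" is the uncomfortable regime (the numbers below are made up):

```python
# Illustrative only: a per-action failure probability that sounds small can
# still make at least one failure likely once an agent takes many autonomous
# actions. The figures are made up for the example.

p = 1e-5        # assumed chance a single autonomous action goes badly wrong
n = 100_000     # assumed number of autonomous actions over a deployment

p_at_least_one = 1 - (1 - p) ** n
print(f"P(at least one failure) ~ {p_at_least_one:.2f}")  # ~ 0.63
```

The worry in the uncanny valley is that p looks small enough to tempt deployment, but we cannot push it down arbitrarily as the number of autonomous actions grows.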
I’m surprised by the implication here, which, if I read you correctly, is a belief that AI hasn’t yet been deployed in safety-critical domains?
OpenAI has a ton of usage related to healthcare, for instance. I think that this is basically all fine, well-justified and very likely net-positive, but it does strike me as a safety-critical domain. Does it not to you?
“Healthcare” is pretty broad: certainly some parts of it are safety-critical and some less so. I am not familiar with all the applications of language models in healthcare, but if you are using an LLM to improve efficiency in healthcare documentation, then I would not call it safety-critical. If you are connecting an LLM to a robot performing surgery, then I would call it safety-critical.
It’s also a question of whether the AI’s outputs are used without supervision. If doctors or patients ask a chatbot questions, I would not call it safety-critical, since the AI is not autonomously making the decisions.
Fair distinctions, yeah. I’d still be surprised if AI isn’t yet deployed in safety-critical domains, but I hear you re: the view on specific healthcare stuff.