In engineering and design, there is a process that includes, among other stages, specification, creation, verification and validation, and deployment. Verification and validation are where most people focus when thinking about safety—can we make sure the system performs correctly?
Factually, no, I don’t think this is where most people’s thoughts are. Apart from the stages of the engineering process that you enumerated, there are also manufacturing (i.e., training, in the case of ML systems) and operations (the post-deployment phase). AI safety “thought” is more-or-less evenly distributed between design (thinking about how systems should be engineered/architected in order to be safe), manufacturing (RLHF, scalable oversight, etc.), V&V (evals, red teaming), and operations (monitoring). I discussed this in a little more detail here.
There is no pre-defined standard or set of characteristics that are checked, no real ability to consider application specific requirements, and no ability to specify where the system should not or must not be used.
No standard—agree. “No ability to consider requirements”—disagree: you can consider “requirements”[1]. “No ability to specify where the system should not or must not be used”—agreed if you specifically mean the verb “to specify”, but we can still consider “where the system should not or must not be used”.
Until we have some safety standard for machine learning models, they aren’t “partly safe” or “assumed safe,” or “good enough for consumer use.” If we lack a standard for safety, ideally one where there is consensus that it is sufficient for a specific application, then exploration or verification of the safety of a machine learning model is meaningless.
You seem to be importing a rather outdated understanding of safety and reliability engineering. Standards are impossible in the case of AI, but they are also unnecessary, as evidenced by experience in various domains of engineering, where the presence of standards doesn’t “save” one from drifting into failures.
So, I disagree that evals and red teaming in application to AI are “meaningless” because there are no standards.
However, I agree that we must actively invite state-of-the-art thinking about high reliability and resilience in engineering to the conversation about AI safety & resilience, AGI org design for reliability, etc. I’d love to see OpenAI, Anthropic, and Google hire top thinkers in this field as staff or consultants.
Modern systems engineering methodology is moving away from “requirements”, i.e., deontic specification of what a system should/must do or not do, toward a more descriptive modality, i.e., use cases. This transition is already complete in modern software engineering, where classic engineering “requirements” have been practically unheard of for the last 10 years. Other engineering fields (electrical, robotics, etc.) are moving in this direction, too.
AI safety “thought” is more-or-less evenly distributed
Agreed—I wasn’t criticizing AI safety here; I was talking about the conceptual models that people outside of AI safety have, as was mentioned in several other comments. So my point was about what people outside of AI safety think about when talking about ML models, and I was trying to correct a broken mental model.
So, I disagree that evals and red teaming in application to AI are “meaningless” because there are no standards.
I did not say anything about evals and red teaming in application to AI, other than in comments where I said I think they are a great idea. And the fact that they are happening very clearly implies that there is some possibility that the models perform poorly, which, again, was the point.
You seem to be importing a rather outdated understanding of safety and reliability engineering.
Perhaps it’s outdated, but it is the understanding actually held by the reliability and systems engineers I have spoken to, and it matches research I did on resilience most of a decade ago, e.g. this. And I agree that there is discussion in both older and more recent journal articles about how some firms do things in various ways that might be an improvement, but that’s not standard practice. Even in agile systems engineering, use cases more often supplement or exist alongside requirements; they don’t replace them. Though terminology in this domain is so far from standardized that you’d need to talk about a specific company, or even a specific project’s process and definitions, to have a more meaningful discussion.
Standards are impossible in the case of AI, but they are also unnecessary, as evidenced by experience in various domains of engineering, where the presence of standards doesn’t “save” one from drifting into failures.
I don’t disagree with the conclusion, but the logic here simply doesn’t prove anything: it implies that standards are insufficient, not that they are unnecessary.
In engineering and design, there is a process that includes, among other stages, specification, creation, verification and validation, and deployment. Verification and validation are where most people focus when thinking about safety—can we make sure the system performs correctly? I think this is a conceptual error that I want to address.
It seems that you put the first two sentences “in the mouths of people outside of AI safety”, and they describe some conceptual error, while the third sentence is “yours”. However, I don’t understand what exactly the error you are trying to correct is: the first sentence is uncontroversial, and the second sentence is a question, so I don’t see what (erroneous) idea it expresses. It’s really unclear what you are trying to say here.
I did not say anything about evals and red teaming in application to AI, other than in comments where I said I think they are a great idea.
I don’t understand how else to interpret this sentence from the post: “If we lack a standard for safety, ideally one where there is consensus that it is sufficient for a specific application, then exploration or verification of the safety of a machine learning model is meaningless.” To me, evals and red teaming are “exploration and verification of the safety of a machine learning model” (unless you want to say that the word “verification” cannot apply if there are no standards, in which case just replace it with “checking”). So, again, I’m very confused about what you are trying to say :(
Perhaps it’s outdated, but it is the understanding actually held by the reliability and systems engineers I have spoken to, and it matches research I did on resilience most of a decade ago, e.g. this.
My statement that you import an outdated view was based on my understanding that you had declared “evals and red teaming meaningless in the absence of standards”. If that is not your claim, then there is no import of an outdated understanding.
I mean, standards are useful. They are sort of like industry-wide, strictly imposed “checklists”, and checklists do help with reliability overall: when checklists are introduced, the number of incidents reliably goes down. But it’s also recognised that it doesn’t go down to zero, and the presence of a standard shouldn’t reduce the vigilance of anyone involved, especially when we are dealing with such a high-stakes thing as AI.
So, introducing AI safety standards based on some evals and red-teaming benchmarks would be good, while cultivating a shared recognition that these “standards” absolutely don’t guarantee safety, and that marketing, PR, GR, and CEOs shouldn’t use phrases like “our system is safe because it complies with the standard”. Maybe, to prevent abuse, such a standard should be called something like an “AI safety baseline standard”. Also, it’s important to recognise that the standard will mostly exist for the catching-up crowd of companies and orgs building AIs. Checking the most powerful and SoTA systems against the “standards” at the leading labs will be only a very small part of the safety and alignment engineering process that should lead to their release.
Do you agree with this? Which particular point from the two paragraphs above are “people outside of AI safety” confused about or failing to realise?