They mention three types of mitigations:
Asset protection (e.g., restricting access to models to a limited nameset of people, general infosec)
Restricting deployment (only models with a risk score of “medium” or below can be deployed)
Restricting development (models with a risk score of “critical” cannot be developed further until safety techniques have been applied that bring the score down to “high,” although they kind of get to decide when they think their safety techniques have worked sufficiently well)
My one-sentence reaction after reading the doc for the first time is something like “it doesn’t really tell us how OpenAI plans to address the misalignment risks that many of us are concerned about, but with that in mind, it’s actually a fairly reasonable document with some fairly concrete commitments.”