I think this stuff is mostly a red herring: the safety standards in OpenAI’s new PF are super vague and so it will presumably always be able to say it meets them and will never have to use this.[1]
But if this ever matters, I think it’s good: it means OpenAI is more likely to make such a public statement and is slightly less incentivized to deceive employees + external observers about capabilities and safeguard adequacy. OpenAI unilaterally pausing is not on the table; if safeguards are inadequate, I’d rather OpenAI say so.
I think my main PF complaints are:
- The High standard is super vague: it amounts to little more than "safeguards should sufficiently minimize the risk of severe harm," and the level of evidence required for "potential safeguard efficacy assessments" is totally unspecified.
- Some of the misalignment safeguards are confused/bad. This is bad because, per the PF, the safeguards may be disjunctive: if OpenAI is wrong about a single "safeguard efficacy assessment," that can make the whole plan invalid.
- Misalignment safeguards are only clearly triggered by cyber capabilities, which is bad, especially since the cyber High threshold is vague / too high.
For more, see "OpenAI rewrote its Preparedness Framework."
[1] Unrelated to the vagueness issue, OpenAI can also just change the framework again at any time.