An interesting part of OpenAI's new version of the Preparedness Framework:
I think this stuff is mostly a red herring: the safety standards in OpenAI's new PF are super vague, so OpenAI will presumably always be able to say it meets them and will never have to use this.[1]
But if this ever matters, I think it’s good: it means OpenAI is more likely to make such a public statement and is slightly less incentivized to deceive employees + external observers about capabilities and safeguard adequacy. OpenAI unilaterally pausing is not on the table; if safeguards are inadequate, I’d rather OpenAI say so.
I think my main PF complaints are:
- The High safeguards standard is super vague: it's basically just "safeguards should sufficiently minimize the risk of severe harm," and the level of evidence required for the "potential safeguard efficacy assessments" is totally unspecified.
- Some of the misalignment safeguards are confused/bad, and this is bad since per the PF they may be disjunctive: a plan can rest on a single safeguard, so being wrong about a single "safeguard efficacy assessment" can make the whole plan invalid.
- Misalignment safeguards are only clearly triggered by cyber capabilities, which is especially bad since the cyber High threshold is vague / too high.
For more see OpenAI rewrote its Preparedness Framework.
[1] Unrelated to vagueness, they can also just change the framework again at any time.
This seems to have been foreshadowed by this tweet in February:
https://x.com/ChrisPainterYup/status/1886691559023767897
Would be good to keep track of this change.